Collapse representative sequences to taxa

Hello everyone!

I would like to compute core diversity metrics in QIIME 2 using taxa rather than features. For the UniFrac metrics I need a phylogenetic tree, but I only have a tree of features, so I would need a tree built from sequences collapsed to a single taxonomic level. I couldn't find any way to obtain collapsed representative sequences in QIIME 2. Can anybody give me a hint?
Thank you!

claudia

Hi @claudia.vannini,

I am not aware of any method in QIIME 2 for collapsing representative sequences.
Any collapsing at the sequence level happens during feature identification (with DADA2, Deblur, or VSEARCH).
Hence, if two representative sequences are still separate, it means they could not be merged with the settings used to build the feature table.

For the analysis at collapsed taxonomic levels, I would suggest using the non-phylogenetic metrics in the diversity analysis, to get a feeling for whether the results match your expectations.
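As a minimal sketch of why this works: once a feature table has been collapsed to taxa (for example with `qiime taxa collapse`), a non-phylogenetic metric such as Bray-Curtis dissimilarity, one of the metrics QIIME 2 computes, needs only the per-taxon counts and no tree. The `bray_curtis` helper below is an illustrative pure-Python version, not the QIIME 2 implementation, and the two samples are hypothetical.

```python
from typing import Dict


def bray_curtis(u: Dict[str, int], v: Dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two samples of taxon counts.

    Only abundances are used; no phylogenetic tree is required.
    Taxa missing from one sample are treated as zero counts.
    """
    taxa = set(u) | set(v)
    diff = sum(abs(u.get(t, 0) - v.get(t, 0)) for t in taxa)
    total = sum(u.values()) + sum(v.values())
    # Two empty samples are treated as identical (avoids division by zero).
    return diff / total if total else 0.0


# Hypothetical samples, already collapsed to one taxonomic level
sample_a = {"taxon_X": 275, "taxon_Y": 25}
sample_b = {"taxon_X": 100, "taxon_Y": 100}
print(bray_curtis(sample_a, sample_b))  # 0.5
```

Phylogenetic metrics such as UniFrac, by contrast, need a tree whose tips match the features in the table, which is exactly what is missing after collapsing to taxa.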

However, let's see if anyone else here has a suggestion.

Luca

Hi Luca,
thank you very much for your reply! Yes, I have already run the non-phylogenetic metrics. I also suspected that it is not possible to collapse feature sequences to taxonomic levels in QIIME 2.
Thank you again, bye

claudia

Hello Claudia,

To build on what Luca said, I want to explain why there's no option to collapse representative sequences by taxonomy in QIIME 2. This feature is missing for a reason!

Here is a toy example with three ASVs:

| feature | taxonomy | seq | total count |
|---|---|---|---|
| asv1 | ...D_4__WD2101 soil group; D_5__??; | ACCGACT | 250 |
| asv2 | ...D_4__WD2101 soil group; D_5__??; | ACCGTTT | 21 |
| asv3 | ...D_4__WD2101 soil group; D_5__??; | ACCGAGT | 4 |

These all have the same taxonomy, so they would be collapsed.

But what would be the sequence of the new collapsed feature?

| feature | taxonomy | seq | total count |
|---|---|---|---|
| asv.new | ...D_4__WD2101 soil group; D_5__??; | ??????? | 275 |

Would you choose the sequence from the most abundant feature (asv1)? That just ignores the other sequences...

Would you make a new 'average' sequence? That new sequence is never observed in your real data set...

What is the right way to collapse a set of sequences? :thinking:
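To make the dilemma concrete, here is a minimal Python sketch using the toy ASVs above. Summing counts per taxonomy is unambiguous, but picking a single sequence for the collapsed feature needs an arbitrary rule. The two rules shown (keep the most abundant sequence, or build an abundance-weighted per-position consensus, which assumes equal-length, already-aligned sequences) are illustrative assumptions of mine, not QIIME 2 functionality, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy ASVs from the example: (feature, taxonomy, sequence, total count)
TAXON = "...D_4__WD2101 soil group; D_5__??"
features = [
    ("asv1", TAXON, "ACCGACT", 250),
    ("asv2", TAXON, "ACCGTTT", 21),
    ("asv3", TAXON, "ACCGAGT", 4),
]


def collapse_counts(rows):
    """Sum counts of features that share a taxonomy string (the easy part)."""
    totals = defaultdict(int)
    for _, taxon, _, count in rows:
        totals[taxon] += count
    return dict(totals)


def most_abundant_sequence(rows):
    """Rule 1: keep the most abundant member's sequence; the others are discarded."""
    return max(rows, key=lambda row: row[3])[2]


def weighted_consensus(rows):
    """Rule 2: per-position majority base, weighted by abundance.

    Assumes all sequences have the same length and are already aligned.
    """
    length = len(rows[0][2])
    consensus = []
    for i in range(length):
        votes = Counter()
        for _, _, seq, count in rows:
            votes[seq[i]] += count
        consensus.append(votes.most_common(1)[0][0])
    return "".join(consensus)


print(collapse_counts(features))         # {'...D_4__WD2101 soil group; D_5__??': 275}
print(most_abundant_sequence(features))  # ACCGACT
print(weighted_consensus(features))      # ACCGACT
```

Both rules return ACCGACT here, which is exactly asv1: the variation in asv2 and asv3 disappears either way. With a more even split of abundances, the consensus rule could instead produce a sequence that was never observed in the data.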

Dear Colin,

thank you very much for your kind and thorough explanation! It will be very useful for replying to a reviewer who keeps insisting that all the analyses (previously done with ASVs) be repeated at every (!!) taxonomic level.
I completely agree with you: there is no truly correct way to collapse sequences (at least not one that makes biological sense).
Thank you again, bye
claudia
