Understanding taxa collapse

Hi everyone,

I came across this post: "Is it possible to get Family and Species level diversity analysis plots?", and what @ebolyen said got me wondering: "When you collapsed your table by taxonomy, all of the ASVs have been removed and grouped by taxonomy."

I'd like some help understanding this. Does it mean that if I have two different Wolbachia strains (ASVs) in my data, they will be collapsed together at level 7, with their frequencies summed? Or will that happen only if they have exactly the same name, with nothing differentiating them at the end of their taxonomy assignment? Am I getting taxa collapse completely wrong?

Thanks again for your patience,
Felipe.


Correct. Collapsing is based on taxonomy names, so these ASVs will only be collapsed if their names match exactly.
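To make the "exact name match" rule concrete, here is a minimal sketch of collapsing in plain Python. It is a simplified model, not the q2-taxa implementation: it assumes the feature table is a dict of `{feature_id: {sample_id: count}}`, that each taxonomy is a semicolon-separated string, and that collapsing at level N keeps the first N ranks and sums counts for identical strings. The function name `collapse_by_taxonomy` and the feature IDs are hypothetical.

```python
from collections import defaultdict


def collapse_by_taxonomy(table, taxonomy, level):
    """Sum per-sample counts of features whose taxonomy, truncated to
    `level` ranks, is an identical string (hypothetical helper)."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, counts in table.items():
        # Keep only the first `level` ranks; grouping is exact string equality.
        ranks = taxonomy[feature_id].split(";")[:level]
        key = ";".join(r.strip() for r in ranks)
        for sample_id, count in counts.items():
            collapsed[key][sample_id] += count
    # Convert the nested defaultdicts to plain dicts.
    return {k: dict(v) for k, v in collapsed.items()}


wolb = ("D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;"
        "D_3__Rickettsiales;D_4__Anaplasmataceae;D_5__Wolbachia;")
taxonomy = {
    "asv1": wolb + "D_6__Wolbachia pipientis",  # same name as asv2
    "asv2": wolb + "D_6__Wolbachia pipientis",
    "asv3": wolb + "D_6__Wolbachia sp.",        # different species label
}
table = {
    "asv1": {"S1": 10, "S2": 0},
    "asv2": {"S1": 5, "S2": 3},
    "asv3": {"S1": 2, "S2": 7},
}

result = collapse_by_taxonomy(table, taxonomy, level=7)
for name, counts in result.items():
    print(name.split(";")[-1], counts)
# D_6__Wolbachia pipientis {'S1': 15, 'S2': 3}
# D_6__Wolbachia sp. {'S1': 2, 'S2': 7}
```

The two features labelled "Wolbachia pipientis" are summed into one row, while "Wolbachia sp." stays separate even though it is the same genus.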

For example, these will collapse (names taken from the SILVA database, so apologies if the lineage is not correct):
D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__Anaplasmataceae;D_5__Wolbachia;D_6__Wolbachia pipientis
D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__Anaplasmataceae;D_5__Wolbachia;D_6__Wolbachia pipientis

These will not:
D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__Anaplasmataceae;D_5__Wolbachia;D_6__Wolbachia pipientis
D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__Anaplasmataceae;D_5__Wolbachia;D_6__Wolbachia sp.

Nor will these:
D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__Anaplasmataceae;D_5__Wolbachia;D_6__Wolbachia pipientis
D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__SomethingsGoneWrongWithMyTaxonomy;D_5__Wolbachia;D_6__Wolbachia pipientis

Nor will these:
D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__Anaplasmataceae;D_5__Wolbachia;D_6__Wolbachia pipientis
D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__Anaplasmataceae;D_5__Wolbachia;D_6__Wolbachia pipientis Strain X
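The pairs above can be checked with a plain string comparison. This sketch assumes, as a simplification, that collapsing at level N compares only the first N semicolon-separated ranks; under that assumption the "sp." and "Strain X" variants would merge at level 6 (genus) but not at level 7, while the pair with the mismatched family name would not merge at genus level either. The helper `same_at_level` is hypothetical.

```python
def same_at_level(tax_a, tax_b, level):
    """True if two taxonomy strings agree on their first `level` ranks
    (exact, case-sensitive comparison; hypothetical helper)."""
    ranks_a = [r.strip() for r in tax_a.split(";")[:level]]
    ranks_b = [r.strip() for r in tax_b.split(";")[:level]]
    return ranks_a == ranks_b


prefix = ("D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;"
          "D_3__Rickettsiales;")
pipientis = prefix + "D_4__Anaplasmataceae;D_5__Wolbachia;D_6__Wolbachia pipientis"
sp = prefix + "D_4__Anaplasmataceae;D_5__Wolbachia;D_6__Wolbachia sp."
strain_x = prefix + "D_4__Anaplasmataceae;D_5__Wolbachia;D_6__Wolbachia pipientis Strain X"
bad_family = (prefix + "D_4__SomethingsGoneWrongWithMyTaxonomy;"
              "D_5__Wolbachia;D_6__Wolbachia pipientis")

# Columns: label, same at level 7?, same at level 6?
for label, other in [("identical", pipientis), ("sp.", sp),
                     ("Strain X", strain_x), ("bad family", bad_family)]:
    print(label, same_at_level(pipientis, other, 7),
          same_at_level(pipientis, other, 6))
# identical True True
# sp. False True
# Strain X False True
# bad family False False
```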


Thanks, @Nicholas_Bokulich, that's what I had in mind, but I'd rather ask to be sure.

Now imagine I have feature hnuh23yg4b23i2bh121hb323 which is: D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__Anaplasmataceae;D_5__Wolbachia;D_6__Wolbachia pipientis

And feature i2ju2h928327723hjrj339dj which is also:
D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__Anaplasmataceae;D_5__Wolbachia;D_6__Wolbachia pipientis

Why aren't they already grouped together before collapsing? Is it something to do with the scripts that generate representative sequences? Thanks for the help!


Nothing is wrong; this is totally normal. The reason is that their sequences are slightly different (maybe by as little as 1 nt!), but Wolbachia pipientis is still the top match for both.
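A short sketch of why a 1-nt difference yields two separate features. It assumes, as is typical for QIIME 2 denoising output, that each feature ID is the MD5 digest of its sequence; the two sequences below are hypothetical fragments, not real Wolbachia reads, and `hamming` and `feature_id` are illustrative helpers.

```python
import hashlib


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def feature_id(seq):
    """MD5 hex digest of the sequence, used here as a stand-in feature ID."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


# Hypothetical ASVs differing at a single position (A -> G at index 10).
asv_a = "TACGGAGGGTACAAGCGTTAATCGGAATTACTGGGCGTAAAG"
asv_b = asv_a[:10] + "G" + asv_a[11:]

print(hamming(asv_a, asv_b))                    # 1
print(feature_id(asv_a) == feature_id(asv_b))   # False: two distinct features
```

Both features can still receive the same classification, because a single mismatch rarely changes which reference sequence is the best hit.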

This is the power of denoising: it resolves ASVs that differ by a single nucleotide. OTU clustering would probably merge these ASVs (if they really are 1 nt apart), losing what could be a meaningful signal in your samples (e.g., strains that distinguish two treatments or host species). That said, OTU clustering would still produce many OTUs with the same taxonomic assignment, for the same reason: the representative sequences differ, but their taxonomic classification is the same.
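The arithmetic behind "OTU clustering would probably merge these" can be sketched as follows. This is a deliberately simplified model: it assumes equal-length, ungapped sequences and the commonly used 97% identity threshold (neither is specified in the thread), whereas real OTU tools align sequences and pick centroids. The 250-nt reads are hypothetical.

```python
def percent_identity(a, b):
    """Identity of two equal-length, ungapped sequences, as a fraction."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def clusters_together(a, b, threshold=0.97):
    """Would a simple identity-threshold OTU method group a and b?"""
    return percent_identity(a, b) >= threshold


# Hypothetical 250-nt reads that differ at one position.
read_1 = "ACGT" * 62 + "AC"   # 250 nt
read_2 = read_1[:-1] + "G"    # last base C -> G

print(percent_identity(read_1, read_2))   # 0.996
print(clusters_together(read_1, read_2))  # True: merged into one OTU
```

A single mismatch in 250 nt is 0.4% divergence, well inside a 3% radius, so the two variants fall into one OTU, whereas a denoiser that resolves single-nucleotide differences would report them as two ASVs.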

Similarly, if you look at your representative sequences, you will see many features assigned to the same species but with different sequences: different strains of the same species can have slight differences in their 16S rRNA gene sequences.

A single cell can also carry multiple copies of the 16S rRNA gene with slight variations, so while these two features could represent two distinct sub-species, they could also just represent two different 16S copies found in the same cells.

Either way, it is nothing to worry about.


Thank you so much for the explanation and the patience, Nicholas!
I’m really getting along with QIIME2, and this forum makes it way easier because of you and the others :grin:

