Using taxa collapse for SILVA 18S - different subranks

Hi QIIME Community,

I have a general question about using the taxa collapse function with SILVA taxonomy strings. In all the examples I have seen that use taxa collapse with Greengenes taxonomies, it seemed straightforward to specify which taxonomy level to collapse at. For example, to collapse at the genus level you just specify --p-level 6. However, the SILVA taxonomy strings, and the 18S taxonomies in particular, contain extra subranks that do not necessarily follow the phylum, class, order, family, genus, species order. As a result, level 6 may be the genus for one feature but a different rank for another (the genus may be at level 7, for example). Is there any way to ensure that you are collapsing by genus in this case and not at a level of mixed ranks? Or do you have to go in manually and check that the genus names are all listed at the same level?

Additionally, I was hoping for further clarification on what happens to features whose taxonomies do not reach the specified level. Are these just collapsed to the lowest similar level available? (I want to make sure they are not removed.)

Any help would be much appreciated.

Hi @jjankowiak,
The QIIME-compatible SILVA releases should contain taxonomy files with 7 levels for all sequences. Make sure to use those taxonomy files for training your classifier! Of course (as you've found), the unevenly leveled taxonomy does work for classification, but creates a headache when the time comes for collapsing.

The pre-trained classifiers that we release use the 7-level taxonomy (though note that the full-length SILVA SSU contains both 18S and 16S data, so I suspect accuracy could be improved by training your own 18S-only classifier).
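
If you would rather check the level count than take it on trust, one quick way is to count how many semicolon-separated levels each taxonomy string has. Here is a minimal Python sketch. It assumes a headerless, tab-separated file with the ID in the first column and the taxonomy string in the second; the function name level_depth_counts and the example rows are made up for illustration and are not part of QIIME 2 or SILVA.

```python
from collections import Counter


def level_depth_counts(lines):
    """Count how many taxonomy strings have each number of levels.

    `lines` is an iterable of 'feature_id<TAB>taxonomy' strings (an assumed
    layout; adjust the parsing if your file differs).
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        _, taxonomy = line.split("\t", 1)
        # Drop empty pieces so a trailing ';' is not counted as a level.
        levels = [t for t in taxonomy.split(";") if t.strip()]
        counts[len(levels)] += 1
    return counts


# Made-up rows for illustration: one 7-level string and one 4-level string.
example = [
    "id1\tk__A;p__B;c__C;o__D;f__E;g__F;s__G",
    "id2\tk__A;p__B;c__C;o__D",
]
depths = level_depth_counts(example)
print(dict(depths))
```

With a real file you would pass an open file handle instead of the example list. A single key in the result means every string has the same depth. Note that this only checks the number of levels, not that each level holds the rank you expect.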

Correct, these are not removed; they are collapsed to the lowest similar level available. E.g., imagine you have ASVs with these taxonomy classifications:

k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Anaerostipes
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Coprococcus;s__comes

Collapsing at level 7 will yield the following feature IDs:

k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Anaerostipes;__
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;__;__;__
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Coprococcus;s__comes

Collapsing at level 4 will merge all three into a single feature ID:

k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales
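
To make the padding and truncation in these examples concrete, here is a small Python sketch that reproduces the feature IDs listed above. It only illustrates the behavior described in this post and is not the plugin's actual code; collapse_label is a hypothetical helper, and padding to 7 levels is an assumption that matches this example, where the deepest annotation has 7 levels.

```python
def collapse_label(taxonomy, level, max_depth=7):
    """Return the collapsed feature ID for one taxonomy string.

    Shallow annotations are padded with '__' up to `max_depth`, then the
    string is truncated to the first `level` ranks.
    """
    ranks = [r.strip() for r in taxonomy.split(";")]
    ranks += ["__"] * (max_depth - len(ranks))  # no-op if already deep enough
    return ";".join(ranks[:level])


asvs = [
    "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Anaerostipes",
    "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales",
    "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Coprococcus;s__comes",
]

level7 = [collapse_label(t, 7) for t in asvs]
# Using a set shows how many distinct features remain after collapsing.
level4 = {collapse_label(t, 4) for t in asvs}

for label in level7:
    print(label)
print(level4)
```

At level 4 the set has a single entry, so the counts of all three ASVs end up summed under one feature.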

I hope that helps!

Hi Nicholas,

Thank you, this is very helpful. So as long as I use the taxonomy files from the SILVA_132_QIIME_release folder in the SILVA database download, and specifically taxonomy_7_levels.txt (consensus or majority, 16S or 18S depending on the study) within the taxonomy folder, I can be confident that the taxonomies at each level are the appropriate rank?

Yes, I am fairly confident. I did not make those files so do not want to put my word on the line, but I very much trust the people who did make those files and that is their intended use so it has been my assumption all these years :smile:

But please let me know if your experience varies... I have not worked with 18S so cannot vouch for the condition of the taxonomies!
