Taxonomy names are different

Continuing the discussion from Classifier Training Questions:

So based on this post, the taxonomy file to use is 99_otu_taxonomy.txt. The taxonomy files I downloaded are named differently, and I am not sure which one I should use.

Good afternoon,

Great question!

The original database contains many reads, each with its own taxonomy assignment. But the 99% database is not the original: its reads have been clustered at 99% sequence similarity, so each cluster needs a single taxonomy string.

When database reads are clustered, members of the same cluster might not all have the same taxonomy name. Here’s an example from the silva_v128 notes:

For example, if a cluster had two reads, and one taxonomy string was:
D_0__Archaea;D_1__Euryarchaeota;D_2__Methanobacteria;D_3__Methanobacteriales;D_4__Methanobacteriaceae;D_5__Methanobrevibacter;D_6__Methanobrevibacter sp. HW3
and the second taxonomy string was:
D_0__Archaea;D_1__Euryarchaeota;D_2__Methanobacteria;D_3__Methanobacteriales;D_4__Methanobacteriaceae;D_5__Methanobrevibacter;D_6__Methanobrevibacter smithii

Then for either consensus or majority strings, the level 7 (0 is the first level, the domain)
data would become ambiguous, as the species levels do not match. The above string for the 
representative sequence taxonomy mapping file becomes:
D_0__Archaea;D_1__Euryarchaeota;D_2__Methanobacteria;D_3__Methanobacteriales;D_4__Methanobacteriaceae;D_5__Methanobrevibacter;Ambiguous_taxa

So when members of a cluster in the 99% database disagree at some level, that level is marked as ambiguous, and you can choose to use either the consensus taxonomy or the majority taxonomy for the new 99% cluster.
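
If it helps to see the idea in code, here is a minimal Python sketch of how one taxonomy string per cluster could be derived. It is only an illustration, not the script used to build the SILVA release: the function name `cluster_taxonomy`, the `min_fraction` parameter, and the 0.9 cutoff used for the "majority" case are all assumptions of mine. A cutoff of 1.0 means every read must agree (consensus).

```python
from collections import Counter


def cluster_taxonomy(strings, min_fraction=1.0, placeholder="Ambiguous_taxa"):
    """Collapse the taxonomy strings of one cluster into a single string.

    Hypothetical helper for illustration only.
    min_fraction=1.0 -> every read must agree (consensus).
    min_fraction=0.9 -> an assumed cutoff for a "majority" rule.
    """
    rows = [[level.strip() for level in s.split(";")] for s in strings]
    depth = min(len(row) for row in rows)
    result = []
    for i in range(depth):
        # Compare the whole lineage down to level i, so a disagreement
        # higher up also makes every deeper level ambiguous.
        prefixes = Counter(tuple(row[: i + 1]) for row in rows)
        prefix, count = prefixes.most_common(1)[0]
        if count / len(rows) >= min_fraction:
            result.append(prefix[-1])
        else:
            result.append(placeholder)
    return ";".join(result)


shared = (
    "D_0__Archaea;D_1__Euryarchaeota;D_2__Methanobacteria;"
    "D_3__Methanobacteriales;D_4__Methanobacteriaceae;D_5__Methanobrevibacter;"
)
reads = [
    shared + "D_6__Methanobrevibacter sp. HW3",
    shared + "D_6__Methanobrevibacter smithii",
]

# Both rules give Ambiguous_taxa at the last level for this two-read cluster.
print(cluster_taxonomy(reads, min_fraction=1.0))
print(cluster_taxonomy(reads, min_fraction=0.9))
```

With only two reads that disagree at the species level, neither rule can pick a winner, which matches the example from the notes. The two rules only start to differ in larger clusters where most, but not all, reads share the same name.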

I hope that helps! Let me know if you have any other questions!

Colin

P.S. Some databases use more than seven levels, so you can choose to use all levels or a standardized 7 levels if you want. I like 7 levels as that’s the most familiar: Kingdom, Phylum, Class, Order, Family, Genus, Species.
