Continuing the discussion from Classifier Training Questions:
So based on this post, the taxonomy file to use is 99_otu_taxonomy.txt. The taxonomy files I downloaded are named differently, and I am not sure which one I should use.
Good afternoon,
Great question!
The original database contains many reads, each with a taxonomy assignment. But the 99% clustered database is not the original; it’s clustered at 99% sequence similarity!
When database reads are clustered, members of the same cluster might not all have the same taxonomy name. Here’s an example from the silva_v128 notes:
For example, if a cluster had two reads, and one taxonomy string was:
D_0__Archaea;D_1__Euryarchaeota;D_2__Methanobacteria;D_3__Methanobacteriales;D_4__Methanobacteriaceae;D_5__Methanobrevibacter;D_6__Methanobrevibacter sp. HW3
and the second taxonomy string was:
D_0__Archaea;D_1__Euryarchaeota;D_2__Methanobacteria;D_3__Methanobacteriales;D_4__Methanobacteriaceae;D_5__Methanobrevibacter;D_6__Methanobrevibacter smithii
Then for either consensus or majority strings, the seventh level, D_6 (numbering starts at 0 for the domain), would become ambiguous, as the species levels do not match. The above string for the representative sequence taxonomy mapping file becomes:
D_0__Archaea;D_1__Euryarchaeota;D_2__Methanobacteria;D_3__Methanobacteriales;D_4__Methanobacteriaceae;D_5__Methanobrevibacter;Ambiguous_taxa
So when members of a cluster in the 99% database disagree at some level, you can choose to use the consensus taxonomy or the majority taxonomy for the new 99% cluster.
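If it helps to see the idea in code, here is a rough Python sketch of how a cluster's taxonomy strings could be collapsed. This is only my illustration, not the script used to build the SILVA release: the function name `cluster_taxonomy` and its `threshold` parameter are made up for this example. I am assuming that "consensus" means every string in the cluster must agree, and that "majority" means agreement above some cutoff; the 0.9 cutoff shown in the comment is an assumption, so check the release notes for the real value.

```python
from collections import Counter


def cluster_taxonomy(strings, threshold=1.0, placeholder="Ambiguous_taxa"):
    """Collapse the taxonomy strings of one cluster, level by level.

    threshold=1.0 mimics a strict consensus (all members must agree).
    A lower value such as 0.9 mimics a majority rule (assumed cutoff).
    """
    split = [s.split(";") for s in strings]
    depth = min(len(levels) for levels in split)
    result = []
    for i in range(depth):
        # Compare the full lineage down to level i, so the same name under
        # two different parents is not counted as agreement.
        lineages = Counter(tuple(levels[: i + 1]) for levels in split)
        top, count = lineages.most_common(1)[0]
        if count / len(split) >= threshold:
            result.append(top[-1])
        else:
            result.append(placeholder)
    return ";".join(result)


prefix = (
    "D_0__Archaea;D_1__Euryarchaeota;D_2__Methanobacteria;"
    "D_3__Methanobacteriales;D_4__Methanobacteriaceae;D_5__Methanobrevibacter;"
)
reads = [
    prefix + "D_6__Methanobrevibacter sp. HW3",
    prefix + "D_6__Methanobrevibacter smithii",
]

# Strict consensus: the two species names differ, so the last level is replaced.
print(cluster_taxonomy(reads))
```

With the two example reads above, the species level is replaced by Ambiguous_taxa under either rule, because a 50/50 split does not reach a majority cutoff either. The two files only differ for larger clusters where most, but not all, members agree.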
I hope that helps! Let me know if you have any other questions!
Colin
P.S. Some databases use more than seven levels, so you can choose to use all levels or a standardized 7 levels if you want. I like 7 levels as that’s the most familiar: Kingdom, Phylum, Class, Order, Family, Genus, Species.