I noticed that you used taxonomy_all to train the classifier, which combines the 16S and 18S taxonomy. I'm not sure the 7 levels correspond to domain, phylum, class, order, family, genus, and species for 18S in SILVA. As far as I know, the 7-level taxonomy uses the 7 levels as-is if they are present. If more than 7 levels are present, the first 4 and last 3 levels of taxonomy are used. Does that mean the last 3 levels present in SILVA are treated as family, genus, and species? That seems unreasonable.
The 7-level taxonomy does indeed use the top 4 and last 3 levels of taxonomy. For bacteria and archaea, this does correspond to domain through species, as these consistently have 7 levels of taxonomy. Eukaryotes are far more variable, and their 7 levels do not map exactly onto domain/phylum/class/order/family/genus/species, except in the cases where the data as presented in SILVA came exactly as domain/phylum/class/order/family/genus/species.
As an alternative, use the “all” taxonomy levels reference file. This will not tell you, for example, whether a given level is a family level or not, but it will include every level of taxonomy that came from SILVA.
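To make the top-4-plus-last-3 rule concrete, here is a minimal Python sketch of how a lineage with more than 7 levels would be collapsed. This is only an illustration of the rule as described above, not the actual script used to build the reference files. The function name `collapse_to_seven`, the semicolon separator, and the placeholder level names are all assumptions, as is passing shorter lineages through unchanged.

```python
def collapse_to_seven(lineage, sep=";"):
    """Collapse a lineage string to at most 7 levels.

    Hypothetical helper: lineages with 7 or fewer levels are returned
    unchanged (an assumption); longer ones keep the first 4 and last 3.
    """
    # Split on the separator and drop empty pieces (e.g. a trailing ';').
    levels = [lvl.strip() for lvl in lineage.split(sep) if lvl.strip()]
    if len(levels) <= 7:
        return levels
    # More than 7 levels: the middle levels are simply discarded.
    return levels[:4] + levels[-3:]


# Placeholder names only, not real SILVA lineages.
seven = "L1;L2;L3;L4;L5;L6;L7"
ten = "L1;L2;L3;L4;L5;L6;L7;L8;L9;L10"

print(collapse_to_seven(seven))
print(collapse_to_seven(ten))
```

With the 10-level placeholder lineage, levels 5 through 7 are dropped, so the fifth slot of the result holds whatever level happened to be third from the end. That is why the last three slots cannot be assumed to be family, genus, and species for eukaryotes.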
Thanks, @William, that’s really helpful!
One question still troubles me a lot. Is it possible to create a Greengenes-like 7-level taxonomy for the eukaryotes in the SILVA database? That would, of course, require knowing exactly where domain, phylum, class, order, family, genus, and species sit in SILVA’s taxonomy. Alternatively, is there a database other than SILVA that would allow me to create a Greengenes-like taxonomy for 18S rRNA sequences? Or maybe there is no need to know whether each level is a genus level or not for eukaryotes? Any advice would be appreciated :smiley:
where you can mouse over the lineage and get unranked levels and superkingdoms and a subphylum but no phylum. It would take shoehorning (plus a lot of manual curation) to try and make it better fit the 7 levels in any case.
It may not be that important, though: the data could be analyzed with the imperfect 7 ranks of taxonomy, and then when/if you find significant SVs/OTUs, you could look up the exact taxonomy from NCBI and report the full taxonomy there, if necessary.