Classifier and taxonomy

SoilRotifer · February 3, 2021, 2:44am

You do not want to do this. That old numbered rank system is inconsistent. See here for more details:

You can use our RESCRIPt tool to do something quite similar:

I assume the first taxonomy string is the result of a taxonomic classification? If so, there is no easy way to know whether that classification is equivalent to the second taxonomy string you listed, at least not without looking through the reference sequences.

The issue is that the classifier likely could not distinguish the reference sequence carrying the full taxonomy string d__Bacteria; p__Firmicutes; c__Clostridia; o__Peptococcales; f__Peptococcaceae; g__uncultured; s__uncultured_bacterium from several other reference sequences with nearly identical taxonomy (likely differing only in their genus and species strings). So the classifier will (usually) return the lowest common ancestor of those matches. In this case, it could not resolve anything past the family level.
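To illustrate the lowest-common-ancestor idea, here is a minimal Python sketch that keeps only the ranks shared by every matching reference. It assumes taxonomy strings are semicolon-delimited, as in the example above. The second "hit" (g__Genus_X; s__Species_Y) is purely hypothetical, and this is not the code any QIIME 2 classifier actually runs; real classifiers may instead apply a consensus threshold or confidence scores rather than requiring strict agreement.

```python
from typing import List


def lowest_common_ancestor(taxonomies: List[str]) -> str:
    """Return the deepest rank-by-rank prefix shared by all taxonomy strings."""
    # Split each string into its ranks, stripping surrounding whitespace.
    split = [[rank.strip() for rank in t.split(";")] for t in taxonomies]
    shared = []
    # zip stops at the shortest lineage, so ranks missing from any string are dropped.
    for ranks in zip(*split):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break  # first disagreement: stop descending the hierarchy
    return "; ".join(shared)


hits = [
    "d__Bacteria; p__Firmicutes; c__Clostridia; o__Peptococcales; "
    "f__Peptococcaceae; g__uncultured; s__uncultured_bacterium",
    # Hypothetical second reference hit: same family, different genus and species.
    "d__Bacteria; p__Firmicutes; c__Clostridia; o__Peptococcales; "
    "f__Peptococcaceae; g__Genus_X; s__Species_Y",
]
print(lowest_common_ancestor(hits))
# d__Bacteria; p__Firmicutes; c__Clostridia; o__Peptococcales; f__Peptococcaceae
```

Because the two hits disagree at the genus rank, everything from g__ onward is dropped, which is exactly the family-level result described above.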

Nope, not unless you filter your sequence data or your reference database in that way. If you see any text after a rank prefix, e.g., g__uncultured, that label was pulled directly from the reference database itself. RESCRIPt gives you the option of appending the organism name as the species label. However, be careful about trusting these labels. See the Species-labels: caveat emptor! section of the RESCRIPt tutorial that I linked above for more details.
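Since the text after each prefix is simply copied from the reference database, it can help to check which ranks carry placeholder-style labels before reading too much into them. Below is a minimal Python sketch of that check. PLACEHOLDER_TERMS is a hypothetical list chosen for illustration, not something defined by QIIME 2 or RESCRIPt, and the sketch only inspects a string; it does not filter your data or reference database.

```python
from typing import Dict, List

# Hypothetical label fragments treated as uninformative placeholders;
# adjust to whatever your reference database actually uses.
PLACEHOLDER_TERMS = ("uncultured", "unidentified", "metagenome")


def parse_taxonomy(taxonomy: str) -> Dict[str, str]:
    """Map each rank prefix (e.g. 'g') to the label that follows '__'."""
    ranks: Dict[str, str] = {}
    for field in taxonomy.split(";"):
        field = field.strip()
        if "__" not in field:
            continue  # skip empty or malformed fields
        prefix, label = field.split("__", 1)
        ranks[prefix] = label
    return ranks


def placeholder_ranks(taxonomy: str) -> List[str]:
    """Return the rank prefixes whose labels look like placeholders."""
    return [
        prefix
        for prefix, label in parse_taxonomy(taxonomy).items()
        if any(term in label.lower() for term in PLACEHOLDER_TERMS)
    ]


tax = ("d__Bacteria; p__Firmicutes; c__Clostridia; o__Peptococcales; "
       "f__Peptococcaceae; g__uncultured; s__uncultured_bacterium")
print(placeholder_ranks(tax))  # ['g', 's']
```

Here both the genus and species labels are flagged, a reminder that "uncultured" is a database label rather than a taxonomic identification.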

Hope this helps!