SILVA 138 Classifiers

Hi @Anahita_Bharadwaj,

Great question! I apologize for the lack of documentation. I should remove the older version, and will likely do so soon. The main change is that I truncated the species labels. The reason is that you may have (nearly) identical sequences that point to very slightly different species label annotations, such as:

s__Clostridioides_difficile
s__Clostridioides_difficile_R20291

So, if your sequence is similar to these, you'd think it should be classified as s__Clostridioides_difficile. That may not happen, because the two species strings differ and are therefore treated as distinct labels. What the classifier may actually return is the upper-level taxonomy g__Clostridioides.
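To make that concrete, here is a toy illustration in Python. It is not how the classifier computes its assignments internally; it only finds the deepest rank on which two reference lineages agree. The `shared_lineage` helper and the two-rank, semicolon-separated lineage strings are made up for this example.

```python
def shared_lineage(a: str, b: str) -> str:
    """Return the leading ranks that two lineage strings have in common."""
    ranks_a = [r.strip() for r in a.split(";")]
    ranks_b = [r.strip() for r in b.split(";")]
    shared = []
    for rank_a, rank_b in zip(ranks_a, ranks_b):
        if rank_a != rank_b:
            break  # the first disagreement ends the shared lineage
        shared.append(rank_a)
    return "; ".join(shared)


# Hypothetical, shortened lineages for two near-identical reference sequences.
ref_1 = "g__Clostridioides; s__Clostridioides_difficile"
ref_2 = "g__Clostridioides; s__Clostridioides_difficile_R20291"

print(shared_lineage(ref_1, ref_2))  # g__Clostridioides
```

The two references only agree down to genus, which is roughly why a query matching both can end up with a genus-level label.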

This is not the fault of the classifier per se, but an annotation problem that negatively affects the classifier. Because of this, I decided to keep only the first two words (i.e. Clostridioides and difficile) of the "species" string. This should not affect the "no-species" labeled versions. In fact, I think the file sizes stay the same for those (there are no species labels to begin with). The files with updated species labels should be slightly smaller in version 0.02, as the species labels are shorter. Some of the original "species" labels are very long.
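A minimal sketch of that truncation in Python, in case it helps. This is not the actual code from my pipeline; the function names are made up for illustration, and it assumes species labels start with `s__`, use underscores between words, and that ranks are separated by semicolons.

```python
def truncate_species(label: str, n_words: int = 2) -> str:
    """Keep only the first n_words underscore-separated words of an s__ label."""
    prefix = "s__"
    if not label.startswith(prefix):
        return label  # leave non-species ranks untouched
    words = label[len(prefix):].split("_")
    return prefix + "_".join(words[:n_words])


def truncate_taxonomy(taxonomy: str) -> str:
    """Apply truncate_species to every rank of a semicolon-separated lineage."""
    ranks = [r.strip() for r in taxonomy.split(";")]
    return "; ".join(truncate_species(r) for r in ranks)


print(truncate_species("s__Clostridioides_difficile_R20291"))
# s__Clostridioides_difficile
```

With this, both example labels above collapse to the same string, so near-identical sequences no longer compete under different species names.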

I also reworked some of the steps and code :computer:, so that you can run more of the steps within the QIIME 2 environment (prior code had you jumping back and forth between QIIME 1 and QIIME 2). In general, use the latest version, or follow the steps outlined in the pipeline of my original post. :slight_smile:

I hope this clarifies things! :octopus:
