Using silva-138 pre-trained classifier

Hi @Mudit_Bhatia,

Let's see if we can figure this out. First let me clarify a few things:

  1. Even though both tools use SILVA, this does not necessarily mean the databases are the same. For example, the pre-made classifiers from QIIME 2 may be built from SILVA 138 rather than the newer 138.1 release, which included some changes.

  2. QIIME 2 and mothur also curate the SILVA database differently, so some reference sequences may have been discarded or renamed in one but not the other. The pre-made classifiers from QIIME 2 generally follow this approach. That curation may have culled the reference data somewhat aggressively, perhaps removing too many eukaryotes, but much of it was meant simply as an example of what a user can do to curate their own database. Still, how a database is curated can have a large effect on how well a classifier works. I'd suggest using the linked tutorial to make your own SILVA database by running only these commands:
    a) qiime rescript get-silva-data ...
    b) qiime rescript reverse-transcribe ...
    c) qiime rescript dereplicate ...
    d) qiime feature-classifier fit-classifier-naive-bayes ...
    This will give you an "unedited" version of SILVA 138.1 to compare against, as none of the data quality or sequence removal steps will have been run. Give this a try and see what happens.

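  3. AFAIK, the two tools also classify differently: mothur's classify.seqs offers the Wang method (an RDP-style naive Bayesian classifier) as well as a k-nearest-neighbor (KNN) method, while QIIME 2's pre-made classifiers are naive Bayes. I am not sure how much of a difference this will make, though.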
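
For reference, the four commands in point 2 could be strung together roughly like this. Treat it as a sketch only: the flags are as I remember them from the RESCRIPt tutorial, and the file names, the `PREFIX` variable, and the `DRY_RUN` switch are made up for illustration, so please check each command's `--help` for your QIIME 2 release before running it.

```shell
# Sketch: build an "unedited" SILVA 138.1 naive Bayes classifier.
# Assumptions: file names and PREFIX are hypothetical; flags follow the
# RESCRIPt tutorial as recalled and may differ between QIIME 2 releases.
set -e

DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands, 0 = run them
PREFIX="silva-138.1-ssu-nr99"  # hypothetical prefix for the output artifacts

# Print a command on one line, and execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

build_reference() {
  # a) download the SILVA sequences (delivered as RNA) and taxonomy
  run qiime rescript get-silva-data \
    --p-version 138.1 \
    --p-target SSURef_NR99 \
    --o-silva-sequences "${PREFIX}-rna-seqs.qza" \
    --o-silva-taxonomy "${PREFIX}-tax.qza"

  # b) convert the RNA sequences to DNA
  run qiime rescript reverse-transcribe \
    --i-rna-sequences "${PREFIX}-rna-seqs.qza" \
    --o-dna-sequences "${PREFIX}-seqs.qza"

  # c) collapse identical sequences; no quality or length filtering is done
  run qiime rescript dereplicate \
    --i-sequences "${PREFIX}-seqs.qza" \
    --i-taxa "${PREFIX}-tax.qza" \
    --p-mode uniq \
    --o-dereplicated-sequences "${PREFIX}-seqs-derep.qza" \
    --o-dereplicated-taxa "${PREFIX}-tax-derep.qza"

  # d) train the naive Bayes classifier on the dereplicated reference
  run qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "${PREFIX}-seqs-derep.qza" \
    --i-reference-taxonomy "${PREFIX}-tax-derep.qza" \
    --o-classifier "${PREFIX}-classifier.qza"
}

build_reference
```

By default this only prints the four commands so you can look them over. Setting `DRY_RUN=0` in an activated QIIME 2 environment with RESCRIPt installed would actually run them; the download and the training step can both take a while and a fair amount of memory.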

Keep us posted as to the outcome. :slight_smile:
