Using a second classifier to identify unknown species

statjn10 · November 15, 2021, 4:12pm

Hello! I have a question about classifying unknown reads. I am using amplicon sequencing data that covers the full V1-V9 region. I used the trained Naive Bayes classifier in QIIME 2 with the Silva 138 database, and about 20% of my reads were lumped into the "unknown" category. My question is this: is it OK to take those unknown reads, reclassify them with a different classifier, and then merge the results with the reads classified by the NB classifier? I tried this with BLAST+ and the Silva 138 database and was able to reclassify most of those unknown reads to species, now with a confidence > 0.8.

Thank you!

SoilRotifer · November 15, 2021, 5:45pm

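A large fraction of "unknown" reads usually indicates that some of your reads are in the incorrect orientation (i.e. reverse-complemented relative to the reference sequences). For more details, see the following threads: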

I would also suggest manually running BLAST on several of these "unknown" sequences. They could simply be off-target sequences (i.e. not from the SSU rRNA gene) rather than mis-oriented reads.
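One quick way to test the orientation idea is to reverse-complement a handful of the "unknown" reads and run them through the classifier or BLAST again. Below is a minimal Python sketch of that step, assuming the reads are available as plain FASTA text. The names `reverse_complement` and `parse_fasta` are hypothetical helpers written for this illustration and are not part of QIIME 2.

```python
# Complement table for DNA, including common IUPAC ambiguity codes.
# Assumption: sequences are DNA; any other character is left unchanged.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMBDHVNacgtrykmbdhvn",
    "TGCAYRMKVHDBNtgcayrmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(text: str):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical example reads; replace with a few of your "unknown" sequences.
example = ">read1\nAAGC\n>read2\nAT\nGN\n"
flipped = {name: reverse_complement(seq) for name, seq in parse_fasta(example)}
```

If the flipped copies of the reads classify cleanly while the originals do not, orientation is the likely culprit. If neither orientation gets a good hit, off-target amplification is the more plausible explanation.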

-Mike