Sklearn-classifier bug for archaea 16S amplicons

Nicholas_Bokulich · January 30, 2020, 11:24pm

Hi @mol,
Just a couple things to add to @SoilRotifer's advice.

The sklearn classifier should not give different classifications each time you run it... unless your sequences are in mixed orientations, which confuses the classifier as described here:

So that explains why removing one sequence causes the results to change, and also probably why a few of these classify as Archaea and the others as Bacteria.

Even though you are using extract-reads, you are probably hitting a few bacterial sequences. classify-sklearn would not give bacterial classifications unless bacterial annotations are present in the (trimmed) reference database that you are using.

So in addition to @SoilRotifer's advice about starting with the full-length database, I'd advise trying to orient your sequences in the same direction. Unfortunately QIIME 2 can't do that for you right now. Another option is to use classify-consensus-vsearch, which is able to handle mixed-orientation sequences.
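If you want to orient the reads yourself before importing them, here is a rough sketch of one way to do it in plain Python. This is not a QIIME 2 feature, and the function names (`reverse_complement`, `kmers`, `orient`) are hypothetical helpers I made up for illustration. It assumes you have a single reference sequence in the orientation you want, handles only unambiguous A/C/G/T bases, and flips a read only when its reverse complement shares more k-mers with the reference than the read itself does.

```python
# Hypothetical helpers for illustration only; not part of QIIME 2.
# Assumes plain A/C/G/T sequences (no IUPAC ambiguity codes).

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int = 8) -> set:
    """Return the set of all k-mers in seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(seq: str, reference: str, k: int = 8) -> str:
    """Return seq in whichever orientation shares more k-mers with reference.

    Ties (including no shared k-mers at all) leave the read unchanged.
    """
    ref_kmers = kmers(reference, k)
    forward_hits = len(kmers(seq, k) & ref_kmers)
    flipped = reverse_complement(seq)
    reverse_hits = len(kmers(flipped, k) & ref_kmers)
    return flipped if reverse_hits > forward_hits else seq
```

You would run `orient` over every read and write the results back to FASTA before importing. For real data a dedicated tool with a proper reference database will be more robust than a single-reference k-mer count, so treat this only as a way to see the idea.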