50% unassigned sequences after taxonomy

Nicholas_Bokulich · August 6, 2020, 3:52pm

From your description, it sounds like you are aware that the mixed orientation of your reads (a mix of forward and reverse sequences) is causing the 50% unassigned rate. This happens because the classify-sklearn method can only handle reads in a single orientation. The reads can be forward or reverse relative to the reference, but all query sequences must be in the same orientation.
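To see why orientation matters, here is a toy Python illustration. It assumes, roughly in the spirit of classify-sklearn, that sequences are represented by their k-mer content; the choice of k=7, the Jaccard score, and the example sequence are illustrative assumptions, not the plugin's actual code. It compares the k-mers of a read with those of its reverse complement.

```python
# Toy illustration (not QIIME 2 code): compare the k-mer content of a read
# with that of its reverse complement.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int = 7) -> set[str]:
    """Return the set of overlapping k-mers in seq (k=7 is an assumption)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of k-mers shared between two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical 16S-like fragment, for illustration only.
    read = "GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"
    rc = reverse_complement(read)
    print(f"forward vs forward: {jaccard(kmers(read), kmers(read)):.2f}")
    print(f"forward vs reverse complement: {jaccard(kmers(read), kmers(rc)):.2f}")
```

The forward-versus-forward score is 1.00 by construction, while the reverse-complement score is far lower, so a reverse read looks like an unfamiliar sequence to a model trained only on forward-oriented references.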

If your sequences are in mixed orientations, you can do one of two things:

1. Use the classify-consensus-vsearch method to classify your sequences, since it can classify reads in either orientation.
2. Use the orient-seqs method in RESCRIPt (see tutorial below) to orient your query sequences relative to a reference database (which presumably has all of its sequences in the forward orientation), and then classify the oriented sequences with classify-sklearn.
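The idea behind orienting reads against a reference can be sketched as follows: for each query, count how many of its k-mers, and how many of its reverse complement's k-mers, appear in the reference, then keep whichever strand matches better. This is a toy Python illustration under stated assumptions (uppercase ACGT input, k=7, a pooled shared-k-mer score, hypothetical sequences and helper names); it is not RESCRIPt's orient-seqs implementation.

```python
# Toy sketch of orienting query reads against a reference (not RESCRIPt code).
# Assumptions: uppercase ACGT sequences, k=7, and a simple shared-k-mer score.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int = 7) -> set[str]:
    """Return the set of overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_reference_index(references: list[str], k: int = 7) -> set[str]:
    """Pool the k-mers of all (forward-oriented) reference sequences."""
    index: set[str] = set()
    for ref in references:
        index |= kmers(ref, k)
    return index


def orient(query: str, ref_index: set[str], k: int = 7) -> tuple[str, str]:
    """Return (oriented_seq, label), where label is 'forward', 'reverse',
    or 'unmatched' if neither strand shares any k-mer with the reference."""
    rc = reverse_complement(query)
    fwd_hits = len(kmers(query, k) & ref_index)
    rev_hits = len(kmers(rc, k) & ref_index)
    if fwd_hits == 0 and rev_hits == 0:
        return query, "unmatched"
    if rev_hits > fwd_hits:
        return rc, "reverse"  # flip the read so it matches the reference strand
    return query, "forward"


if __name__ == "__main__":
    # Hypothetical reference and queries, for illustration only.
    reference = ["GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"]
    index = build_reference_index(reference)
    fragment = "CAGCCGCGGTAATACGGAGGGTGCAAGCG"
    for q in (fragment, reverse_complement(fragment)):
        seq, label = orient(q, index)
        print(label, seq)
```

Pooling all reference k-mers into one set keeps the sketch short; a real tool would compare each query against individual reference sequences with a proper similarity threshold. After orientation, every query is on the reference strand, which is the single-orientation input classify-sklearn expects.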

Let us know which method you use and what you find. Good luck!