Welcome @Lu_Wang!
From your description, it sounds like you are aware that the mixed orientation of your reads (a mix of forward and reverse sequences) is causing the 50% unassigned rate. This happens because the classify-sklearn method can only handle reads in a single orientation. They can be forward or reverse relative to the reference, but all query sequences must be in the same orientation.
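To illustrate why orientation matters for a k-mer based classifier (classify-sklearn's default pipeline works on k-mer features of each read), here is a minimal Python sketch. It is a toy, not classify-sklearn's code. The reference sequence is a made-up example and the k-mer size of 7 is an assumption. The sketch compares how many k-mers a read shares with the reference in its forward orientation and as its reverse complement.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def kmers(seq: str, k: int = 7) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_overlap(a: str, b: str, k: int = 7) -> float:
    """Jaccard similarity between the k-mer sets of two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


# Made-up toy reference in the "forward" orientation (not a real database entry).
reference = "GTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGG"
read = reference[5:40]  # a read taken from the same strand as the reference

print("forward read:", kmer_overlap(reference, read))
print("reverse-complemented read:", kmer_overlap(reference, reverse_complement(read)))
```

A read from the opposite strand shares almost no k-mers with a forward-oriented reference, so a classifier trained on forward references has little to match it against.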
If your sequences are in mixed orientations, you can do one of two things:
- use the classify-consensus-vsearch method to classify your sequences, as it can classify reads in either orientation.
- RESCRIPt has an orient-seqs method (see tutorial below) that orients sequences relative to a reference database. This lets you orient your query sequences by comparing them to a reference database (which is presumably all in the forward direction), and then use classify-sklearn to classify those sequences.
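The orient-then-classify idea can be sketched in Python. This is a toy k-mer heuristic that only illustrates the concept. It is not how RESCRIPt's orient-seqs is implemented. The function name `orient_reads`, the k-mer size, the `min_fraction` threshold, and the toy sequences are all hypothetical. Each read is scored against the reference k-mers both as-is and as its reverse complement. The better-matching orientation is kept, and reads that match neither well enough are set aside as unmatched.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_reads(reads: dict[str, str], references: list[str],
                 k: int = 7, min_fraction: float = 0.5):
    """Hypothetical helper: orient reads to match forward-oriented references.

    Returns (oriented, unmatched): a dict of read IDs to oriented sequences,
    and a list of read IDs whose best orientation shares fewer than
    min_fraction of its k-mers with the references.
    """
    # Pool the k-mers of all (assumed forward-oriented) reference sequences.
    ref_kmers: set[str] = set()
    for ref in references:
        ref_kmers |= kmers(ref.upper(), k)

    oriented: dict[str, str] = {}
    unmatched: list[str] = []
    for read_id, seq in reads.items():
        seq = seq.upper()
        candidates = [seq, reverse_complement(seq)]
        scores = []
        for cand in candidates:
            cand_kmers = kmers(cand, k)
            # Fraction of this candidate's k-mers that also occur in the references.
            frac = len(cand_kmers & ref_kmers) / len(cand_kmers) if cand_kmers else 0.0
            scores.append(frac)
        best = 0 if scores[0] >= scores[1] else 1
        if scores[best] >= min_fraction:
            oriented[read_id] = candidates[best]
        else:
            unmatched.append(read_id)
    return oriented, unmatched


# Toy data: one forward read, one reverse-complemented read, one off-target read.
reference = "GTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGG"
reads = {
    "r1": reference[0:30],
    "r2": reverse_complement(reference[10:44]),
    "r3": "ACACACACACACACACACACACAC",
}
oriented, unmatched = orient_reads(reads, [reference])
print(oriented)
print(unmatched)
```

In this sketch, only the reads in `oriented` (now all in the reference's orientation) would go on to a single-orientation classifier such as classify-sklearn.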
Let us know which method you use and what you find. Good luck!