This usually indicates that your reads may be in the incorrect orientation. For more details, see the following threads:
Welcome @Lu_Wang!
From your description, it sounds like you are aware that the mixed orientation of your reads (a mix of forward and reverse sequences) is causing the 50% unassigned rate. This is because the classify-sklearn method can only handle reads in a single orientation. They can be forward or reverse relative to the reference, but all query sequences must share the same orientation.
If your sequences are in mixed orientations, you can do one of two things:
use the classify-consensus-vsearch method …
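To make the orientation issue concrete, here is a minimal toy sketch in plain Python. It is not how classify-sklearn, vsearch, or RESCRIPt work internally; the function names, the k-mer size, and the example sequences are all made up for illustration. It only shows the idea that a reverse-complemented read shares few k-mers with the reference until it is flipped back:

```python
# Toy illustration only: all names and sequences here are hypothetical,
# and real tools use proper alignment rather than this k-mer shortcut.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def guess_orientation(read, reference, k=8):
    """Compare k-mer sharing of the read and its reverse complement
    against the reference; report which orientation matches better."""
    ref_kmers = kmers(reference.upper(), k)
    fwd = len(kmers(read.upper(), k) & ref_kmers)
    rev = len(kmers(reverse_complement(read).upper(), k) & ref_kmers)
    if fwd == rev:
        return "ambiguous"
    return "forward" if fwd > rev else "reverse"


def orient_reads(reads, reference, k=8):
    """Flip reads that look reverse-oriented; leave the rest untouched."""
    return [
        reverse_complement(r) if guess_orientation(r, reference, k) == "reverse" else r
        for r in reads
    ]


# Made-up reference and a read cut from it
reference = "ACGTTGCAAGGCTTACGGATCCGATTACAGGCT"
read = reference[5:25]
mixed = [read, reverse_complement(read)]

print([guess_orientation(r, reference) for r in mixed])
print(orient_reads(mixed, reference))
```

In a mixed dataset, roughly half the reads behave like the second entry in `mixed`, which is consistent with an unassigned rate near 50% when the classifier expects a single orientation.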
Hi @jose_gacia,
What command did you run when trying to classify your reads with SILVA? When we observe many poor taxonomy assignments like this, it is typically due to mixed read orientation. I'd recommend using vsearch as outlined here:
https://forum.qiime2.org/t/lots-of-unassigned-reads/12061/3
Alternatively, install the RESCRIPt plugin and run rescript orient-seqs on your reads, then retry the BLAST and/or naive Bayes classifiers.
-Mike
I would also suggest manually running BLAST on several of these "unknown" sequences. They could simply be off-target sequences (i.e. from a non-SSU gene) rather than mis-oriented reads.
-Mike