Thanks for sharing the files!
I looked at the reads inside rep-seqs_R1_R2_R3.qza and ran them through vsearch dereplicate.
As expected, all reads were unique.
Then I ran vsearch dereplicate again with --strand both, which checks for identical reads in both directions (forward and reverse).
This second run found lots of hits on the reverse strand!
This means your input reads were in a mixed orientation.
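If it helps to see the idea, here is a rough Python sketch of what a both-strand duplicate check does. It is not how vsearch is implemented; the function names and the toy reads are made up for illustration, and it only handles plain A/C/G/T sequences.

```python
# Illustrative sketch only: names and data are hypothetical, not vsearch code.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_reverse_strand_hits(seqs) -> int:
    """Count reads whose reverse complement matches an earlier read."""
    seen = set()
    hits = 0
    for seq in seqs:
        s = seq.upper()
        if s in seen:
            continue  # ordinary forward-strand duplicate, not counted here
        if reverse_complement(s) in seen:
            hits += 1  # same read as an earlier one, in the opposite orientation
            continue
        seen.add(s)
    return hits


# Toy example: the second read is the reverse complement of the first.
reads = ["AAACCC", "GGGTTT", "ACGGTA"]
n_hits = count_reverse_strand_hits(reads)
```

A forward-only check treats all three toy reads as unique, while the both-strand check flags the second one. That is the same pattern I saw in your file.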
classify-sklearn assumes that all reads are in the same orientation.
To fix this issue, use the RESCRIPt action orient-seqs.