I have a problem with the results of feature-classifier and can't figure out why it is happening.
When I analyse my three samples (R1, R2, R3) together (i.e. in one analysis workflow), my R3 sample is not properly classified: 90% of R3 reads are classified as 'unassigned', while R1 and R2 are totally fine. On the other hand, if I analyse the R3 sample alone using the same workflow, everything is classified properly and it looks similar to R1 and R2. What could cause this issue, and how can I fix it?
My workflow (I'm using cutadapt quality and length filtered reads):
I should've mentioned that, sorry. They are very similar (a difference of ~1-5%). I don't have the .qzv files with me right now, but from what I recall, for example for reads left after chimera removal, the R3 values are around 74% in both analyses.
Can you post the dada2 .qza files when you have a chance? I want to isolate the problem to the dada2 step or the classification step, and to do that I want to see if the .qza files match.
Thank you so much for your time. I've attached the .qza files.
I named the samples Rx because I wanted to simplify the sample names, but that may now actually be more confusing, sorry. Anyway, the name of the problematic sample in the analysis is V3-12-21-R23.
edit: I can't attach more than two files using the forum's 'upload', so here is the link to the files: Nextcloud
I looked at the reads inside of rep-seqs_R1_R2_R3.qza, and ran it through vsearch dereplicate.
As expected, all reads were unique.
Then I ran vsearch dereplicate again with --strand both, which checks for identical reads in both directions (forward and reverse).
This function found lots of hits in the reverse strand!
This means your input reads were in a mixed orientation. classify-sklearn assumes that all reads are in the same orientation.
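If you want to run a similar check yourself without vsearch, here is a rough Python sketch of the idea. It is not the vsearch implementation, and the function names are my own. It counts sequences whose exact reverse complement also appears in the same set, which is roughly what the second dereplication pass picks up. It assumes you have already pulled the sequences out of the .qza as plain strings.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def count_reverse_strand_hits(seqs):
    """Count sequences whose exact reverse complement was seen earlier.

    Hypothetical helper for illustration: only exact, full-length
    matches are counted, so near-identical reads are not detected.
    """
    seen = set()
    hits = 0
    for seq in seqs:
        seq = seq.upper()
        if reverse_complement(seq) in seen:
            hits += 1
        seen.add(seq)
    return hits


# Toy example: the second read is the reverse complement of the first.
example = ["ACGTT", "AACGT", "GGGCC"]
print(count_reverse_strand_hits(example))
```

A non-zero count on sequences that are all unique in the forward direction suggests that the same amplicon is present in both orientations.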
To fix this issue, use the RESCRIPt action orient-seqs to put all reads in the same orientation before classifying.