I'm currently running the QIIME2-DADA2 (v. 2020.8) pipeline to process 16S paired-end sequences from different hypervariable regions separately. Each region was trimmed with Cutadapt, then quality filtered and denoised with DADA2.
When I tried to train my own classifiers for each region, I was able to get very good feature classifications down to the species level for the vast majority of features for all regions except V1-V3. For the V1-V3 region, the vast majority are unassigned.
To give an idea of how poorly they are classified: 1492/2723 features are unassigned, 1225/2723 are classified to domain level only, and just 6 are classified down to the species level. So essentially all features are either unassigned or classified to domain level only.
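As a quick sanity check on those numbers, here is a minimal Python sketch that turns the counts into percentages. The counts are the ones quoted above; the variable names are just for illustration.

```python
# Feature counts for the V1-V3 classification, as quoted in the post.
total_features = 2723
counts = {
    "unassigned": 1492,
    "domain only": 1225,
    "species": 6,
}

# The three groups should account for every feature.
accounted_for = sum(counts.values())

# Share of each group as a percentage of all features, to one decimal place.
shares = {label: round(100 * n / total_features, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]}/{total_features} ({pct}%)")
```

The three groups add up to all 2723 features, so there is essentially nothing classified between domain and species level.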
I used the SILVA full length sequences & taxonomy files below from the Q2 data resources to train the classifier:
I followed the Q2 feature-classifier tutorial to train the classifiers, using primers 27FYM (5′-AGAGTTTGATCMTGGCTCAG-3′) and 519R (5′-GWATTACCGCGGCKGCTG-3′).
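In case it helps anyone double check the primers, here is a small Python sketch that expands the degenerate bases and reverse-complements each primer. It relies only on the standard IUPAC nucleotide codes; the function names are hypothetical and nothing here comes from QIIME2 itself.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement table that also handles degenerate codes (e.g. M <-> K, R <-> Y).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(primer):
    """Return the reverse complement of a primer, keeping IUPAC codes."""
    return primer.translate(COMPLEMENT)[::-1]


def expand(primer):
    """List every concrete sequence a degenerate primer can match."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


# Primers exactly as written in the post.
FWD_27FYM = "AGAGTTTGATCMTGGCTCAG"
REV_519R = "GWATTACCGCGGCKGCTG"

for name, primer in (("27FYM", FWD_27FYM), ("519R", REV_519R)):
    print(name, primer, "revcomp:", reverse_complement(primer),
          "variants:", len(expand(primer)))
```

This only confirms how the primer strings read in each orientation; it does not say which orientation the reads or the reference sequences are actually in.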
I've searched for issues related to unclassified features in this forum. I've tried/considered solutions from these similar posts, but they did not seem to resolve the issue I am having.
Specifically, I double checked that I was passing the correct primers to the qiime feature-classifier extract-reads command and that I was using the DADA2 paired-end output. I have not yet gone back to rerun DADA2 denoising with different parameters, which this post suggested, and I am wondering whether that would be worth trying in my case.
I truncated the reads from this region using the following parameters, chosen from the FastQC reports (screenshot below): --p-trunc-len-f 235 --p-trunc-len-r 190
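For reference, here is a rough sketch of the read-overlap arithmetic for these truncation lengths. The amplicon lengths below are hypothetical values, not measured from my data, and the 12-base minimum overlap is an assumption based on my understanding of the default in DADA2's merge step, so please treat the output as illustrative only.

```python
# Assumed minimum overlap needed to merge a read pair (believed to be the
# DADA2 default; verify against the version you are running).
MIN_OVERLAP = 12


def read_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the truncated forward and reverse reads.

    A negative value means the two reads leave a gap and cannot overlap.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f, trunc_len_r, amplicon_len, min_overlap=MIN_OVERLAP):
    """True if the truncated reads overlap by at least min_overlap bases."""
    return read_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# Truncation lengths from the post; amplicon lengths (primers removed) are
# hypothetical and only chosen to span a plausible range.
TRUNC_F, TRUNC_R = 235, 190
for amplicon_len in (400, 450, 480, 500):
    overlap = read_overlap(TRUNC_F, TRUNC_R, amplicon_len)
    status = "mergeable" if can_merge(TRUNC_F, TRUNC_R, amplicon_len) else "too short"
    print(f"amplicon {amplicon_len} nt: overlap {overlap} nt ({status})")
```

The two truncated reads cover 235 + 190 = 425 bases in total, so any amplicon longer than about 413 bases would leave less than the assumed 12-base overlap.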
Providing this summary as well in case it is helpful: 16S human oral samples from public database, MiSeq 2x300 paired-end, QIIME2 version 2020.8, V1-V3 primers (27FYM & 519R)
I am not sure what steps I should take to resolve this. Any advice/suggestions would be greatly appreciated. Thanks very much.