I am processing 16S rRNA V3-V4 data using QIIME 2 2024.10 (miniconda, qiime2-amplicon-2024.10). For the classification step, I am using Greengenes2 2024.09 (2024.09.backbone.v4.nb.qza) with the qiime feature-classifier classify-sklearn command. However, the resulting taxonomy table and bar plot show that most of the data is classified as "Unassigned", up to 83% in some samples. The input sequences are ASVs produced by Deblur.
However, when I tested the same reads with 2024.09.backbone.full-length.nb.qza, the data was successfully classified.
This same pattern occurred with other datasets denoised with DADA2. Additionally, these two datasets are not related.
Hello! Welcome to the forum!
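My guess is that the V4 classifier was trained only on the V4 region of the 16S rRNA gene, while you are working with the V3-V4 region, which is considerably longer and also more variable. The V3 part of your reads has no counterpart in the training sequences, so the classifier probably cannot reach a confident assignment and reports those reads as "Unassigned".
The results obtained with the full-length classifier are much more trustworthy, since its training sequences completely cover the region you targeted, so you can use that one.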
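To make the point about region coverage more concrete, here is a small toy sketch in Python. It is not how QIIME 2 classifies reads internally; it only counts how many of a read's k-mers also occur in a reference. Everything in it is an assumption for illustration: the "gene" is a random synthetic sequence, the region coordinates are made up, and k = 7 is just a convenient choice.

```python
import random


def kmers(seq, k=7):
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_coverage(query, reference, k=7):
    """Fraction of the query's distinct k-mers that also occur in the reference."""
    query_kmers = kmers(query, k)
    return len(query_kmers & kmers(reference, k)) / len(query_kmers)


random.seed(0)

# Synthetic stand-in for a full-length 16S gene (random bases, not real data).
gene = "".join(random.choice("ACGT") for _ in range(1500))

# Hypothetical coordinates, chosen only so that V3-V4 extends upstream of V4.
v3v4_read = gene[340:800]   # what a V3-V4 amplicon would span
v4_reference = gene[515:806]  # what a V4-only reference would contain
full_reference = gene         # what a full-length reference would contain

cov_v4 = kmer_coverage(v3v4_read, v4_reference)
cov_full = kmer_coverage(v3v4_read, full_reference)

print(f"k-mers of the V3-V4 read found in V4-only reference:     {cov_v4:.2f}")
print(f"k-mers of the V3-V4 read found in full-length reference: {cov_full:.2f}")
```

In this toy setup the full-length reference contains every k-mer of the read, while the V4-only reference misses the whole upstream part. A classifier trained on the V4-only reference has never seen that part of your reads, which fits the high share of "Unassigned" you observed.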
Hi!
Thank you for the warm welcome and for your explanation! That makes a lot of sense; I hadn’t considered that before.
I really appreciate your insight!