Issue with Taxonomic Classification using Greengenes2 for 16S V4 region - Unassigned data

Hello,

I am processing 16S rRNA V3-V4 data using QIIME 2 2024.10 (miniconda, qiime2-amplicon-2024.10). For the classification step, I am using Greengenes2 2024.09 (2024.09.backbone.v4.nb.qza) with the qiime feature-classifier classify-sklearn command. However, the resulting taxonomy table and bar plot show that most of the data is classified as "Unassigned"; in some samples, up to 83% of the data was unassigned. The input is ASVs from Deblur.

However, when I tested the same reads with 2024.09.backbone.full-length.nb.qza, the data was successfully classified.

The same pattern occurred with another dataset denoised with DADA2; the two datasets are unrelated.

The commands I used:

  • With V4 classifier:
    qiime feature-classifier classify-sklearn \
      --i-classifier 2024.09.backbone.v4.nb.sklearn-1.4.2.qza \
      --i-reads rep-seqs-deblur.qza \
      --p-n-jobs 6 \
      --o-classification taxonomy_gg24_v4.qza

  • With full-length classifier:
    qiime feature-classifier classify-sklearn \
      --i-classifier 2024.09.backbone.full-length.nb.sklearn-1.4.2.qza \
      --i-reads rep-seqs-deblur.qza \
      --p-n-jobs 6 \
      --o-classification taxonomy_gg24_full.qza
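
To put a number on the difference between the two runs, one option is to export each taxonomy artifact and count how many features carry the label "Unassigned". The sketch below is only an illustration: the helper name unassigned_pct and the output directory names are made up, and it assumes the exported taxonomy.tsv is tab-separated with a header row and the taxon string in the second column. Note that it counts features (ASVs), not reads, so it will not match the per-sample percentages in the bar plot exactly.

```shell
# Hypothetical helper: percentage of features whose Taxon column is exactly
# "Unassigned". Assumes a tab-separated file with one header row and the
# taxon string in column 2 (the layout assumed for an exported taxonomy.tsv).
unassigned_pct() {
  awk -F'\t' '
    NR > 1 { total++; if ($2 == "Unassigned") n++ }
    END { if (total) printf "%.1f\n", 100 * n / total; else print "0.0" }
  ' "$1"
}

# Usage (assumes the QIIME 2 environment is active; directory names are arbitrary):
#   qiime tools export --input-path taxonomy_gg24_v4.qza --output-path exported_v4
#   qiime tools export --input-path taxonomy_gg24_full.qza --output-path exported_full
#   unassigned_pct exported_v4/taxonomy.tsv
#   unassigned_pct exported_full/taxonomy.tsv
```

A large gap between the two percentages would be consistent with the V4 classifier simply lacking reference sequence for part of the amplicon.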

Can someone help me understand what might be causing this? Can I use the data classified with the full-length classifier?

Screenshot of taxonomy.qzv after the V4 classifier

Screenshot of taxonomy.qzv after the full-length classifier

Hello! Welcome to the forum!
My guess is that the V4 classifier was trained only on the V4 region of the 16S rRNA gene, while you are working with the V3-V4 region, which is considerably longer and also more variable.
The results you obtained with the full-length classifier are much more trustworthy, since its reference sequences cover the whole region you targeted, so you can use them.

Best,

Hi!
Thank you for the warm welcome and for your explanation! That makes a lot of sense—I hadn’t considered that before.
I really appreciate your insight!

Best,

Hi @Emilia,

What specific primers were used?

Best,
Daniel

Hi @wasade,

Following Illumina's protocol, we used primers to target the V3-V4 region of the 16S rRNA gene:

  • Forward Primer: 5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3'
  • Reverse Primer: 5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-3'
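
For anyone comparing these against a reference region: each oligo above is an Illumina overhang adapter followed by the locus-specific primer. A minimal sketch of separating the two is below. It assumes the overhangs are the standard Illumina adapter sequences, and the variable names are just for illustration. The locus-specific parts that remain are, as far as I know, the primers usually called 341F and 785R (sometimes 805R).

```shell
# Full oligos as posted above (overhang adapter + locus-specific primer).
fwd_full='TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG'
rev_full='GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC'

# Overhang adapters; assumed here to be the standard Illumina sequences.
fwd_overhang='TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
rev_overhang='GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'

# Strip the overhang prefix with shell parameter expansion.
fwd_primer="${fwd_full#"$fwd_overhang"}"
rev_primer="${rev_full#"$rev_overhang"}"

echo "forward: $fwd_primer"   # forward: CCTACGGGNGGCWGCAG
echo "reverse: $rev_primer"   # reverse: GACTACHVGGGTATCTAATCC
```

The forward primer binds upstream of V4, so the start of each read covers V3 sequence that a V4-only reference would not contain.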

Best,

Thanks! Yeah, I would not expect the V4-specific classifier to have representation for these data.
