Issue with Taxonomic Classification of 16S Sequences Using GreenGenes2: Need Help!
Hi everyone,
I'm classifying 16S sequences (V1-V3 region) from vaginal samples, using a classifier trained on the GreenGenes2 full-length sequences. However, I'm running into a peculiar issue: when I process all 100 samples, the ASVs that do get classified are assigned only at the domain level (Bacteria), and many ASVs remain unclassified.
I am using qiime2-amplicon-2024.5 (conda).
This is what I ran:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 15 \
  --p-trunc-len-f 295 \
  --p-trim-left-r 15 \
  --p-trunc-len-r 295 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads 2022.10.backbone.full-length.fna.qza \
  --o-classifier classifier_full_length.qza
qiime feature-classifier classify-sklearn \
  --i-classifier classifier_full_length.qza \
  --i-reads rep_seqs.qza \
  --o-classification taxonomy_from_full_length.qza
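One thing that can be checked from the denoising command alone is whether the truncated forward and reverse reads still overlap enough to merge. The sketch below is only rough arithmetic: it uses the truncation lengths from the command above and DADA2's default minimum overlap of 12 bases, while the amplicon lengths in the loop are hypothetical placeholders, since the actual V1-V3 amplicon length depends on the primers and on the taxa present.

```python
# DADA2 merges a read pair only if the two reads overlap by at least this
# many bases (its default minimum overlap is 12).
MIN_OVERLAP = 12


def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Overlap in bases between the truncated forward and reverse reads.

    amplicon_len is the distance from the first base of the forward read to
    the first base of the reverse read. Trimming bases from the 5' ends
    (--p-trim-left-f / --p-trim-left-r) shortens the reads but does not
    change this overlap.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f: int, trunc_len_r: int, amplicon_len: int,
              min_overlap: int = MIN_OVERLAP) -> bool:
    """True if the expected overlap reaches the minimum needed for merging."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# Truncation lengths are taken from the denoise-paired command;
# the amplicon lengths are hypothetical values for illustration only.
for amplicon_len in (480, 520, 560, 600):
    overlap = expected_overlap(295, 295, amplicon_len)
    print(amplicon_len, overlap, can_merge(295, 295, amplicon_len))
```

The merged and non-chimeric read counts in denoising-stats.qza show what actually happened in the run; this calculation only indicates which amplicon lengths could merge at all with these settings.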
I am attaching the resulting taxonomy file: taxonomy.tsv (2.9 MB)
Interestingly, when I run the same process on just 2 of the samples, I get detailed classifications down to the species level, including Lactobacillus and other bacteria typical of vaginal samples, although many ASVs are still classified only to the Bacteria level (attached: taxonomy_2_samples.tsv, 51.7 KB).
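To put numbers on the difference between the two runs, the taxonomy files can be summarised by the deepest rank each ASV reached. This is a minimal sketch, assuming the exported taxonomy.tsv has a `Taxon` column with GreenGenes2-style labels (`d__`, `p__`, ..., `s__`) separated by semicolons; the function names are made up for illustration.

```python
import csv
from collections import Counter

RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def deepest_rank(taxon: str) -> str:
    """Name of the deepest rank that carries an actual label."""
    depth = 0
    for level in taxon.split(";"):
        level = level.strip()
        # A level counts only if it has a rank prefix AND a name after it,
        # so "g__" on its own or a string like "Unassigned" stops the count.
        if len(level) > 3 and level[1:3] == "__":
            depth += 1
        else:
            break
    depth = min(depth, len(RANKS))
    return RANKS[depth - 1] if depth else "unassigned"


def summarise_depths(taxa) -> Counter:
    """Count how many taxonomy strings stop at each rank."""
    return Counter(deepest_rank(t) for t in taxa)


def read_taxa(path: str) -> list:
    """Read the Taxon column from a tab-separated taxonomy file."""
    with open(path, newline="") as handle:
        return [row["Taxon"] for row in csv.DictReader(handle, delimiter="\t")]


# Small inline demo with made-up rows; real input would come from read_taxa().
demo = [
    "d__Bacteria",
    "d__Bacteria; p__Bacteroidota; c__Bacteroidia",
    "Unassigned",
]
print(summarise_depths(demo))
```

Running `summarise_depths(read_taxa("taxonomy.tsv"))` and the same call on taxonomy_2_samples.tsv would give two tallies that can be compared side by side.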
I also tried a classifier specific to the V1-V3 region, with no success (for this approach I was quite confused about which primers to use...):
qiime feature-classifier extract-reads \
  --i-sequences gg2/2022.10.backbone.full-length.fna.qza \
  --p-min-length 350 \
  --p-max-length 650 \
  --o-reads ref-seqs_gg2_V1V3_350_650_Allen.qza
The taxonomy table from this approach assigns almost all ASVs the same classification: d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; f__Prolixibacteraceae; g__UBA6024; s__UBA6024 sp002429385, and the rest are classified only to the Bacteria level (attached: taxonomy_Allen.qza, 1.4 MB).
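How dominant that single classification is can be stated as a fraction rather than "almost all". The following is only an illustrative sketch: `dominant_taxon` is a hypothetical helper, and the list of taxonomy strings is assumed to have been read from the exported taxonomy table beforehand.

```python
from collections import Counter


def dominant_taxon(taxa):
    """Return the most frequent taxonomy string and its share of all ASVs."""
    taxa = [t.strip() for t in taxa]
    if not taxa:
        raise ValueError("no taxonomy strings given")
    taxon, count = Counter(taxa).most_common(1)[0]
    return taxon, count / len(taxa)


# Made-up example: three ASVs share one label, one has another.
example = ["d__Bacteria; p__Bacteroidota"] * 3 + ["d__Bacteria"]
print(dominant_taxon(example))
```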
I've done this procedure with other databases for the V3-V4 region without any problems. I'm not sure if the issue is because of the V1-V3 region or something else with the sequences or the procedure. The sequences seem to be of pretty good quality.
I'm not sure what's going wrong and would really appreciate any help in resolving this. Has anyone experienced something similar?
Thank you in advance!!