Issue with Taxonomic Classification of 16S Sequences Using GreenGenes2: Need Help!

Hi everyone,

I'm working on the taxonomic classification of 16S sequences (V1-V3 region) from vaginal samples, using a classifier trained on the GreenGenes2 full-length sequences. However, I'm encountering a peculiar issue. When processing 100 samples, I only get classifications at the Bacteria level, and many ASVs remain unclassified.

I am using qiime2-amplicon-2024.5 (conda).

This is what I ran:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 15 \
  --p-trunc-len-f 295 \
  --p-trim-left-r 15 \
  --p-trunc-len-r 295 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads 2022.10.backbone.full-length.fna.qza \
  --i-reference-taxonomy 2022.10.backbone.tax.qza \
  --o-classifier classifier_full_length.qza

qiime feature-classifier classify-sklearn \
  --i-classifier classifier_full_length.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy_from_full_length.qza

I am attaching the resulting taxonomy.tsv (2.9 MB).

Interestingly, when I run the same process on just 2 of the samples, I get detailed classifications down to the species level, including Lactobacillus and other bacteria typical of vaginal samples, although many ASVs are still classified only to the Bacteria level (attached: taxonomy_2_samples.tsv, 51.7 KB).

I also tried directly with a classifier specific to the V1-V3 region, with no success (for this strategy I was quite confused about which primers to use):

qiime feature-classifier extract-reads \
  --i-sequences gg2/2022.10.backbone.full-length.fna.qza \
  --p-f-primer 'GAGTTTGATCMTGGCTCAG' \
  --p-r-primer 'CCAGCAGCCGCGGTAAT' \
  --p-min-length 350 \
  --p-max-length 650 \
  --o-reads ref-seqs_gg2_V1V3_350_650_Allen.qza

The taxonomy table from this approach assigns almost all ASVs this one classification: d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; f__Prolixibacteraceae; g__UBA6024; s__UBA6024 sp002429385. The rest were classified only to the Bacteria level (attached: taxonomy_Allen.qza, 1.4 MB).

I've done this procedure with other databases for the V3-V4 region without any problems. I'm not sure if the issue is because of the V1-V3 region or something else with the sequences or the procedure. The sequences seem to be of pretty good quality.

I'm not sure what's going wrong and would really appreciate any help in resolving this. Has anyone experienced something similar?

Thank you in advance!!

Dani

Hi @Daniela_Vargas

It looks like you’re using the 27F and 534R primer set. As an alternative, you could try pairing the same forward primer with the reverse primer 519R (GWATTACCGCGGCKGCTG). This would pick out reads around 490-500 bp, compared to around 517 bp for your current primer pair.

However, when I ran the extract-reads command with a specific primer pair, I didn’t include the min/max length parameters. Maybe try running it without those too and see if it improves things. Hope this helps!
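
In case it is useful, the suggestion above could be sketched as a small shell function. This is only a sketch under assumptions: the function name `extract_v1v3` and the `QIIME` override variable are placeholders made up for illustration, and the output filename is whatever you choose. The primers and the `extract-reads` flags are the ones already mentioned in this thread, just without the length limits.

```shell
#!/usr/bin/env bash
# Sketch: extract the V1-V3 region with 27F + 519R and no
# --p-min-length / --p-max-length filters.
# QIIME is a hypothetical override: QIIME=echo prints the command
# instead of running it.
extract_v1v3() {
  local ref_seqs="$1"   # e.g. gg2/2022.10.backbone.full-length.fna.qza
  local out_reads="$2"  # output artifact name of your choice
  "${QIIME:-qiime}" feature-classifier extract-reads \
    --i-sequences "$ref_seqs" \
    --p-f-primer 'GAGTTTGATCMTGGCTCAG' \
    --p-r-primer 'GWATTACCGCGGCKGCTG' \
    --o-reads "$out_reads"
}
```

After sourcing this, `extract_v1v3 gg2/2022.10.backbone.full-length.fna.qza my-v1v3-ref-seqs.qza` would run the extraction (the output name here is made up), and prefixing the call with `QIIME=echo` shows the exact command first. The extracted reads would then go into fit-classifier-naive-bayes in place of the full-length sequences.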

Thank you so much, Mike, I will give it a try!

Hi @Daniela_Vargas,

Can you re-run your original feature-classifier classify-sklearn command with the --p-read-orientation same option? The default for this is --p-read-orientation auto. For sklearn to work properly, the reads must be in the same orientation as the reference database / classifier. Sometimes bad sequences can trick the orientation detector into treating the reads as the reverse complement, which results in poor taxonomic assignment. That would also explain why you get different taxonomic results for your full data set vs. your two-sample subset.

You can also sanity check this by using feature-classifier classify-consensus-vsearch, which is not dependent on orientation. If you obtain reasonable taxonomy that way, that suggests that sklearn got confused about the orientation, or that your reads are in fact in mixed orientation (you can search the forum for more details on this). 🙂
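
A rough sketch of those two checks as shell functions, in case it helps. The function names, the `QIIME` override variable, and any output names are assumptions made up for illustration; the input artifacts would be the ones from the original post. The vsearch check uses `--output-dir` so that every output the action produces lands in one directory (the directory must not already exist).

```shell
#!/usr/bin/env bash
# Sketch of the two orientation checks.
# QIIME is a hypothetical override: QIIME=echo prints the command
# instead of running it.

# 1) Re-run sklearn, treating reads as being in the same orientation
#    as the reference instead of letting it auto-detect.
classify_same_orientation() {
  local classifier="$1" reads="$2" out="$3"
  "${QIIME:-qiime}" feature-classifier classify-sklearn \
    --i-classifier "$classifier" \
    --i-reads "$reads" \
    --p-read-orientation same \
    --o-classification "$out"
}

# 2) Cross-check with vsearch, which searches both strands.
classify_vsearch_check() {
  local reads="$1" ref_seqs="$2" ref_tax="$3" out_dir="$4"
  "${QIIME:-qiime}" feature-classifier classify-consensus-vsearch \
    --i-query "$reads" \
    --i-reference-reads "$ref_seqs" \
    --i-reference-taxonomy "$ref_tax" \
    --p-strand both \
    --output-dir "$out_dir"
}
```

For example, `classify_same_orientation classifier_full_length.qza rep-seqs.qza taxonomy_same_orientation.qza` (the output name is a placeholder). If the `same` run still looks wrong but vsearch looks reasonable, `--p-read-orientation reverse-complement` is the other explicit setting worth trying.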
