Hello! I am running the classification step on 16S rRNA data (V4 region) on some frog skin swab data using qiime2-amplicon-2024.10 (conda). I used the Greengenes2 pretrained classifier from the qiime2 website (2024.09.backbone.v4.nb.sklearn-1.4.2.qza), which I have successfully used on other datasets (tadpole gut microbiome, freshwater, etc) using the same version of qiime2. However, most of the reads showed up as unclassified or only classified to the kingdom level. Here is the code I ran:
qiime feature-classifier classify-sklearn
--i-classifier 2024.09.backbone.v4.nb.sklearn-1.4.2.qza
--i-reads qiime2_outputs/rep-seqs.qza
--o-classification qiime2_outputs/greengenes.taxonomy.qza
And a screenshot of the taxabarplot (set to level 5)
For context, SILVA (silva-138-99-515-806-nb-classifier.qza) successfully classified the same data just fine. After looking into it more, there were two things that changed between this analysis and previous ones I have done (aside from it being a different sample type). (1) We switched recently to the newest miseq i100 for sequencing. However, my labmate just used the same classifier on some bird data from this machine without issue. (2) I adjusted my cutadapt step to have H instead of N for the reverse primer:
Cutadapt code that resulted in poor classification with greengenes2, but worked with SILVA:
qiime cutadapt trim-paired
--i-demultiplexed-sequences qiime2_outputs/paired-end-demux.qza
--p-front-f GTGCCAGCMGCCGCGGTAA
--p-front-r GGACTACHVGGGTWTCTAAT
--p-adapter-f ATTAGAWACCCBDGTAGTCC
--p-adapter-r TTACCGCGGCMGCTGGCAC
--p-discard-untrimmed
--o-trimmed-sequences qiime2_outputs/paired-demux-trimmed.qza
I changed this step back to the old primer inputs and suddenly the classification worked using the same qiime2 version and pre-trained greengenes2 classifier.
Cutadapt code that worked with greengenes (reverting to N in --p-front-r and V in --p-adapter-f):
qiime cutadapt trim-paired
--i-demultiplexed-sequences qiime2_outputs/paired-end-demux.qza
--p-front-f GTGCCAGCMGCCGCGGTAA
--p-front-r GGACTACNVGGGTWTCTAAT
--p-adapter-f ATTAGAWACCCVNGTAGTCC
--p-adapter-r TTACCGCGGCMGCTGGCAC
--p-discard-untrimmed
--o-trimmed-sequences qiime2_outputs/paired-demux-trimmed_v2.qza
Screenshot of the taxabarplot (set to level 5)
What is confusing is that the denoising stats and rep-seqs files are fairly similar between the two runs. Sequence count, read lengths, and all percentiles are essentially the same (134,319 vs 134,360 total sequences; similar length percentiles; same min of 220 bp and max of 428 bp) and primers appear to be trimmed in both cases. The denoising stats are all within 5% of the number of reads retained. The feature sequences also appear to be similar when comparing the two rep-seqs.qzv files directly.
Really confused as to what went wrong – pretty new to microbiome analysis and would appreciate any insights into what might have happened.

