Dear QIIME 2 Community,
We are currently analyzing paired-end 16S sequencing data from the V3–V4 region using QIIME 2 (version 2024.10). We performed taxonomic classification with both the SILVA 138.1 nr99 and GTDB_220 databases, for which we trained region-specific (V3–V4) classifiers using qiime feature-classifier fit-classifier-naive-bayes.
Here are the steps we followed in our QIIME 2 workflow:
!qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path ../FASTQ-Dateien \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path paired-end-demux.qza
!qiime cutadapt trim-paired \
  --i-demultiplexed-sequences paired-end-demux.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --p-error-rate 0.05 \
  --p-cores 30 \
  --p-discard-untrimmed \
  --o-trimmed-sequences demux-trimmed-v3v4.qza
!qiime demux summarize \
  --i-data demux-trimmed-v3v4.qza \
  --o-visualization demux-trimmed-v3v4-summary.qzv
!qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-trimmed-v3v4.qza \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 200 \
  --p-n-threads 30 \
  --o-representative-sequences rep-seqs-dada2-v3v4.qza \
  --o-table table-dada2-v3v4.qza \
  --o-denoising-stats stats-dada2-v3v4.qza
!qiime metadata tabulate \
  --m-input-file stats-dada2-v3v4.qza \
  --o-visualization stats-dada2-v3v4.qzv
!qiime feature-classifier classify-sklearn \
  --i-classifier ../Classifier/silva-138.1-ssu-nr99-classifier-v3v4.qza \
  --i-reads 16S-rep-seqs-v3v4.qza \
  --o-classification 16S-taxonomy-v3v4.qza
The Problem
When we inspect our taxonomic classification results, most features are resolved only to genus level (Level 6) and only occasionally to species level (Level 7).
For comparison, we ran the same dataset through the Nextflow-based AmpliSeq pipeline, which classified taxa down to subspecies level (Level 9).
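To put a number on "mostly genus level", one option is to count how many ranks each taxonomy string actually fills. The following is only a minimal sketch: it assumes semicolon-separated taxonomy strings with rank prefixes such as g__ and s__, and the helper name deepest_level and the example strings are hypothetical illustrations, not taken from our results.

```python
from collections import Counter


def deepest_level(taxon: str) -> int:
    """Count the assigned ranks in a semicolon-separated taxonomy string.

    Assumption: ranks look like "g__Lactobacillus"; an empty field or a
    bare prefix such as "s__" is treated as unassigned.
    """
    depth = 0
    for rank in taxon.split(";"):
        rank = rank.strip()
        name = rank.split("__", 1)[-1]  # text after the rank prefix, if any
        if not name or rank == "Unassigned":
            break
        depth += 1
    return depth


# Hypothetical example strings, for illustration only.
example_taxa = [
    "d__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Lactobacillaceae; g__Lactobacillus",
    "d__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Lactobacillaceae; g__Lactobacillus; s__",
    "d__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Lactobacillaceae; g__Lactobacillus; s__Lactobacillus_iners",
    "Unassigned",
]

# Tally how many features stop at each level.
depth_counts = Counter(deepest_level(t) for t in example_taxa)
for level, count in sorted(depth_counts.items()):
    print(f"Level {level}: {count} feature(s)")
```

Running the same tally on both the QIIME 2 and the AmpliSeq taxonomy tables would show whether the difference lies in how many ranks the reference database defines or in how many ranks are confidently filled.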
Our Suspicions
- Read quality & truncation in DADA2
Could the read length trimming in DADA2 be affecting the classification depth?
What is the best way to determine the optimal truncation length for DADA2? Is there an automated way to trim based on Phred scores instead of manual truncation?
- Classifier training & database limitations
Are there any best practices for improving classifier accuracy to reach deeper taxonomic levels?
Would it help to use a different feature-classifier approach?
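Regarding the truncation question, the reasoning we have in mind can be sketched as follows. This is a rough illustration, not a QIIME 2 or DADA2 feature: the function names are hypothetical, the amplicon length of 420 bp after primer removal is an assumption (it varies between taxa), the quality values are made up, and 12 bases is, as far as we know, DADA2's default minimum overlap for merging.

```python
def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the truncated forward and reverse reads."""
    return trunc_len_f + trunc_len_r - amplicon_len


def truncation_from_quality(median_quality, min_quality):
    """Number of leading bases to keep before the median Phred score
    first drops below min_quality (all bases if it never does)."""
    for position, quality in enumerate(median_quality):
        if quality < min_quality:
            return position
    return len(median_quality)


ASSUMED_AMPLICON_LEN = 420  # assumption: V3-V4 insert after primer removal
MIN_OVERLAP = 12            # assumed DADA2 default minimum overlap

# Truncation lengths from the denoise-paired command above.
overlap = expected_overlap(250, 200, ASSUMED_AMPLICON_LEN)
print(f"expected overlap: {overlap} bases, enough to merge: {overlap >= MIN_OVERLAP}")

# Made-up per-position median Phred scores, e.g. read off a quality plot.
example_medians = [37, 37, 37, 37, 37, 30, 24, 20]
print("keep bases:", truncation_from_quality(example_medians, min_quality=25))
```

The point of the sketch is that any quality-based truncation still has to leave enough overlap for the read pairs to merge, so the two numbers cannot be chosen independently.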
We have tested multiple parameter adjustments, but we are still struggling to improve our classification depth in QIIME 2 compared to AmpliSeq. Any suggestions or insights would be greatly appreciated!
Many thanks in advance!
Best regards