Changing the trimming length changes the microbiome taxonomic composition!!

Hello,
I am comparing the microbiome of a group of samples (V3-V4 region) using two different truncation lengths:
# For quality score 20
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trunc-len-f 245 \
  --p-trunc-len-r 245 \
  --p-trim-left-f 17 \
  --p-trim-left-r 21 \
  --p-n-threads 3 \
  --p-n-reads-learn 1000000 \
  --output-dir representative_seq

# For quality score 30
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trunc-len-f 207 \
  --p-trunc-len-r 217 \
  --p-trim-left-f 17 \
  --p-trim-left-r 21 \
  --p-n-threads 3 \
  --p-n-reads-learn 1000000 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza
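Because trim-left only removes bases from the start of each read, the truncation lengths alone decide whether the forward and reverse reads still overlap enough to be merged. A rough worked check as a minimal Python sketch. It assumes the primers are still in the reads (the trim-left values of 17 and 21 match the lengths of the 341F/805R primers quoted in this thread), uses 443 bp and 460 bp as illustrative amplicon lengths, and assumes DADA2's default minimum overlap of 12 bases. None of these numbers come from the actual denoising stats.

```python
# Sketch: paired-end overlap left by the two truncation settings above.
# ASSUMPTIONS (not from the original post): primers are still in the reads,
# the amplicon lengths below are illustrative, and 12 bases is the minimum
# overlap DADA2 requires to merge a read pair (its default).

MIN_OVERLAP = 12  # assumed DADA2 default minimum overlap


def overlap(trunc_f: int, trunc_r: int, amplicon_len: int) -> int:
    """Bases shared by the truncated forward and reverse reads.

    trim-left only removes bases from the start of each read, so it does not
    move the read ends; only the truncation lengths matter for the overlap.
    """
    return trunc_f + trunc_r - amplicon_len


def can_merge(trunc_f: int, trunc_r: int, amplicon_len: int,
              min_overlap: int = MIN_OVERLAP) -> bool:
    """True if the truncated pair overlaps by at least `min_overlap` bases."""
    return overlap(trunc_f, trunc_r, amplicon_len) >= min_overlap


settings = {"QS 20": (245, 245), "QS 30": (207, 217)}
for amplicon_len in (443, 460):  # hypothetical V3-V4 amplicon lengths
    for name, (f, r) in settings.items():
        print(f"{name}, amplicon {amplicon_len} bp: "
              f"overlap {overlap(f, r, amplicon_len)}, "
              f"merges: {can_merge(f, r, amplicon_len)}")
```

Under these assumptions the QS 20 settings leave 30 to 47 overlapping bases, whereas 207 + 217 = 424 bases cannot span either amplicon length, so many pairs would fail to merge. The merged counts in the QS 30 denoising stats would show whether that is actually happening.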

After raising the quality-score threshold, dominant taxa that had been assigned only to family level are now assigned to genus level, whereas the most dominant taxon, previously assigned to genus level, is now identified only to family level.
However, when compared with the cultivated microbiome, the QS 20 results agreed with the culture collection from the same tissues.
I ran this comparison to see how the QS threshold influences taxonomic identification, and I need to know which one I should stick with (although QS 30 is recommended).
Also, in other projects the sequence quality is sometimes low and I have to truncate at a lower threshold (QS 20)!
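To put numbers on the family-versus-genus shift described above, one option is to measure how deep each feature's classification goes and weight it by abundance. A minimal Python sketch, assuming SILVA 132-style taxonomy strings (`D_0__Bacteria;D_1__...`) in which the fifth level is family and the sixth is genus. `depth`, `fraction_at_rank` and the example table are hypothetical illustrations, not QIIME 2 functions or data from this thread.

```python
# Sketch: how deep do the classifications go (family vs genus)?
# ASSUMPTIONS: taxonomy strings use the SILVA 132 "D_<n>__<name>" style joined
# by ";", so the 5th level is family and the 6th is genus; the feature table
# below is hypothetical, not data from this thread.

RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def depth(taxon: str) -> str:
    """Deepest rank reached before the first empty level."""
    if taxon.strip() == "Unassigned":
        return "unassigned"
    n = 0
    for level in taxon.split(";"):
        name = level.strip().split("__", 1)[-1]
        if not name:  # e.g. "D_5__" or a trailing ";" ends the lineage
            break
        n += 1
    n = min(n, len(RANKS))
    return RANKS[n - 1] if n else "unassigned"


def fraction_at_rank(table: dict, rank: str) -> float:
    """Abundance-weighted fraction of reads classified to `rank` or deeper."""
    target = RANKS.index(rank)
    total = sum(table.values())
    deep = 0
    for taxon, count in table.items():
        d = depth(taxon)
        if d != "unassigned" and RANKS.index(d) >= target:
            deep += count
    return deep / total if total else 0.0


# Hypothetical run: taxonomy string -> read count.
run_qs20 = {
    "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;"
    "D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas": 700,
    "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;"
    "D_3__Lactobacillales;D_4__Streptococcaceae": 300,
}
print(fraction_at_rank(run_qs20, "genus"))
```

The same functions could be applied to the taxonomy and feature counts exported from each run (for example with `qiime tools export`) to compare the two settings side by side.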

Thanks


Also, one more thing: in the new tutorial, to train the classifier on the amplicon region we use:

qiime feature-classifier extract-reads \
  --i-sequences silva_132_97_16S.qza \
  --p-f-primer GTGYCAGCMGCCGCGGTAA \
  --p-r-primer GGACTACNVGGGTWTCTAAT \
  --p-trunc-len 245 \
  --p-min-length 100 \
  --p-max-length 400 \
  --o-reads ref-seqs.qza

I understand that the appropriate lengths depend on the amplicon length. Are these numbers okay when using V3-V4 primers?
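One way to check whether the tutorial's numbers suit V3-V4 is to trace what the length settings do to a single reference read. A minimal Python sketch, assuming the processing order described in the q2-feature-classifier documentation for extract-reads (primers removed, reads longer than max-length discarded, reads truncated to trunc-len, reads shorter than min-length discarded), with trim-left/trim-right left out and 0 meaning "disabled". `apply_length_settings` is a hypothetical helper and the amplicon lengths are illustrative.

```python
# Sketch of the length handling assumed for `qiime feature-classifier extract-reads`.
# ASSUMPTION: order follows the plugin's documented steps (primers removed,
# reads longer than max_length dropped, reads truncated to trunc_len, reads
# shorter than min_length dropped); trim_left/trim_right are ignored here,
# and 0 disables trunc_len / max_length.


def apply_length_settings(length, trunc_len=0, min_length=50, max_length=0):
    """Kept read length, or None if the reference read would be discarded.

    `length` is the amplicon length after primer removal.
    """
    if max_length > 0 and length > max_length:
        return None  # too long: removed before any truncation
    if trunc_len > 0:
        length = min(length, trunc_len)  # truncation only shortens reads
    if length < min_length:
        return None  # too short after truncation
    return length


tutorial = dict(trunc_len=245, min_length=100, max_length=400)
wider = dict(trunc_len=445, min_length=245, max_length=445)
for length in (380, 426, 460):  # hypothetical amplicon lengths without primers
    print(length,
          apply_length_settings(length, **tutorial),
          apply_length_settings(length, **wider))
```

Under those assumptions, an amplicon of about 426 bp after primer removal would be discarded outright by max-length 400, and anything the tutorial settings keep would be cut to 245 bases, while a window such as min-length 245 / max-length 445 keeps it at full length.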

But in previous tutorials, the reference database was trimmed to the amplicon region with:

qiime feature-classifier extract-reads \
  --i-sequences ref-seqs.qza \
  --p-f-primer CCTACGGGNBGCASCAG \
  --p-r-primer GACTACNVGGGTATCTAATCC \
  --o-reads silva_v132_97_341F-805R.qza

So the old code gives more flexibility: it trims based on the primer sequences alone, without imposing any length constraints on the extracted reference reads. Could the change in taxonomy that I found in my samples be due to the classifier trimming parameters? Could you please help me understand?
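The primer-only approach of the old command amounts to an in-silico PCR: find the forward primer, find the reverse complement of the reverse primer downstream of it, and keep the sequence in between. A simplified Python sketch of that idea, handling IUPAC degenerate bases (N, B, S, V, ...) with regular expressions. It only allows exact matches, whereas extract-reads tolerates some primer mismatches via an identity threshold, so treat `extract_amplicon` as a hypothetical illustration rather than the plugin's algorithm; the reference sequence below is synthetic.

```python
import re

# IUPAC code -> bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each IUPAC code (e.g. R <-> Y, B <-> V, N <-> N).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement, preserving degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> str:
    """Turn a degenerate primer into a regex of character classes."""
    return "".join(f"[{IUPAC[b]}]" for b in primer.upper())


def extract_amplicon(ref: str, f_primer: str, r_primer: str):
    """Sequence between the primer sites (primers removed), or None.

    Exact matches only; the reverse primer is searched as its reverse
    complement downstream of the forward primer.
    """
    f = re.search(primer_regex(f_primer), ref)
    if f is None:
        return None
    r = re.search(primer_regex(revcomp(r_primer)), ref[f.end():])
    if r is None:
        return None
    return ref[f.end(): f.end() + r.start()]


# Synthetic reference: flank + forward primer + insert + reverse site + flank.
ref = ("AAAA" + "CCTACGGGACGCAGCAG" + "ACGTACGT"
       + revcomp("GACTACAAGGGTATCTAATCC") + "TTTT")
print(extract_amplicon(ref, "CCTACGGGNBGCASCAG", "GACTACNVGGGTATCTAATCC"))
```

With no length filters, whatever lies between the two primer sites is kept, whatever its length.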
Thanks


Yep, changing the read length will alter taxonomic classification and more. There's no way to tell which is "correct" unless you have a sample with known composition. You should check the sequence yields from each run to determine whether the Q20 or Q30 settings are causing too many sequences to be dropped, skewing the results. Also compare observed to expected results. You will need to evaluate this for yourself and decide what looks "right"; I do not know what results you are expecting.
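A quick way to check the sequence yields is to compare, per sample, the fraction of input reads that survive to the final table in each run. A minimal Python sketch with hypothetical counts; in practice the numbers would come from each run's DADA2 denoising stats. The 20-percentage-point threshold in `flag_drops` is an arbitrary assumption, not a recommended cutoff.

```python
# Sketch: compare read retention between the QS 20 and QS 30 runs.
# The counts are hypothetical placeholders; real values would come from each
# run's denoising stats (for example viewed via `qiime metadata tabulate`).


def retention(stats: dict) -> dict:
    """Percent of input reads kept in the final table, per sample.

    `stats` maps sample id -> (input reads, reads in the final table).
    """
    return {s: 100.0 * final / inp for s, (inp, final) in stats.items() if inp}


def flag_drops(run_a: dict, run_b: dict, max_drop: float = 20.0) -> list:
    """Samples whose retention falls by more than `max_drop` points in run_b."""
    ra, rb = retention(run_a), retention(run_b)
    return sorted(s for s in ra if s in rb and ra[s] - rb[s] > max_drop)


qs20 = {"S1": (50000, 38000), "S2": (42000, 30500), "S3": (47000, 35000)}
qs30 = {"S1": (50000, 9000), "S2": (42000, 29000), "S3": (47000, 12000)}
print(retention(qs20))
print(flag_drops(qs20, qs30))
```

Samples flagged this way are the ones where the stricter run may be dropping enough reads to skew the taxonomic profile.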

You need to decide this for yourself based on your expected amplicon length. The settings in the tutorial example are based on V4 and leave quite a wide margin anyway, so you will probably want to change them for V3-V4.

You can remove these parameters altogether if you don't want length-based filtering (but I would discourage this). The old tutorial was written before we added the length-based filtering option.

For all of this you will need to decide what makes sense for your protocol and for your experiment. Good luck!


Since the length of V3-V4 is 443 bp, I extracted the reference reads for training the classifier with the following:
qiime feature-classifier extract-reads \
  --i-sequences silva_132_97_16S.qza \
  --p-f-primer CCTACGGGNBGCASCAG \
  --p-r-primer GACTACNVGGGTATCTAATCC \
  --p-trunc-len 445 \
  --p-min-length 245 \
  --p-max-length 445 \
  --o-reads ref-seqs.qza

The results are very similar to those I obtained with the old code, where no length-based filtering was used, and they are consistent with what we found in vitro as well.
