Hello,
I am comparing the microbiome of a group of samples using two different trimming lengths for V3-V4 reads. For quality score 20:
qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end.qza --p-trunc-len-f 245 --p-trunc-len-r 245 --p-trim-left-f 17 --p-trim-left-r 21 --p-n-threads 3 --p-n-reads-learn 1000000 --output-dir representative_seq
After increasing the quality score threshold, dominant taxa that had been assigned only to the family level are now assigned to the genus level, while the most dominant taxon, previously assigned to the genus level, is now identified only to the family level.
However, compared with the cultivated microbiome, the QS 20 results agree with the culture collection from the same tissues.
I did this comparison to see how the QS threshold influences taxonomic identification of the microbiome, and I need to know which one I should stick with (though QS 30 is recommended).
Also, in other projects the sequence quality is sometimes low and I have to trim at a lower threshold (20)!
So, the old code gives more flexibility (it trims based on the primer sequences without adding any constraint on the length of the trimmed classifier reads). I believe the change in taxonomy that I found in my samples may be due to the classifier trimming parameters. Could you please help me understand?
Thanks
Yep, changing the read length will alter taxonomic classification and more. There's no way to tell which is "correct" unless you have a sample with known composition. You should check the sequence yields from each run to determine whether the Q20 or the Q30 setting is causing too many sequences to be dropped, skewing the results. Also compare observed to expected results. You will need to evaluate this for yourself and decide what looks "right"; I do not know what results you are expecting.
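Two quick checks along these lines, as a rough sketch rather than a recipe. The first is plain arithmetic on the truncation positions from the denoise-paired command posted above; it assumes the 443 bp amplicon length mentioned in this thread still includes both primers, and it uses 12 nt, which is the default minimum overlap documented for DADA2's mergePairs. The second tabulates the per-sample read counts, assuming your QIIME 2 release writes a denoising_stats.qza file into the output directory (older releases may not).

```shell
# Truncation positions from the denoise-paired command in this thread.
trunc_f=245
trunc_r=245
# ASSUMPTION: amplicon length with both primers still attached.
# V3-V4 length varies between taxa, so treat this as a rough figure.
amplicon_len=443
# Default minimum overlap documented for DADA2's mergePairs.
min_overlap=12

# Forward and reverse reads start at opposite ends of the amplicon,
# so whatever they cover beyond its length is shared between them.
overlap=$(( trunc_f + trunc_r - amplicon_len ))
echo "expected overlap: ${overlap} nt"
if [ "$overlap" -lt "$min_overlap" ]; then
  echo "warning: overlap is below ${min_overlap} nt, most pairs may fail to merge"
fi

# Per-sample counts at each DADA2 step (input, filtered, merged, non-chimeric).
# ASSUMPTION: this file name is what --output-dir produces in your release.
stats="representative_seq/denoising_stats.qza"
if command -v qiime >/dev/null 2>&1 && [ -f "$stats" ]; then
  qiime metadata tabulate \
    --m-input-file "$stats" \
    --o-visualization denoising_stats.qzv
else
  echo "qiime or ${stats} not found, skipping the tabulation step"
fi
```

If 443 bp turns out to be the length without primers, the full amplicon is longer and the same arithmetic gives a much smaller overlap, so it is worth plugging in your own number. Running the tabulation on both the Q20 and the Q30 outputs lets you compare how many reads survive each step.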
You need to decide for yourself based on your expected amplicon length. The settings given in the tutorial example are based on V4 and use quite a wide margin anyway, so you will probably want to change them for V3-V4.
You can remove these parameters altogether if you don't want length-based filtering (but I would discourage this). The old tutorial was written before we added the length-based filtering option.
For all of this you will need to decide what makes sense for your protocol and for your experiment. Good luck!
Since the V3-V4 amplicon is 443 bp long, I trained the classifier using the following command to get the reference sequences:
qiime feature-classifier extract-reads \
  --i-sequences silva_132_97_16S.qza \
  --p-f-primer CCTACGGGNBGCASCAG \
  --p-r-primer GACTACNVGGGTATCTAATCC \
  --p-trunc-len 445 \
  --p-min-length 245 \
  --p-max-length 445 \
  --o-reads ref-seqs.qza
The results are pretty similar to those from the old code, where length-based filtering was not used. That is also consistent with what we found in vitro.
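For anyone repeating this, a small sanity check on the length window in the extract-reads command above might look like the sketch below. It rests on two assumptions that are not confirmed in this thread: that the 443 bp figure includes both primers, and that extract-reads returns the region between the primers, so the extracted reference reads are shorter than the full amplicon by the two primer lengths.

```shell
# Primers and length window copied from the extract-reads command above.
f_primer="CCTACGGGNBGCASCAG"
r_primer="GACTACNVGGGTATCTAATCC"
min_len=245
max_len=445
# ASSUMPTION: amplicon length with both primers still attached.
amplicon_len=443

# Expected length of the primer-free region between the two primers.
insert_len=$(( amplicon_len - ${#f_primer} - ${#r_primer} ))
echo "forward primer: ${#f_primer} nt, reverse primer: ${#r_primer} nt"
echo "expected primer-free length: ${insert_len} nt"

# Check that the expected length sits inside the min/max window.
if [ "$insert_len" -ge "$min_len" ] && [ "$insert_len" -le "$max_len" ]; then
  verdict="within"
else
  verdict="outside"
fi
echo "expected length is ${verdict} the ${min_len}-${max_len} nt window"
```

The primer lengths (17 and 21 nt) match the trim-left values in the denoise-paired command, so the reads and the reference are trimmed at the same positions. Real V3-V4 lengths vary between taxa, which is why a window rather than a single value makes sense here.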