Hi,
My supervisor did some analysis of 16S samples (V3V4 regions) through DADA2 but we ran into some doubts about the filtering steps (merging & chimera removal).
BYLN3-demux.qzv (290.8 KB)
The forward and reverse reads were 300 bp each, so after trimming and truncation there should still be more than 20 bp of overlap, given that the V3-V4 region is ~460 bp long.
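As a rough check of that overlap estimate, the expected overlap can be computed from the truncation lengths and the amplicon length. This is only a sketch: it assumes every amplicon is exactly 460 bp (real lengths vary) and that this figure covers the full span between the 5' ends of the two reads. `expected_overlap` is a hypothetical helper, not a QIIME 2 or DADA2 function.

```python
def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the forward and reverse reads after truncation.

    Trimming from the 5' end (trim-left) removes bases at the outer ends of
    the amplicon, so it does not change the overlap in the middle.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


# Truncation lengths from the runs further down, with an ASSUMED 460 bp amplicon
print(expected_overlap(280, 260, 460))  # 80
print(expected_overlap(280, 240, 460))  # 60
```

Under these assumptions, both truncation settings leave well over 20 bp of overlap.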
The merging step always discarded a lot of sequences, so we tried different approaches to compare the results.
Looking at the attached demux.qzv file, several combinations of parameters were used to compare outputs:
1. DADA2 on original demux file with trim/trunc values given
2. Trimmomatic was done before DADA2 (used 515/806 primers) so little trim/truncating values
3. Trimmomatic was done before DADA2 AND same trim/truncate values given as #1 (according to demux.qzv)
4. Trimmomatic was done before DADA2 AND trim/truncate values given as #1 AND only forward reads were used
Resulting stats
1. DADA2 on original demux file with trim/trunc values given
qiime dada2 denoise-paired \
--i-demultiplexed-seqs BYLN3-pe-demux.qza \
--o-table BYLN3-table1mill_4.qza \
--o-representative-sequences BYLN3-rep-seqs1mill_4.qza \
--o-denoising-stats BYLN3-denoising-stats1mill_4.qza \
--p-trim-left-f 17 \
--p-trim-left-r 20 \
--p-trunc-len-f 280 \
--p-trunc-len-r 260 \
--p-n-threads 16 \
--p-n-reads-learn 1000000
sample-id | input | filtered | denoised | merged | non-chimeric |
---|---|---|---|---|---|
#q2:types | numeric | numeric | numeric | numeric | numeric |
AJM0303 | 60485 | 42289 | 42289 | 13901 | 8109 |
NEGATIVE | 17560 | 12439 | 12439 | 7900 | 6731 |
YAM0907 | 70721 | 50271 | 50271 | 19523 | 11069 |
YAM2511 | 66559 | 48029 | 48029 | 16292 | 9635 |
YAM2512 | 53449 | 39393 | 39393 | 14618 | 8459 |
YAM2513 | 54478 | 39873 | 39873 | 17145 | 8716 |
YAM2514 | 36772 | 26708 | 26708 | 11418 | 6967 |
YAM2515 | 73333 | 53219 | 53219 | 19119 | 11146 |
YAM2516 | 39504 | 27992 | 27992 | 10006 | 5943 |
The merging step seems harsh.
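To put a number on this, here is a small sketch that computes the fraction of denoised reads surviving the merge for three of the samples in the table above. The counts are copied from that table; `merge_retention` is a hypothetical helper name, not part of QIIME 2.

```python
# (input, filtered, denoised, merged, non-chimeric) copied from the run 1 table
run1 = {
    "AJM0303": (60485, 42289, 42289, 13901, 8109),
    "NEGATIVE": (17560, 12439, 12439, 7900, 6731),
    "YAM0907": (70721, 50271, 50271, 19523, 11069),
}


def merge_retention(counts):
    """Fraction of denoised reads that survived the merging step."""
    _input, _filtered, denoised, merged, _non_chimeric = counts
    return merged / denoised


for sample, counts in run1.items():
    print(f"{sample}: {merge_retention(counts):.1%} of denoised reads merged")
```

For AJM0303, roughly a third of the denoised reads make it through merging.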
2. Trimmomatic was done before DADA2 (used 515/806 primers) so little trim/truncating values
qiime dada2 denoise-paired \
--i-demultiplexed-seqs BYLN3_89-pe-demux.qza \
--o-table BYLN3_89-table.qza \
--o-representative-sequences BYLN3_89-rep-seqs.qza \
--o-denoising-stats BYLN3_89-denoising-stats.qza \
--p-trim-left-f 6 \
--p-trim-left-r 6 \
--p-trunc-len-f 280 \
--p-trunc-len-r 240 \
--p-n-threads 12 \
--p-n-reads-learn 1000000
sample-id | input | filtered | denoised | merged | non-chimeric |
---|---|---|---|---|---|
#q2:types | numeric | numeric | numeric | numeric | numeric |
AJM0303T | 50295 | 45121 | 45121 | 15133 | 7692 |
NEGATIVET | 14820 | 13346 | 13346 | 8824 | 7512 |
YAM0907T | 59786 | 53555 | 53555 | 21718 | 11831 |
YAM2511T | 55715 | 50494 | 50494 | 17619 | 11089 |
YAM2512T | 45798 | 41500 | 41500 | 15931 | 9004 |
YAM2513T | 46301 | 41903 | 41903 | 17548 | 8987 |
YAM2514T | 30966 | 28059 | 28059 | 12465 | 8089 |
YAM2515T | 61728 | 55987 | 55987 | 21750 | 13672 |
YAM2516T | 32818 | 29611 | 29611 | 10231 | 6455 |
Considering the sudden increase in chimeric sequences being filtered out, it seems that Trimmomatic plus trimming 6 bases still doesn't successfully remove the primers. Why would this be?
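As a rough illustration of the chimera loss, this sketch compares the fraction of merged reads removed at the chimera step in run 1 and run 2 for one sample (AJM0303 / AJM0303T). The counts come from the two tables above; `chimera_loss` is a hypothetical helper name, and a single sample is not necessarily representative.

```python
def chimera_loss(merged: int, non_chimeric: int) -> float:
    """Fraction of merged reads removed by the chimera filter."""
    return 1 - non_chimeric / merged


# (merged, non-chimeric) for AJM0303 in run 1 and AJM0303T in run 2
run1 = (13901, 8109)
run2 = (15133, 7692)

print(f"run 1: {chimera_loss(*run1):.1%} of merged reads flagged as chimeric")
print(f"run 2: {chimera_loss(*run2):.1%} of merged reads flagged as chimeric")
```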
3. Trimmomatic was done before DADA2 AND trim/truncate values given
qiime dada2 denoise-paired \
--i-demultiplexed-seqs BYLN3_89-pe-demux.qza \
--o-table BYLN3_89_3-table.qza \
--o-representative-sequences BYLN3_89_3-rep-seqs.qza \
--o-denoising-stats BYLN3_89_3-denoising-stats.qza \
--p-trim-left-f 17 \
--p-trim-left-r 20 \
--p-trunc-len-f 280 \
--p-trunc-len-r 260 \
--p-n-threads 12 \
--p-n-reads-learn 1000000
sample-id | input | filtered | denoised | merged | non-chimeric |
---|---|---|---|---|---|
#q2:types | numeric | numeric | numeric | numeric | numeric |
AJM0303T | 50295 | 40970 | 40970 | 17537 | 15965 |
NEGATIVET | 14820 | 12148 | 12148 | 11323 | 11085 |
YAM0907T | 59786 | 48949 | 48949 | 23985 | 21305 |
YAM2511T | 55715 | 46762 | 46762 | 20449 | 19092 |
YAM2512T | 45798 | 38209 | 38209 | 17984 | 16831 |
YAM2513T | 46301 | 38701 | 38701 | 19653 | 17490 |
YAM2514T | 30966 | 25864 | 25864 | 14448 | 13813 |
YAM2515T | 61728 | 51618 | 51618 | 24471 | 22849 |
YAM2516T | 32818 | 27118 | 27118 | 12057 | 11576 |
It seems that running Trimmomatic AND giving trim/trunc values according to the original (no Trimmomatic) demux quality gives the best result among 1, 2, and 3. How can this be? Isn't this redundant, filtering the primers twice?
4. Trimmomatic was done before DADA2 AND only forward reads were used
qiime dada2 denoise-single \
--i-demultiplexed-seqs BYLN3F-single-end-demux.qza \
--p-trim-left 17 \
--p-trunc-len 280 \
--o-representative-sequences BYLN3F_3-rep-seqs.qza \
--o-table BYLN3F_3-table.qza \
--o-denoising-stats BYLN3F_3-denoising-stats.qza \
--p-n-threads 4 --p-n-reads-learn 1000000
sample-id | input | filtered | denoised | non-chimeric |
---|---|---|---|---|
#q2:types | numeric | numeric | numeric | numeric |
AJM0303fp | 50295 | 48071 | 48071 | 46185 |
NEGATIVEfp | 14820 | 14185 | 14185 | 14030 |
YAM0907fp | 59786 | 57099 | 57099 | 53256 |
YAM2510fp | 145001 | 139020 | 139020 | 133711 |
YAM2511fp | 55715 | 53193 | 53193 | 51268 |
YAM2512fp | 45798 | 43599 | 43599 | 41268 |
YAM2513fp | 46301 | 44034 | 44034 | 39938 |
YAM2514fp | 30966 | 29523 | 29523 | 27905 |
YAM2515fp | 61728 | 59101 | 59101 | 56242 |
It seems that removing primers twice (once with Trimmomatic and again with the DADA2 parameters) gives me the most sequences after all the filtering steps. I'm thinking this is the approach I should use for my further analysis. Is this right?
Remaining seqs are ranked as follows:
4 > 3 > 1 > 2
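For reference, this is roughly how the ranking can be reproduced for one sample. The sketch uses only AJM0303 (named AJM0303T and AJM0303fp in the later runs), with input and non-chimeric counts copied from the four tables above. Other samples may rank differently, so treat it as an illustration, not a summary of the whole dataset.

```python
# run number -> (input reads, non-chimeric reads) for sample AJM0303
runs = {
    1: (60485, 8109),
    2: (50295, 7692),
    3: (50295, 15965),
    4: (50295, 46185),
}

# Rank runs by the absolute number of reads left after all filtering steps
ranking = sorted(runs, key=lambda run: runs[run][1], reverse=True)
print(" > ".join(str(run) for run in ranking))

# Also show the share of input reads that each run retained
for run in ranking:
    n_input, n_kept = runs[run]
    print(f"run {run}: {n_kept} of {n_input} reads kept ({n_kept / n_input:.1%})")
```

Note that run 1 starts from a larger input than runs 2 to 4, so the absolute counts and the retained fractions are not directly comparable across runs.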
I apologize for such a messy question.
Please let me know if you need more information.
Cheers,