Dear qiime community,
I have recently had some issues with primer removal and the subsequent DADA2 denoising. I searched the forum for similar issues (to avoid being repetitive) and found many about read loss during DADA2 filtering and others about cutadapt, but each on its own; my problem concerns the combination of these two steps.
I am working with V3V4 paired-end reads. From my paired-end sequences (with no adapters), I removed the primers with the following command, as suggested in Remove Primer in paired-end demultiplexed file - #12 by SoilRotifer:
qiime cutadapt trim-paired \
--i-demultiplexed-sequences paired-end-demux-A.qza \
--p-cores 4 \
--p-front-f CCTACGGGNGGCWGCAG \
--p-front-r NACTACHVGGGTATCTAATCC \
--p-match-adapter-wildcards \
--p-match-read-wildcards \
--p-discard-untrimmed \
--o-trimmed-sequences trimmed-demux-A.qza
Here is the trimmed demux summary (after cutadapt): trimmed-demux-A.qzv (321.7 KB)
I assume this step worked well, as I still retained 91% of the starting forward and reverse reads.
However, when I performed the DADA2 denoising, I lost all the reads in the first filtering step. See denoising-stats-A-6.qzv (1.2 MB)
I've read in this forum, and seen in my own analyses, that read retention in the filtering step depends heavily on the truncation parameters. I tried different truncation values for the trimmed demux based on the attached demux summary, but the read loss in DADA2 filtering was the same (or similar). I truncated the forward reads at 237 and the reverse reads at 227, 220, and 204.
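For clarity, my denoising calls on the cutadapt-trimmed reads followed this pattern (shown for the 237/227 truncation; output names are illustrative):

qiime dada2 denoise-paired \
--i-demultiplexed-seqs trimmed-demux-A.qza \
--p-trunc-len-f 237 \
--p-trunc-len-r 227 \
--o-table table-A-6.qza \
--o-representative-sequences rep-seqs-A-6.qza \
--o-denoising-stats denoising-stats-A-6.qza \
--p-n-threads 24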
Main question
Any idea what could be happening? I could truncate the reads even further (the pairs would still merge), as suggested in Lost of data with dada2 - #14 by benjjneb. However, I don't understand why I should truncate more positions when they appear to be good quality in the attached demux-summary.qzv.
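To check that the pairs would still merge, here is a rough overlap calculation; the ~427 bp insert length (V3V4 amplicon after removing the 17 nt and 21 nt primers) is my assumption, and DADA2 needs roughly 12 nt of overlap to merge:

```shell
# Rough merge-overlap check: trunc_f + trunc_r - insert_length
# must exceed DADA2's ~12 nt minimum overlap.
INSERT=427    # approximate V3V4 insert length after primer removal (assumption)
TRUNC_F=237   # forward truncation
TRUNC_R=204   # most aggressive reverse truncation I tried
OVERLAP=$((TRUNC_F + TRUNC_R - INSERT))
echo "$OVERLAP"  # prints 14, i.e. still (barely) enough to merge
```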
Other questions
I tried other approaches that gave fine results (this "DADA2 filtering over-killing" did not happen with them), but which I assume are methodologically incorrect (could someone help confirm this?):
- Not using the cutadapt-trimmed demux, but instead the raw paired-end demux, and trimming the primer positions in DADA2 (truncation parameters based on the paired-end demux summary, not attached):
qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end-demux-A.qza \
--p-trim-left-f 17 \
--p-trim-left-r 21 \
--p-trunc-len-f 0 \
--p-trunc-len-r 225 \
--o-table dada2-A7/table-A-7.qza \
--o-representative-sequences dada2-A7/rep-seqs-A-7.qza \
--o-denoising-stats dada2-A7/denoising-stats-A-7.qza \
--p-n-threads 24
This way, reads that do not start with the primer are kept and trimmed anyway, and may end up as junk sequences; see DADA2 vs Cutadapt - #3 by Mehrbod_Estaki.
- Running cutadapt without the flag --p-discard-untrimmed.
That would carry the same problem as idea 1, I think. However, I don't understand why this option causes no issue with the DADA2 filtering while the main approach does.
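Concretely, this variant is the same cutadapt command as above minus the discard flag (the output name here is just a placeholder I made up to keep the artifacts apart):

qiime cutadapt trim-paired \
--i-demultiplexed-sequences paired-end-demux-A.qza \
--p-cores 4 \
--p-front-f CCTACGGGNGGCWGCAG \
--p-front-r NACTACHVGGGTATCTAATCC \
--p-match-adapter-wildcards \
--p-match-read-wildcards \
--o-trimmed-sequences trimmed-demux-A-keep-untrimmed.qza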
- Running the denoising directly from the paired-end demux (not cutadapt-trimmed) and trimming only the first 5-13 positions (which are lower in quality). However, 5-10 nucleotides of the primers would remain in the reads; that would be incorrect, right? Or would it have no real impact on the analysis?
I did more analyses, but I only attach the results that help in understanding the issue; the other options and parameters I assessed would only add confusion.
I'm sorry for the long post; I just wanted to make it easy to understand for anyone helping me.
Thanks for the help