Hi Community,
I would like to gain some advices on the use of cutadapt and DADA2. I read several similar posts on the forum but my situation seems a little bit different from the previous discussion. (Qiime version: amplicon-2025.7)
I accidentally process raw sequence of a metabarcoding dataset (COI region, amplicon length ~200bp; sequencing done on Illumina pair-end PE150) without removing primers. The subsequent analysis are smooth, until I realize I haven’t remove the primers; then I repeat the process from the beginning.
After removing primers using cutadapt, nearly all of the sequences were filtered in DADA2. Subsequent analysis is not possible.
Here is the script used in cutadapt:
qiime cutadapt trim-paired \
--i-demultiplexed-sequences subset1.qza \
--p-front-f GGWACWGGATGAACWRTHTAYCCHCC \. # Forward primer sequence from 5’-3’
--p-front-r GTRATDGCWCCDGCWARWACWGG \. # Reverse primer sequence from 5’-’3
--p-cores 10 \
--p-match-adapter-wildcards \
--o-trimmed-sequences subset1_trimmed.qza
I compare the summary demultiplexed sequence counts, forward/reverse counts of .qza with and without removing primers, they look identical. Hence, I think the problem isn’t associated with cutadapt.
Here attach the summary (.qzv) as reference, where primers were trimmed in “subset1_trimmed.qzv"
subset1_trimmed.qzv (348.2 KB)
subset1.qzv (332.2 KB)
Since the quality of bases look good, I haven’t trimmed any bases in DADA2, here is the script:
qiime dada2 denoise-paired \
--i-demultiplexed-seqs subset1.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 150 \
--p-trunc-len-r 150 \
--p-n-threads 10 \
--o-representative-sequences rep-seq_subset1.qza \
--o-table feature-table_subset1.qza \
--o-denoising-stats stats_subset1.qza \
And here is the output of DADA2 with and without removing primers:
stats_subset1.qzv (1.2 MB)
stats_subset1_trimmed.qzv (1.2 MB)
Although in stat_subset1.qzv, most of the sequences were filtered, the remain sequences look having high quality, subsequent analysis were done smoothly with reasonable outcome in taxonomy analysis. However, in stat_subset1_trimmed.qzv, almost no sequences were left.
I tired adjusting the parameters in --p-trunc-len-f and --p-trunc-len-r; but seems it wasn’t the source of problem. May I have some advice on the step? Would trimming primers after DADA2 an feasible option for accurate taxonomy assignment?
Any help are highly appreciated. Thanks.