cutadapt cannot remove primer in the middle of reads

Hello everyone,

I am using QIIME 2 to analyze my 16S rRNA sequencing data, an amplification of the V4 region with the primer pair "GTGCCAGCMGCCGCGGTAA" and "GGACTACHVGGGTWTCTAAT". I have tried removing the primers with the cutadapt plugin using the different commands below, but some primers still remain in the middle of the sequences.

The commands I have tested are as follows:

qiime cutadapt trim-paired --p-cores 20 --i-demultiplexed-sequences paired-end-demux.qza --p-front-f GTGCCAGCMGCCGCGGTAA --p-adapter-f TTACCGCGGCKGCTGGCAC --p-front-r GGACTACHVGGGTWTCTAAT --p-adapter-r ATTAGAWACCCBDGTAGTCC --o-trimmed-sequences paired-end-demux_de_primer.qza

qiime cutadapt trim-paired --p-cores 30 --i-demultiplexed-sequences paired-end-demux.qza --p-front-f GTGCCAGCMGCCGCGGTAA --p-adapter-f TTACCGCGGCKGCTGGCAC --p-front-r GGACTACHVGGGTWTCTAAT --p-adapter-r ATTAGAWACCCBDGTAGTCC --p-anywhere-f GTGCCAGCMGCCGCGGTAA --p-anywhere-r GGACTACHVGGGTWTCTAAT --o-trimmed-sequences paired-end-demux_de_primer.qza

qiime cutadapt trim-paired --p-cores 80 --i-demultiplexed-sequences paired-end-demux.qza --p-front-f GTGCCAGCMGCCGCGGTAA --p-front-r GGACTACHVGGGTWTCTAAT --p-match-read-wildcards --p-match-adapter-wildcards --o-trimmed-sequences paired-end-demux_de_primer.qza
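The --p-adapter-f and --p-adapter-r values in the first two commands look like the reverse complements of the two primers. A minimal Python sketch for double-checking that, including the IUPAC ambiguity codes (M, H, V, W, ...), is below. The function name reverse_complement is only illustrative and is not part of QIIME 2 or cutadapt.

```python
# Complement table for DNA bases, including IUPAC ambiguity codes.
COMPLEMENT = str.maketrans("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence with IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


FORWARD_PRIMER = "GTGCCAGCMGCCGCGGTAA"
REVERSE_PRIMER = "GGACTACHVGGGTWTCTAAT"

print(reverse_complement(FORWARD_PRIMER))  # TTACCGCGGCKGCTGGCAC
print(reverse_complement(REVERSE_PRIMER))  # ATTAGAWACCCBDGTAGTCC
```

Both outputs agree with the adapter strings passed in the commands above.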

The trimmed output is then passed to DADA2:

qiime dada2 denoise-paired --p-n-threads 20 --i-demultiplexed-seqs paired-end-demux_de_primer.qza --p-trunc-len-f 0 --p-trunc-len-r 0 --o-table dada2_table.qza --o-representative-sequences dada2_rep_set.qza --o-denoising-stats dada2_stats.qza

I then tabulate the denoising statistics and the representative sequences:

qiime metadata tabulate --m-input-file dada2_stats.qza --o-visualization dada2_stats.qzv
qiime feature-table tabulate-seqs --i-data dada2_rep_set.qza --o-visualization rep-seqs.qzv

The merged sequences should be ~250 bp long, but some are longer than 300 bp. I found that the primer "GTGCCAGCMGCCGCGGTAA" still remains in some of these sequences, as in the example below.

>b4559da283e0265b85157cd7532d5f37
TTAGAAACCCTTGTAGTCCATTGGCGTACG***GTGCCAGCCGCCGCGGTAA***TACGTAGAAGACTAGTGTTAATCATCTTTATTAGGTTTAAAGGGTACCTAGACGGTAAATTAAACTCTAAATGAGTACTTGTTTACTAGAGTTTTATGTAAGGAGGAAGAATTTCTGGAGTAGTGATTTAATATGAATAATCTCAGAGAGACTGGTAACGGCGAAGGCATCCTTCTATGTAAAAACTGACGTTGAGGGACGAAGGC
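To see where a primer sits inside a sequence like this one, a small Python sketch can expand the IUPAC codes into a regular expression and report the match positions. This is only an exact-match illustration (cutadapt itself tolerates some mismatches), the helper name find_primer is hypothetical, and READ_START is just the first 64 bases of the example above with the asterisks removed.

```python
import re

# Each IUPAC code expanded to the set of bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "S": "[CG]",
    "Y": "[CT]", "K": "[GT]", "V": "[ACG]", "H": "[ACT]",
    "D": "[AGT]", "B": "[CGT]", "N": "[ACGT]",
}


def find_primer(read: str, primer: str) -> list:
    """Return (start, end) positions of exact IUPAC-aware primer matches."""
    pattern = "".join(IUPAC[base] for base in primer.upper())
    return [(m.start(), m.end()) for m in re.finditer(pattern, read.upper())]


# First 64 bases of the example sequence (assumption: asterisks are markup only).
READ_START = "TTAGAAACCCTTGTAGTCCATTGGCGTACGGTGCCAGCCGCCGCGGTAATACGTAGAAGACTAG"

# Forward primer, found 30 bases into the sequence.
print(find_primer(READ_START, "GTGCCAGCMGCCGCGGTAA"))  # [(30, 49)]

# Reverse complement of the reverse primer, minus its first base.
print(find_primer(READ_START, "TTAGAWACCCBDGTAGTCC"))  # [(0, 19)]
```

On this example the forward primer starts 30 bases in, and the first 19 bases match the reverse complement of the reverse primer without its leading A.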

Can anyone provide suggestions on how to address this issue? Thanks.

Good afternoon,

Cutadapt has lots of features, and choosing the ones that are just right for your reads can be hard.

One option to try is --p-times, which is set to 1 by default but can be increased for cases where the primer appears a second (or even a third) time inside the read.
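As a rough illustration of what raising --p-times changes, here is a toy Python sketch of repeated 5' primer removal. It is not cutadapt's actual algorithm: it uses exact string matching, always takes the leftmost occurrence, and the primer has its ambiguity code resolved to C for simplicity. The name trim_front and the test read are made up for the example.

```python
def trim_front(read: str, primer: str, times: int = 1) -> str:
    """Remove a 5' primer and everything before it, up to `times` rounds."""
    for _ in range(times):
        pos = read.find(primer)  # leftmost exact occurrence
        if pos == -1:
            break  # nothing left to remove
        read = read[pos + len(primer):]
    return read


PRIMER = "GTGCCAGCCGCCGCGGTAA"  # forward primer with M resolved to C (assumption)
read = "CCCCC" + PRIMER + "GGGGG" + PRIMER + "TTTTT"

print(trim_front(read, PRIMER, times=1))  # GGGGG + primer + TTTTT, second copy remains
print(trim_front(read, PRIMER, times=2))  # TTTTT
```

With one round the second primer copy survives inside the read, and with two rounds it is removed as well.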

I try to work with the sequencing core to select cutadapt settings. They will know what to expect in the read and may already have a recommended way of removing unwanted primers.

Hi Colin,

Thanks for the suggestion. I addressed this issue with the following command:

qiime cutadapt trim-paired --p-cores 80 --i-demultiplexed-sequences paired-end-demux.qza --p-anywhere-f GTGCCAGCMGCCGCGGTAA --p-anywhere-r GGACTACHVGGGTWTCTAAT --p-match-read-wildcards --p-match-adapter-wildcards --p-times 2 --p-minimum-length 180 --o-trimmed-sequences paired-end-demux_de_primer.qza
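One possible reading of why this command helps, based on how the plugin help describes the "anywhere" option: if the first base of the read is part of the match, the primer is treated like a front (5') adapter, otherwise like a 3' adapter, so the primer and everything after it are removed. A read with an internal primer would then become very short and be dropped by --p-minimum-length 180. The Python sketch below is a toy model of that rule with exact matching and a single round of trimming. The function names, the resolved primer, and the assumption that a pair is dropped when either read is too short are illustrative, not taken from the thread.

```python
def trim_anywhere(read: str, primer: str) -> str:
    """Toy model of an 'anywhere' adapter, using one exact-match round."""
    pos = read.find(primer)
    if pos == -1:
        return read  # primer absent, read unchanged
    if pos == 0:
        return read[len(primer):]  # match at the 5' end: keep what follows
    return read[:pos]  # internal match: keep only what precedes it


def keep_pair(fwd: str, rev: str, minimum_length: int = 180) -> bool:
    """Assumed filter: keep a pair only if both reads are long enough."""
    return len(fwd) >= minimum_length and len(rev) >= minimum_length


PRIMER = "GTGCCAGCCGCCGCGGTAA"  # forward primer with M resolved to C (assumption)
normal_read = PRIMER + "C" * 200
odd_read = "A" * 30 + PRIMER + "C" * 200  # primer 30 bases into the read
mate = "T" * 200  # placeholder reverse read

print(len(trim_anywhere(normal_read, PRIMER)))  # 200
print(len(trim_anywhere(odd_read, PRIMER)))     # 30
print(keep_pair(trim_anywhere(normal_read, PRIMER), mate))  # True
print(keep_pair(trim_anywhere(odd_read, PRIMER), mate))     # False
```

Under these assumptions the unusual reads are discarded rather than repaired, which can be checked against the read counts in the DADA2 denoising statistics.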

I am also curious about how this happened. Are these chimeras that DADA2 was unable to recognize?

Maybe a primer dimer? I don't usually see a primer appear twice in a read, but that option exists because it can happen.

Does adding --p-times 2 seem to help or do you get similar issues?