Regarding the presence of primer sequences on reads after primer removal using cutadapt

Hello.
I would like to ask about primer sequences that remained on reads after primer removal with the QIIME2 cutadapt plugin.
I am using QIIME2 version 2023.9 to analyze bacterial 16S rRNA genes (V3-V4 region).
Below is the command I used for this process.

qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux.qza \
--p-front-f CCTACGGGNGGCWGCAG \
--p-front-r GACTACHVGGGTATCTAATCC \
--p-discard-untrimmed \
--o-trimmed-sequences demux_trimmed.qza

After conducting primer removal with cutadapt, I exported the sequences and searched for forward/reverse primer sequences in the forward and reverse reads.
I noticed that many forward reads still contained reverse primer sequences, as well as some forward primer sequences, and the same was true for reverse reads.
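As a rough sketch of the kind of check described above, the helpers below count how many reads in a FASTQ stream contain a primer anywhere in the sequence, expanding IUPAC ambiguity codes into a regular expression. The function names `iupac_to_regex` and `count_primer_hits` are illustrative only (they are not part of QIIME 2 or cutadapt), and the search requires an exact match after IUPAC expansion, so reads carrying a primer with sequencing errors would not be counted.

```shell
# Expand IUPAC ambiguity codes in a primer into an extended regex.
# Hypothetical helper, not part of QIIME 2 or cutadapt.
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/R/[AG]/g'  -e 's/Y/[CT]/g'  -e 's/S/[CG]/g' \
        -e 's/W/[AT]/g'  -e 's/K/[GT]/g'  -e 's/M/[AC]/g' \
        -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' \
        -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Count reads whose sequence line contains the primer.
# Reads uncompressed, four-line-per-record FASTQ from stdin; $1 is the primer.
count_primer_hits() {
    awk 'NR % 4 == 2' | grep -c -E "$(iupac_to_regex "$1")"
}
```

For an exported, gzipped FASTQ file (the file name here is a placeholder), this could be used as `gzip -dc forward_reads.fastq.gz | count_primer_hits GACTACHVGGGTATCTAATCC`. To look for a primer read through from the opposite strand, search for its reverse complement instead.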

I think this issue in my data may be related to the default settings of cutadapt as noted in its documentation (User guide — Cutadapt 0.1 documentation).

By default, at most one adapter sequence is removed from each read, even if multiple adapter sequences were provided.

Then, I used the --p-times 2 option, as suggested in a previous post (The primers are still present after cutadpt - #3 by yuanyuan12543).

However, this did not work with my data because the --p-discard-untrimmed option causes reads with only one primer to be discarded.

Furthermore, I used the --p-anywhere-f and --p-anywhere-r options to remove all primer sequences in the reads; however, this approach did not work for my data, resulting in the removal of many reads.

It seems that I cannot manually remove those read sequences and reintegrate them into the QIIME pipeline for further analysis, such as DADA2.

I am concerned that these remaining primer sequences in the reads will interfere with further analysis, particularly when constructing ASVs using DADA2.

Could you please suggest a solution?

Additionally, I would like to inquire about the presence of primer sequences located in the middle of the reads, not at the 5' or 3' ends, especially in cases where forward primer sequences remain in forward reads.
Are these sequences typically considered PCR or sequencing errors, such as primer dimers, or could they represent biologically meaningful sequences?

Thank you very much for your support.

Hi @microbiome_25,

I would suggest running cutadapt several times. For example:

Remove the primers as normal using --p-front-*, since we want to remove primers from the 5' end. Note: keep --p-discard-untrimmed enabled for this step.

qiime cutadapt trim-paired \
    --i-demultiplexed-sequences demux.qza \
    --p-front-f CCTACGGGNGGCWGCAG \
    --p-front-r GACTACHVGGGTATCTAATCC \
    --p-discard-untrimmed \
    --o-trimmed-sequences demux_trimmed_01.qza

Then take this output and remove the reverse complement of the other primer from each read. Note that we use --p-adapter-* here, because we are looking to remove the reverse complement of the primer from the 3' end of the read. That is, we remove the reverse primer from the forward read and the forward primer from the reverse read. I am not using --p-discard-untrimmed here, as not all reads will have the primer sequence at the 3' end. So we'll trim the primers if they are there and leave the read alone if not.

qiime cutadapt trim-paired \
    --i-demultiplexed-sequences demux_trimmed_01.qza \
    --p-adapter-f GGATTAGATACCCBDGTAGTC \
    --p-adapter-r CTGCWGCCNCCCGTAGG \
    --o-trimmed-sequences demux_trimmed_02.qza
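If you want to double-check the reverse-complemented primers passed to --p-adapter-*, a small shell helper along these lines should do it. This is only a sketch: `revcomp` is an illustrative name, not something provided by QIIME 2 or cutadapt, and it assumes the standard `rev` and `tr` utilities are available and that the primer is written in uppercase IUPAC codes.

```shell
# Reverse-complement a DNA sequence, including IUPAC ambiguity codes.
# revcomp is a hypothetical helper name, not part of QIIME 2 or cutadapt.
revcomp() {
    # Reverse the string, then swap each base for its complement
    # (A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H; S, W and N map to themselves).
    printf '%s\n' "$1" | rev | tr 'ACGTRYSWKMBDHVN' 'TGCAYRSWMKVHDBN'
}

revcomp GACTACHVGGGTATCTAATCC   # reverse primer, gives the --p-adapter-f value
revcomp CCTACGGGNGGCWGCAG       # forward primer, gives the --p-adapter-r value
```

The two calls should print GGATTAGATACCCBDGTAGTC and CTGCWGCCNCCCGTAGG, matching the sequences in the command above.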

Give this a try and let us know how it does.

-Mike