Regarding the presence of primer sequences on reads after primer removal using cutadapt

microbiome_25 · November 21, 2024, 4:58am

Hello.
I would like to ask about primer sequences that remained on reads after primer removal with the QIIME 2 cutadapt plugin.
I am using QIIME2 version 2023.9 to analyze bacterial 16S rRNA genes (V3-V4 region).
Below is the command I used for this process.

qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux.qza \
--p-front-f CCTACGGGNGGCWGCAG \
--p-front-r GACTACHVGGGTATCTAATCC \
--p-discard-untrimmed \
--o-trimmed-sequences demux_trimmed.qza

After conducting primer removal with cutadapt, I exported the sequences and searched for forward/reverse primer sequences in the forward and reverse reads.
I noticed that many forward reads still contained reverse primer sequences as well as some forward primer sequences, and the same was true for reverse reads.

I think this issue in my data may be related to the default settings of cutadapt as noted in its documentation (User guide — Cutadapt 5.0 documentation).

By default, at most one adapter sequence is removed from each read, even if multiple adapter sequences were provided.

Then, I used the --p-times 2 option, as suggested in a previous post (The primers are still present after cutadpt - #3 by yuanyuan12543).

However, this did not work with my data because the --p-discard-untrimmed option causes reads with only one primer to be discarded.

Furthermore, I used the --p-anywhere-f and --p-anywhere-r options to remove all primer sequences in the reads; however, this approach did not work for my data, resulting in the removal of many reads.

It seems that I cannot manually remove those read sequences and reintegrate them into the QIIME pipeline for further analysis, such as DADA2.

I am concerned that these remaining primer sequences in the reads will interfere with further analysis, particularly when constructing ASVs using DADA2.

Could you please suggest a solution?

Additionally, I would like to inquire about the presence of primer sequences located in the middle of the reads, not at the 5' or 3' ends, especially in cases where forward primer sequences remain in forward reads.
Are these sequences typically considered PCR or sequencing errors, such as primer dimers, or could they represent biologically meaningful sequences?

Thank you very much for your support.

SoilRotifer · November 21, 2024, 2:32pm

Hi @microbiome_25,

I would suggest running cutadapt several times. For example:

Remove the primers as normal using --p-front-*, as we want to remove primers from the 5' end. Note: keep --p-discard-untrimmed enabled for this part.

qiime cutadapt trim-paired \
    --i-demultiplexed-sequences demux.qza \
    --p-front-f CCTACGGGNGGCWGCAG \
    --p-front-r GACTACHVGGGTATCTAATCC \
    --p-discard-untrimmed \
    --o-trimmed-sequences demux_trimmed_01.qza

Then take this output and proceed to remove the reverse complement of the other primer from each read. Note that we use --p-adapter-* because we are looking to remove the reverse complement of the primer from the 3' end of the read. That is, we remove the reverse primer from the forward read and the forward primer from the reverse read. Note that I am not using --p-discard-untrimmed here, as not all reads will have the primer sequence at the 3' end. So we'll trim the primers if they are there and leave the read alone if not.

qiime cutadapt trim-paired \
    --i-demultiplexed-sequences demux_trimmed_01.qza \
    --p-adapter-f GGATTAGATACCCBDGTAGTC \
    --p-adapter-r CTGCWGCCNCCCGTAGG \
    --o-trimmed-sequences demux_trimmed_02.qza
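
For readers who want to check how the two --p-adapter-* values relate to the original primers, here is a minimal sketch of a reverse-complement helper. The function name `revcomp` is made up for illustration and is not part of QIIME 2 or cutadapt; it assumes the standard `rev` and `tr` utilities are available and handles the IUPAC ambiguity codes used in these primers (S, W and N are their own complements, so they are left unchanged).

```shell
# Reverse-complement a primer, including IUPAC ambiguity codes.
# Hypothetical helper for illustration only.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTRYKMBVDHacgtrykmbvdh' 'TGCAYRMKVBHDtgcayrmkvbhd'
}

# Reverse primer -> sequence expected at the 3' end of forward reads
revcomp GACTACHVGGGTATCTAATCC   # GGATTAGATACCCBDGTAGTC

# Forward primer -> sequence expected at the 3' end of reverse reads
revcomp CCTACGGGNGGCWGCAG       # CTGCWGCCNCCCGTAGG
```

The two printed sequences match the values passed to --p-adapter-f and --p-adapter-r in the command above.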

Give this a try and let us know how it does.

-Mike

microbiome_25 · November 25, 2024, 10:03pm

Thank you very much for your reply.
I have resolved the issue of the reverse primer sequence being present in forward reads and the forward primer sequence being present in reverse reads.
Thank you very much.

I would like to ask two additional questions.

1. I have searched for previous papers on the gut microbiome using QIIME2, but couldn't find papers that mentioned using cutadapt's --p-adapter-f or --p-adapter-r options for the removal of reverse-complementary strand primer sequences. Could you please let me know if it is common not to remove reverse-complementary strand primer sequences in reads?

2. When I executed cutadapt using the commands you provided, the following primer sequences still remained in reads: the forward primer sequence in forward reads and the reverse primer sequence in reverse reads. Is it common to keep these reads with primer sequences due to the possible biological significance of the primer sequences present in reads, or should they be removed as errors?

I searched for primer sequences using the following commands in the Mac Terminal.
For the forward reads, I executed the following command.
grep -E 'CCTACGGG[ATCG]GGC[AT]GCAG' sequence.fasta

For the reverse reads, I executed the following command after converting the read sequences into their reverse complement.
grep -E 'GGATTAGATACCC[GCT][GAT]GTAGTC' sequence.fasta
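
The hand-written character classes in the two grep commands above can also be generated from the degenerate primer, which avoids typos when checking several primers. The following is only a sketch: `iupac_to_regex` and `count_primer_hits` are hypothetical helper names, and the counting assumes a FASTA file with each sequence on a single line (wrapped sequences would hide matches that span a line break).

```shell
# Expand IUPAC ambiguity codes in a primer into a grep -E pattern.
# Only ambiguity codes are replaced; A, C, G and T pass through unchanged.
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/R/[AG]/g'  -e 's/Y/[CT]/g'  -e 's/S/[CG]/g'  -e 's/W/[AT]/g' \
        -e 's/K/[GT]/g'  -e 's/M/[AC]/g'  -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
        -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Count sequence lines on stdin (FASTA headers skipped) that contain the primer.
count_primer_hits() {
    grep -v '^>' | grep -c -E "$(iupac_to_regex "$1")"
}

iupac_to_regex CCTACGGGNGGCWGCAG        # CCTACGGG[ACGT]GGC[AT]GCAG
iupac_to_regex GGATTAGATACCCBDGTAGTC    # GGATTAGATACCC[CGT][AGT]GTAGTC

# Possible usage on the file named in the post:
# count_primer_hits CCTACGGGNGGCWGCAG < sequence.fasta
```

Reporting a count rather than the matching lines makes it easier to compare how many reads still carry a primer-like sequence before and after each cutadapt step.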

Thank you very much for your support.

SoilRotifer · November 27, 2024, 8:51pm

I'd suggest reading the plugin help text and the cutadapt documentation regarding adapter types. There are also a few examples of this in the forum. See:

and

They should always be removed, as the primers can make sequences appear more similar to each other than they actually are (and contribute to chimera-checking problems). Remember that PCR primers can bind with a reasonable number of mismatches and become part of the sequenced amplicon itself. Because the amplicon is re-amplified in every PCR cycle before it is finally sequenced, the region where the primer binds will carry the sequence of the primer itself, which is not necessarily the actual sequence of the organism you are sequencing. Hence it is a good idea to remove the primer sequence prior to constructing OTUs/ESVs and phylogenies.

I am not sure why you are still observing primers.

microbiome_25 · December 3, 2024, 7:10am

Dear @SoilRotifer,
Thank you very much for your reply!
It was very helpful.
I am unsure why I am observing more than two primer sequences for some reads and whether this is indicative of errors.
Do you have any suggestions on this?
I have run the same cutadapt commands that you provided.

Thank you very much.

SoilRotifer · December 3, 2024, 2:11pm

I suspect that for some sequences this could be normal: a region may look like primer sequence but simply be a similar region, unless it is an amplification or sequencing error. Perhaps manually run BLAST on these sequences and see how many contain "primer" sequence? Make sure you use megablast and check the option to exclude "Uncultured/environmental sample sequences".
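
Before running BLAST, it may help to see where in each read the primer-like match sits, since a hit at position 1 (an untrimmed primer) is a different situation from a hit in the middle of the read. The sketch below is an illustration under stated assumptions: `primer_positions` is a made-up helper, it expects FASTA on stdin with one sequence per line, and it reports only the first match in each read.

```shell
# Print read ID, 1-based start of the first primer-like match, and read length.
# $1 is a regular expression for the primer, e.g. CCTACGGG[ACGT]GGC[AT]GCAG
primer_positions() {
    awk -v re="$1" '
        /^>/ { id = substr($0, 2); next }          # remember the current header
        {
            if (match($0, re))                     # RSTART is set by match()
                printf "%s\t%d\t%d\n", id, RSTART, length($0)
        }
    '
}

# Toy reads invented for this example: one terminal hit, one internal hit, one miss.
printf '>read1\nCCTACGGGAGGCAGCAGTTTT\n>read2\nAAAACCTACGGGTGGCTGCAGAA\n>read3\nACGTACGTACGT\n' |
    primer_positions 'CCTACGGG[ACGT]GGC[AT]GCAG'
```

In this toy input, read1 is reported with a start of 1 and read2 with a start of 5, while read3 is not listed. Reads whose match starts well inside the sequence are the ones worth inspecting by hand or with BLAST.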

system · January 3, 2025, 8:11pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.