Hello.
I would like to ask about primer sequences that remain in reads after primer removal with the QIIME 2 cutadapt plugin.
I am using QIIME 2 version 2023.9 to analyze bacterial 16S rRNA gene sequences (V3-V4 region).
Below is the command I used for primer removal.
qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux.qza \
--p-front-f CCTACGGGNGGCWGCAG \
--p-front-r GACTACHVGGGTATCTAATCC \
--p-discard-untrimmed \
--o-trimmed-sequences demux_trimmed.qza
After conducting primer removal with cutadapt, I exported the sequences and searched for forward/reverse primer sequences in the forward and reverse reads.
I noticed that many forward reads still contained reverse primer sequences as well as some forward primer sequences, and the same was true for the reverse reads.
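For anyone who wants to reproduce this kind of check, here is a minimal sketch of how such a search could be done on an exported FASTQ file. It is not the exact procedure I used: the function names (iupac_to_regex, revcomp, count_reads_with) are hypothetical helpers, and it assumes standard 4-line FASTQ records and GNU gzip. It matches the primers exactly (after expanding the degenerate bases), so unlike cutadapt it does not tolerate mismatches.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of QIIME 2 or cutadapt.

# Expand IUPAC degenerate bases into regex character classes,
# e.g. N -> [ACGT], W -> [AT].
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Reverse-complement a primer, including IUPAC codes.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTRYSWKMBDHVN' 'TGCAYRSWMKVHDBN'
}

# Count reads whose sequence line contains the primer.
# $1 = primer (IUPAC), $2 = FASTQ file (plain or gzipped; assumes GNU gzip).
count_reads_with() {
  local re
  re=$(iupac_to_regex "$1")
  gzip -cdf "$2" |
    awk -v re="$re" 'NR % 4 == 2 && $0 ~ re { n++ } END { print n + 0 }'
}
```

With a placeholder file name, the forward primer and the reverse complement of the reverse primer could then be counted in a forward read file with `count_reads_with CCTACGGGNGGCWGCAG sample_R1.fastq.gz` and `count_reads_with "$(revcomp GACTACHVGGGTATCTAATCC)" sample_R1.fastq.gz`.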
I think this issue in my data may be related to the default settings of cutadapt as noted in its documentation (User guide — Cutadapt 0.1 documentation).
By default, at most one adapter sequence is removed from each read, even if multiple adapter sequences were provided.
Then I used the --p-times 2 option, as suggested in a previous post (The primers are still present after cutadpt - #3 by yuanyuan12543).
However, this did not work with my data because the --p-discard-untrimmed option causes reads with only one primer to be discarded.
Furthermore, I used the --p-anywhere-f and --p-anywhere-r options to remove all primer sequences in the reads; however, this approach did not work for my data either, because it resulted in the removal of many reads.
It seems that I cannot manually remove those read sequences and reintegrate them into the QIIME pipeline for further analysis, such as DADA2.
I am concerned that these remaining primer sequences in the reads will interfere with further analysis, particularly when constructing ASVs using DADA2.
Could you please suggest a solution?
Additionally, I would like to inquire about the presence of primer sequences located in the middle of the reads, not at the 5' or 3' ends, especially in cases where forward primer sequences remain in forward reads.
Are these sequences typically considered PCR or sequencing errors, such as primer dimers, or could they represent biologically meaningful sequences?
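To make "in the middle of the reads" concrete, the position of each primer match can be tabulated. The following is a rough sketch under the same assumptions as before (4-line FASTQ records, GNU gzip, exact matching after expanding degenerate bases); first_match_offsets and iupac_to_regex are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of QIIME 2 or cutadapt.

# Expand IUPAC degenerate bases into regex character classes.
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Print, for every read that contains the primer, the 0-based offset of the
# first match within the read (0 means the primer sits at the 5' end).
# $1 = primer (IUPAC), $2 = FASTQ file (plain or gzipped; assumes GNU gzip).
first_match_offsets() {
  local re
  re=$(iupac_to_regex "$1")
  gzip -cdf "$2" |
    awk -v re="$re" 'NR % 4 == 2 { if (match($0, re)) print RSTART - 1 }'
}
```

For example, `first_match_offsets CCTACGGGNGGCWGCAG sample_R1.fastq.gz | sort -n | uniq -c` (placeholder file name) gives a histogram of offsets, which shows whether the remaining forward primer matches cluster at the start of the reads or are scattered internally.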
Thank you very much for your support.