QIIME 2 version: 2021.2
Accessing QIIME 2 through the terminal on the MGHPCC
I am working with paired-end sequence data. I recently found some of my primer sequences within ASV sequences that had gone all the way through the QIIME 2 pipeline. Near the beginning of the pipeline (before denoising), I had used cutadapt to trim the forward and reverse primer sequences and discard reads in which they were not found. My worry is that these leftover primer sequences could skew alpha diversity metrics and comparisons across sample types in my dataset.
When I first noticed these primer sequences, I thought I had not truncated my reads aggressively enough during DADA2 denoising. I went back, made my truncation parameters more stringent, and visualized the sequences after denoising. I still found forward and reverse primer sequences in a few hundred ASVs.
Has anyone run into this issue? Are there more stringent parameters or other functions available to cut primer sequences out of all ASVs?
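A quick way to gauge how widespread the problem is would be to search the ASV sequences themselves for each primer. The sketch below is a hypothetical helper (not part of QIIME 2) that expands IUPAC degenerate bases into regex character classes and reports which ASVs contain an exact match. It assumes the representative sequences have already been exported (for example with `qiime tools export`) and loaded into a plain {id: sequence} dict; the 515F primer in the example is an assumption, since the post does not name the primers used.

```python
import re
from typing import Dict, List

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer: str):
    """Compile a degenerate primer into a regex matching every sequence it represents."""
    return re.compile("".join(IUPAC_CLASS[base] for base in primer.upper()))


def asvs_containing(primer: str, asvs: Dict[str, str]) -> List[str]:
    """IDs of ASVs containing at least one exact (degeneracy-aware) primer match."""
    pattern = primer_regex(primer)
    return [asv_id for asv_id, seq in asvs.items() if pattern.search(seq.upper())]


# Hypothetical example: 515F (GTGYCAGCMGCCGCGGTAA) and two made-up ASVs.
asvs = {
    "asv1": "GTGCCAGCAGCCGCGGTAATACGTAGG",
    "asv2": "TACGTAGGGGGCAAGCGTTATCCGGATT",
}
print(asvs_containing("GTGYCAGCMGCCGCGGTAA", asvs))  # ['asv1']
```

Because this looks only for exact matches, it may undercount primer copies that differ by a base or two.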
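Truncation may not be the right tool for this. In q2-dada2, `--p-trunc-len-f`/`--p-trunc-len-r` cut bases from the 3' end of each read (and discard reads shorter than that length), whereas bases at the 5' end are removed with `--p-trim-left-f`/`--p-trim-left-r`. A primer sitting at the 5' end of a read is therefore untouched however stringent the truncation is. The toy function below only mimics that truncation rule to illustrate the point; it is a hypothetical helper, not DADA2's implementation, and the read is made up.

```python
from typing import Optional


def truncate(read: str, trunc_len: int) -> Optional[str]:
    """Toy DADA2-style truncation: keep the first trunc_len bases, drop shorter reads."""
    if len(read) < trunc_len:
        return None  # read too short: discarded
    return read[:trunc_len]  # bases are removed from the 3' end only


# Hypothetical read that still begins with a 515F-like primer.
primer_hit = "GTGCCAGCAGCCGCGGTAA"
read = primer_hit + "TACGTAGG" * 10  # 19 + 80 = 99 bases

for trunc_len in (90, 60, 30):
    kept = truncate(read, trunc_len)
    # The 5' primer survives every truncation length tried here.
    print(trunc_len, kept.startswith(primer_hit))
```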
When running cutadapt, did you set --p-discard-untrimmed? If not, there is a chance that, depending on the sensitivity of the settings, cutadapt did not detect the primer even though it is still there, so the primer was never removed. I always set this parameter so that any sequences in which the primer is not detected are removed. It acts as a nice extra form of quality control.
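Conceptually, `--p-discard-untrimmed` turns primer detection into a filter: reads in which the primer is found are trimmed, and reads in which it is not found are dropped instead of being passed through unchanged. The sketch below illustrates only that logic, under simplifying assumptions: a 5'-anchored match that allows mismatches but no insertions or deletions, with the error budget taken as `error_rate` times the primer length (0.1, assumed to mirror cutadapt's default error rate). Cutadapt's real alignment is more involved, and the function names and 515F primer are hypothetical.

```python
from typing import Iterable, Iterator, Optional

# Bases each IUPAC code in the primer may match in the read.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def trim_anchored(read: str, primer: str, error_rate: float = 0.1) -> Optional[str]:
    """Remove a 5'-anchored primer, or return None if it is not found (mismatches only)."""
    if len(read) < len(primer):
        return None
    max_mismatches = int(error_rate * len(primer))
    mismatches = sum(base not in IUPAC[p] for p, base in zip(primer.upper(), read.upper()))
    if mismatches > max_mismatches:
        return None
    return read[len(primer):]


def trim_reads(reads: Iterable[str], primer: str, discard_untrimmed: bool = True,
               error_rate: float = 0.1) -> Iterator[str]:
    """Yield trimmed reads; untrimmed reads are either dropped or passed through unchanged."""
    for read in reads:
        trimmed = trim_anchored(read, primer, error_rate)
        if trimmed is not None:
            yield trimmed
        elif not discard_untrimmed:
            yield read  # primer not detected, so it may still be present


primer = "GTGYCAGCMGCCGCGGTAA"  # hypothetical 515F example
reads = [
    "GTGCCAGCAGCCGCGGTAA" + "TTTT",  # exact match
    "ATGCCAGCAGCCGCGGTAA" + "CCCC",  # one mismatch (allowed for a 19-mer)
    "TACGTAGGGGGCAAGCGTTATCC",       # no primer
]
print(list(trim_reads(reads, primer)))                           # ['TTTT', 'CCCC']
print(list(trim_reads(reads, primer, discard_untrimmed=False)))  # third read kept untrimmed
```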
In the command that I used to trim my 16S primer sequences, I do include --p-discard-untrimmed. I originally used --p-front-f and --p-front-r to indicate my forward and reverse primer sequences in the command.
Right now, I'm experimenting with commands similar to those used in this post:
(I included the forward and reverse primer sequences as well as the reverse-complemented forward and reverse primer sequences. I anchored the forward and reverse primer sequences with a "^", but not the reverse-complemented ones.)
Now it is just a waiting game while DADA2 runs. In the meantime, I'm open to any other suggestions for fixing this issue. Thank you for your time!
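The anchored and reverse-complemented patterns can be illustrated in a few lines. In cutadapt notation, a leading `^` anchors a 5' primer so that it must start at the very first base of the read; an un-anchored pattern may match further in, which is where the reverse complement of the opposite primer appears when a read runs past the end of a short amplicon. The sketch below uses hypothetical helpers and assumes the 515F/806R primers purely for illustration; it builds an IUPAC-aware reverse complement and compares anchored with un-anchored matching on a made-up read.

```python
import re

# IUPAC complement table (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def reverse_complement(seq: str) -> str:
    """Reverse complement that also handles degenerate IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pattern(primer: str, anchored: bool):
    """Regex for a primer; anchored=True mimics cutadapt's leading '^'."""
    body = "".join(IUPAC_CLASS[base] for base in primer.upper())
    return re.compile(("^" if anchored else "") + body)


fwd = "GTGYCAGCMGCCGCGGTAA"   # 515F, assumed
rev = "GGACTACNVGGGTWTCTAAT"  # 806R, assumed
rev_rc = reverse_complement(rev)
print(rev_rc)  # ATTAGAWACCCBNGTAGTCC

# Made-up forward read: primer at the 5' end, reverse-primer RC near the 3' end.
read = "GTGCCAGCAGCCGCGGTAA" + "TACGTAGG" * 3 + "ATTAGAAACCCCGGTAGTCC"
print(bool(primer_pattern(fwd, anchored=True).search(read)))      # True
print(bool(primer_pattern(rev_rc, anchored=True).search(read)))   # False: not at position 0
print(bool(primer_pattern(rev_rc, anchored=False).search(read)))  # True: found further in
```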
Perhaps there is something here that'll help? Note that the comments about --p-match-adapter-wildcards probably won't help, as I think we made that the default in later versions.
Have you compared a trimmed sequence that still contains the primer with its original sequence? Perhaps a library preparation issue and/or sequencing error resulted in multiple copies of the primer sequence. If so, you can set --p-times 2.
Sometimes there are a few nearly identical sub-sequences within 16S that can make it appear that a primer is still present, when it is actually a neighboring, very similar sub-sequence rather than the primer itself. Can you paste an example of a sequence (with the primer) before and after primer trimming, along with the primer sequence? It'd help to see the actual commands you're running too.
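For context, `--p-times` sets how many rounds of adapter/primer removal cutadapt performs on each read (the default is 1), so with `--p-times 2` a second, back-to-back copy of the primer can also be removed. The toy function below shows only that idea, using exact 5' matches; it is a hypothetical helper, not cutadapt's algorithm, and the primer and read are made up.

```python
from typing import Tuple


def trim_repeated(read: str, primer: str, times: int = 1) -> Tuple[str, int]:
    """Strip up to `times` exact copies of `primer` from the 5' end.

    Returns the trimmed read and the number of copies removed.
    """
    removed = 0
    while removed < times and read.startswith(primer):
        read = read[len(primer):]
        removed += 1
    return read, removed


primer = "GTGCCAGCAGCCGCGGTAA"  # hypothetical, non-degenerate primer sequence
read = primer + primer + "TACGTAGG"  # made-up read with a duplicated primer

print(trim_repeated(read, primer, times=1))  # one copy still left in the read
print(trim_repeated(read, primer, times=2))  # ('TACGTAGG', 2)
```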
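One way to check for such look-alike regions is to slide the primer along a sequence and count mismatches at every offset. A true primer hit would typically show up at the expected position with zero or very few mismatches, while a neighboring 16S sub-sequence might also fall within cutadapt's error tolerance. The helper below is a hypothetical sketch (substitutions only, no indels) run on made-up sequences.

```python
from typing import List, Tuple

# Bases each IUPAC code in the primer may match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, window: str) -> int:
    """Positions where the window base is not allowed by the (IUPAC) primer base."""
    return sum(base not in IUPAC[p] for p, base in zip(primer, window))


def near_matches(seq: str, primer: str, max_mismatches: int) -> List[Tuple[int, int]]:
    """(start, mismatches) for every window within max_mismatches of the primer."""
    seq, primer = seq.upper(), primer.upper()
    k = len(primer)
    hits = []
    for start in range(len(seq) - k + 1):
        mm = count_mismatches(primer, seq[start:start + k])
        if mm <= max_mismatches:
            hits.append((start, mm))
    return hits


# Made-up sequence: the primer at position 0 and a one-mismatch look-alike at 12.
seq = "ACGTACGT" + "TTTT" + "ACGAACGT"
print(near_matches(seq, "ACGTACGT", max_mismatches=1))  # [(0, 0), (12, 1)]
```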
Thank you for your suggestions. I haven't yet compared the sequences before and after trimming - I'm working on generating a .qzv file with the sequences before trimming now so that I can make those comparisons.
In the meantime, here is the original command that I used for trimming primers:
When I ran the above command, I didn't run into any errors. I then generated quality plots and chose truncation values of 265 (forward reads) and 196 (reverse reads). From there, I ran DADA2 and did not receive any error messages. However, DADA2 generated no output files.
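Two constraints on the truncation values may be worth ruling out when DADA2 produces nothing (this is a guess, not a diagnosis). In q2-dada2, reads shorter than `--p-trunc-len-f`/`--p-trunc-len-r` are discarded, and cutadapt has already shortened each read by the primer length; the truncated pairs also need to overlap enough to merge (DADA2's `mergePairs` defaults to a 12-nt minimum). The arithmetic below is a hypothetical sketch that assumes every read has the same length after trimming; the read and amplicon lengths in the example are assumptions, not values from this thread.

```python
from typing import List


def check_truncation(trunc_f: int, trunc_r: int,
                     trimmed_len_f: int, trimmed_len_r: int,
                     amplicon_len: int, min_overlap: int = 12) -> List[str]:
    """Return problems with a pair of truncation lengths (empty list means none found).

    trimmed_len_*: read length after primer removal (assumed uniform).
    amplicon_len: length of the region between the primers (primers excluded).
    """
    problems = []
    if trunc_f > trimmed_len_f:
        problems.append(f"forward: trunc length {trunc_f} > trimmed read length {trimmed_len_f}")
    if trunc_r > trimmed_len_r:
        problems.append(f"reverse: trunc length {trunc_r} > trimmed read length {trimmed_len_r}")
    overlap = trunc_f + trunc_r - amplicon_len
    if overlap < min_overlap:
        problems.append(f"overlap {overlap} nt < required {min_overlap} nt")
    return problems


# Assumed example: 2x300 reads, 19/20-nt primers removed, 420-nt inter-primer region.
print(check_truncation(265, 196, 281, 280, 420))  # []
# Same truncation values with assumed 2x250 reads: the forward value is too long.
print(check_truncation(265, 196, 231, 230, 420))
```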
Using the original primer-trimming command (the first command in my 7/18/22 post), the forward primer showed up in 171 sequences and the reverse-complemented reverse primer in 2,420 sequences, all after denoising.
Using the primer trimming command in this current post that includes --p-times 2, the forward primer was found in only 4 sequences after denoising. No other primer sequences were identified. I'm happy with that result and am going to continue with my analyses from there.
I also ran the denoising command without primer trimming first, to determine whether there were multiple copies of the same primer sequence within one ASV. I did not see multiple copies of primers within single sequences. This was unexpected, since adding --p-times 2 seemed to fix my incomplete primer trimming.
One more important note: according to the sequencing facility, all of my reads are 5'->3'. However, the primer-trimming command above removed primers in both directions (5'->3' and 3'->5').
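To double-check the facility's statement, one rough test would be to take a sample of the untrimmed forward reads and count how many begin with the forward primer versus the reverse primer; if every read is in the same orientation, nearly all forward reads should start with the forward primer. The sketch below is hypothetical: it assumes the 515F/806R primers and exact (IUPAC-aware) 5' matches, and the reads are made up.

```python
import re
from collections import Counter
from typing import Iterable

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def compile_primer(primer: str):
    """Regex for a degenerate primer (use with .match to anchor at the 5' end)."""
    return re.compile("".join(IUPAC_CLASS[base] for base in primer.upper()))


def orientation_counts(reads: Iterable[str], fwd_primer: str, rev_primer: str) -> Counter:
    """Tally which primer, if either, each read starts with."""
    fwd, rev = compile_primer(fwd_primer), compile_primer(rev_primer)
    counts = Counter()
    for read in reads:
        read = read.upper()
        if fwd.match(read):  # re.match only matches at position 0
            counts["forward"] += 1
        elif rev.match(read):
            counts["reverse"] += 1
        else:
            counts["neither"] += 1
    return counts


reads = [
    "GTGCCAGCAGCCGCGGTAATTT",  # starts with 515F
    "GGACTACAAGGGTATCTAATCC",  # starts with 806R: opposite orientation
    "TTTTTTTTTTTTTTTTTTTTTT",  # no primer at the 5' end
]
print(orientation_counts(reads, "GTGYCAGCMGCCGCGGTAA", "GGACTACNVGGGTWTCTAAT"))
```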
I believe this issue to be resolved. Thank you for your helpful suggestions, @SoilRotifer ! Take care