How to remove primers when the reads end up with different lengths

Lei · May 14, 2019, 4:00pm

Hi,
I'm back with more questions; I am new to QIIME 2. Thank you for your patience in answering them.

I used the following cutadapt command to trim the primer plus the extra spacer Ns from my sequences. The primers I used are 515F/806R.
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end.qza \
  --p-front-f GTGCCAGCMGCCGCGGTAA \
  --p-front-r GGACTACHVGGGTWTCTAAT \
  --verbose \
  --o-trimmed-sequences trimmed_remove_primers.qza
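To see why trimming "primer plus a variable spacer" leaves reads of different lengths, here is a minimal Python sketch of 5' primer removal with IUPAC degenerate bases (the M, H, V and W in the primers above). It is not cutadapt's algorithm: cutadapt uses error-tolerant alignment (default error rate 0.1), while this sketch only does exact IUPAC matching within a hypothetical `max_spacer` window. Like cutadapt's 5' (front) trimming, it removes the primer and everything before it. The read sequences and spacers are made up for illustration. A `None` result corresponds to a read where no primer was found; as far as I know, q2-cutadapt keeps such reads untrimmed unless `--p-discard-untrimmed` is given.

```python
from typing import Optional

# IUPAC nucleotide codes -> the bases each code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

FWD_PRIMER = "GTGCCAGCMGCCGCGGTAA"   # 515F, as passed to --p-front-f
REV_PRIMER = "GGACTACHVGGGTWTCTAAT"  # 806R, as passed to --p-front-r


def matches_at(read: str, primer: str, offset: int) -> bool:
    """True if the degenerate primer matches the read exactly starting at offset."""
    if offset + len(primer) > len(read):
        return False
    return all(read[offset + i] in IUPAC[code] for i, code in enumerate(primer))


def trim_front(read: str, primer: str, max_spacer: int = 10) -> Optional[str]:
    """Remove any spacer plus the primer from the 5' end of a read.

    Scans offsets 0..max_spacer (a hypothetical cutoff) and returns the rest
    of the read after the first exact match, or None if no match is found.
    """
    for offset in range(max_spacer + 1):
        if matches_at(read, primer, offset):
            return read[offset + len(primer):]
    return None


# Hypothetical 60 bp reads: spacer + one concrete 515F variant (M -> A) + insert.
concrete_fwd = "GTGCCAGCAGCCGCGGTAA"
insert_seq = "ACGT" * 20
for spacer in ["", "TGA", "ACTGTCA"]:
    read = (spacer + concrete_fwd + insert_seq)[:60]  # raw reads share one fixed length
    trimmed = trim_front(read, FWD_PRIMER)
    print(f"spacer {len(spacer)} bp -> trimmed read {len(trimmed)} bp")
```

All three raw reads are 60 bp, but after trimming they are 41, 38 and 34 bp, because each read loses its own spacer in addition to the primer.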
I compared the interactive quality plot of demux-paired-end.qzv (before removing the primers with cutadapt) with trimmed_remove_primers.qzv (after removing the primers with cutadapt). I am wondering why the two plots differ so much. Did I remove the primers correctly? More specifically, my questions are:

After removing the primers, red boxes appear in the plot, whereas before removal all the boxes are blue (when I zoom in). I have read several posts on the forum, and I understand that "Red box plots (and the associated red warning text) should only appear when your input sequences have different lengths".

Question: The sequencing facility told me that "The products (primers included) should be around 291-294 bp". So why are all the boxes blue in the demux-paired-end plot (does that mean all reads have the same length?), and why do red boxes appear after running cutadapt? Did I remove the primers correctly?
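One way to reconcile these observations: the 291-294 bp figure is the length of the PCR product, whereas each raw read is typically a fixed number of sequencing cycles, so all untrimmed reads share one length and every position is covered by every read. The Python sketch below counts, per position, how many reads are long enough to reach it. It assumes (not confirmed from the q2-demux source) that the red boxes mark positions that some but not all reads reach. The read lengths are hypothetical: 250-cycle forward reads, the 19 bp 515F primer and spacers of 0 to 7 bp.

```python
from typing import List


def coverage_by_position(lengths: List[int]) -> List[int]:
    """Number of reads long enough to have a base at each 1-based position."""
    return [sum(1 for n in lengths if n >= pos) for pos in range(1, max(lengths) + 1)]


def partially_covered_positions(lengths: List[int]) -> List[int]:
    """1-based positions that some, but not all, reads reach.

    Assumption: these are the positions a quality plot would flag (red),
    because the box there summarizes fewer reads than earlier positions.
    """
    total = len(lengths)
    return [pos for pos, count in enumerate(coverage_by_position(lengths), start=1)
            if count < total]


# Hypothetical numbers: 250-cycle forward reads, 19 bp 515F primer, 0-7 bp spacers.
raw_lengths = [250] * 8
trimmed_lengths = [250 - spacer - 19 for spacer in range(8)]  # 231 down to 224

print("raw:", partially_covered_positions(raw_lengths))
print("trimmed:", partially_covered_positions(trimmed_lengths))
```

Under these assumptions the raw reads have no partially covered positions (all blue), while the trimmed reads have partial coverage at positions 225 to 231, which is where red boxes would appear.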

Question: The demux-paired-end quality plot looks strange compared with other quality plots I have seen on the forum. @thermokarst mentioned in this thread that a quality plot like this, with a narrow band of high quality scores, suggests the reads have already been filtered. Is that right? I did not run any filtering steps myself. The demux-paired-end file is what I imported into QIIME 2 with the qiime tools import command. Our sequencing facility told me they had already removed the barcodes, so I skipped the demultiplexing step.

Question: If it makes sense for red boxes to appear after removing the primers, do I need to get rid of them by using --p-trunc-len during DADA2 filtering so that all sequences have the same length? (I am sorry if my questions are basic. I am trying hard to understand how to choose the correct DADA2 parameters and how to read the interactive quality plot. I have read many posts on the forum but am still confused.)

If trimmed_remove_primers.qzv looks fine, which parameters should I choose? These are the parameters I am considering:
--p-trim-left-f 0 (no trimming seems necessary, since I already removed the primers with cutadapt)
--p-trim-left-r 0 (same reason as above)
--p-trunc-len-f 0 (no truncation; I did not try to remove the red boxes)
--p-trunc-len-r 230 (I did not try to remove all the red boxes)
--p-n-threads 0 (uses all available cores, which reduces running time)
--p-n-reads-learn (keep the default)
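To see what --p-trunc-len-r 230 would do to primer-trimmed reads, here is a simplified Python model of the truncation step. It follows the behavior described in the q2-dada2 help text: reads shorter than the truncation length are discarded, longer reads are cut to that length, and 0 disables truncation and length filtering. Quality-based filtering is ignored. The read lengths are hypothetical, assuming 250-cycle reverse reads, the 20 bp 806R primer and spacers of 0 to 7 bp; the real lengths should be read off trimmed_remove_primers.qzv.

```python
from typing import List


def dada2_like_truncate(lengths: List[int], trunc_len: int) -> List[int]:
    """Read lengths after a simplified truncation step.

    Reads shorter than trunc_len are dropped and the rest are cut to
    trunc_len; trunc_len == 0 means no truncation or length filtering.
    """
    if trunc_len == 0:
        return list(lengths)
    return [trunc_len for n in lengths if n >= trunc_len]


# Hypothetical reverse reads: 250 cycles minus 20 bp 806R minus a 0-7 bp spacer.
rev_lengths = [250 - 20 - spacer for spacer in range(8)]  # 230 down to 223

for trunc in (0, 230, 220):
    kept = dada2_like_truncate(rev_lengths, trunc)
    print(f"trunc_len={trunc}: kept {len(kept)}/{len(rev_lengths)} reads, "
          f"lengths {sorted(set(kept))}")
```

Under these assumptions, 230 keeps only the read with no spacer, while 220 keeps all eight reads, each cut to 220 bp. A truncation length at or below the shortest trimmed read therefore avoids discarding reads for length alone.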

Although some forum threads mention these points, I still want to confirm my understanding:
A. DADA2 automatically joins (merges) the forward and reverse reads for us, so the output consists of joined sequences.
B. My primer set is 515F/806R, which should cover around 291 bp. Based on the QIIME 2 forum, I need about 30 bp of overlap, so I need 291 + 30 = 321, roughly 320 bp after truncation; that is, trunc-len-f + trunc-len-r > 320. Since both the forward and reverse reads are longer than 240 bp, am I right that I do not need to worry about having enough overlap?
C. Do I need to make all sequences the same length before running DADA2? My understanding is that different sequence lengths can affect the ASV inference process. Is that right?
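The overlap arithmetic in point B can be checked with a short Python sketch. It assumes the facility's 291-294 bp figure includes both primers (as quoted), that cutadapt removed 19 bp (515F) and 20 bp (806R) from the start of the reads, and that the truncation lengths apply to these already-trimmed reads. The 30 bp target is the rule of thumb from the question; as I understand the q2-dada2 help text, the hard minimum is 12 nt of overlap after truncation. The 230/220 truncation values are hypothetical.

```python
FWD_PRIMER_LEN = len("GTGCCAGCMGCCGCGGTAA")   # 515F: 19 bp
REV_PRIMER_LEN = len("GGACTACHVGGGTWTCTAAT")  # 806R: 20 bp


def target_length(product_len: int, primers_trimmed: bool = True) -> int:
    """Length the merged reads must span: the PCR product minus any removed primers."""
    if primers_trimmed:
        return product_len - FWD_PRIMER_LEN - REV_PRIMER_LEN
    return product_len


def expected_overlap(trunc_f: int, trunc_r: int, product_len: int,
                     primers_trimmed: bool = True) -> int:
    """Bases shared by the truncated forward and reverse reads."""
    return trunc_f + trunc_r - target_length(product_len, primers_trimmed)


def min_combined_length(product_len: int, min_overlap: int,
                        primers_trimmed: bool = True) -> int:
    """Smallest trunc_f + trunc_r that still leaves min_overlap bases of overlap."""
    return target_length(product_len, primers_trimmed) + min_overlap


# Product sizes quoted by the sequencing facility; 230/220 are hypothetical settings.
for product in (291, 294):
    print(product,
          "needs trunc_f + trunc_r >=", min_combined_length(product, 30),
          "| overlap at 230/220:", expected_overlap(230, 220, product))
```

Under these assumptions, because the primers are no longer part of the reads, the combined truncation length needed for 30 bp of overlap is about 282 to 285 bp rather than the 321 bp obtained when the primers are counted.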

Attached files:
demux-paired-end.qzv (290.3 KB)
trimmed_remove_primers.qzv (295.9 KB)

Thank you so much for your help. I really appreciate that!