Remove Primer in paired-end demultiplexed file

mohsen_ej · November 7, 2020, 12:42pm

Hi,
I imported my paired-end demultiplexed data. Now I want to denoise it, but I don't know how to remove the primers.
I checked the forward FASTQ files; all of the reads start with CCTACGGG (rarely, a read starts with N).
In the reverse files, on the other hand, I couldn't find any repeated sequence at the end of the reads. My first question: how can I be sure I identified the primer correctly?
Secondly, what is the best way to remove the primers: DADA2 or the cutadapt plugin? And should I remove the primers from the forward and reverse reads separately, given that the demux file holds paired-end data?
I've attached the interactive quality plot from paired-end-demux.qzv in case it is useful.
Thank you

Mehrbod_Estaki · November 7, 2020, 12:56pm

Hi @mohsen_ej,
You'll want to use the trim-paired action of the q2-cutadapt plugin to remove the primers from both your forward and reverse reads. Each primer should be at the 5' end of its respective read, which is why you don't see any repeated pattern at the 3' end of your reads.
By default, your primers (if they are still intact) will be removed, and reads in which they are not found are left unchanged. The ambiguous N nucleotides will be taken care of during denoising (reads containing N are dropped).
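
A quick way to check which sequence the reads actually start with, before trusting a primer guess, is to tally the first few bases of every read. The following is only a rough sketch: top_read_prefixes is a hypothetical helper (not part of QIIME 2 or cutadapt), and it assumes standard four-line FASTQ records in a plain or gzipped file.

```shell
# Hypothetical helper: list the most common leading k-mers in a FASTQ file.
# Usage: top_read_prefixes <file.fastq|file.fastq.gz> [k=8] [n=5]
# Assumes four-line FASTQ records (header, sequence, "+", qualities).
top_read_prefixes() {
    file=$1
    k=${2:-8}
    n=${3:-5}
    case "$file" in
        *.gz) gzip -cd "$file" ;;
        *)    cat "$file" ;;
    esac |
        # Line 2 of every 4-line record is the sequence; tally its first k bases.
        awk -v k="$k" 'NR % 4 == 2 { count[substr($0, 1, k)]++ }
                       END { for (p in count) print count[p], p }' |
        # Most frequent prefix first; ties broken alphabetically.
        sort -k1,1nr -k2,2 |
        head -n "$n"
}
```

For example, top_read_prefixes sample_R1.fastq.gz 8 would list the five most common 8-mers at the start of the forward reads (the file name is a placeholder). If one sequence dominates and matches the start of your primer, the primer is most likely still on the reads.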

mohsen_ej · November 7, 2020, 1:36pm

Thank you.
Does it find the primers automatically, or should I give it the primers somehow?
I ran this command:
qiime cutadapt trim-paired \
    --i-demultiplexed-sequences paired-end-demux.qza \
    --o-trimmed-sequences paired-end-demux-trimmed.qza
I'm not sure I did it correctly, because after converting the result to a .qzv file I don't see much change in the interactive plot.
How can I know whether I did it properly?
Also, is it possible to have 297 nt in the forward reads but not in the reverse reads? You can see that in the picture.

thank you

andrewsanchez · November 9, 2020, 7:19pm

Hi, @mohsen_ej!

Cutadapt won't know about your primers unless you specify them using the appropriate parameters. You will find the answer to your question by reading the help text for the cutadapt trim-paired command, which @Mehrbod_Estaki linked to above. You can also view this information by typing qiime cutadapt trim-paired --help in your terminal.

After removing your primers, you can then use qiime demux summarize to visualize the results.

Let us know how that goes!

mohsen_ej · November 9, 2020, 7:42pm

Thank you very much for your response.
As I am new to QIIME, could you please give me an example?
I have read the cutadapt help, but I am not sure I understood it correctly.
The primers used are:
16S Amplicon PCR Forward Primer = 5' TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG
16S Amplicon PCR Reverse Primer = 5' GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC
These include the Illumina overhang adapters:
Forward overhang: 5' TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-[locus-specific sequence]
Reverse overhang: 5' GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-[locus-specific sequence]
I'm not sure which parameters I should use in the cutadapt command: --p-adapter-f/--p-adapter-r, --p-front-f/--p-front-r, --p-anywhere-f/--p-anywhere-r, or something else?
Really sorry if I'm asking a simple question.

SoilRotifer · November 9, 2020, 8:08pm

Hi @mohsen_ej, if you search the forum, you'll come across many examples of how to use cutadapt. There are quite a variety of ways to leverage this tool.

You'll likely just want to specify your locus-specific primer sequences, not the entire construct. That is, your primers are likely these (everything after the ...GAGACAG):

--p-front-f CCTACGGGNGGCWGCAG \
--p-front-r GACTACHVGGGTATCTAATCC

Here are a couple to get you started:

-Mike

mohsen_ej · November 9, 2020, 8:28pm

Thank you for your response.
That was great.
So you mean I don't need to consider the overhang? Just the locus-specific primer, i.e. everything after GAGACAG in both the forward and reverse primers?
I'm asking because I want to be sure I understood it.
Thank you

SoilRotifer · November 9, 2020, 8:30pm

Yes, see the example command options I provided.

mohsen_ej · November 10, 2020, 10:58am

Thank you, I read them.
I ran this command:
qiime cutadapt trim-paired \
    --i-demultiplexed-sequences paired-end-demux.qza \
    --p-front-f CCTACGGGNGGCWGCAG \
    --p-front-r GACTACHVGGGTATCTAATCC \
    --o-trimmed-sequences paired-end-demux-trimmed.qza

How can I be sure that I did it correctly?
Thank you very much

SoilRotifer · November 10, 2020, 3:00pm

Hi @mohsen_ej,

Thank you for attaching the QZVs. You'll want to add the following flags, which were also mentioned in the posts I linked above:

--p-match-adapter-wildcards --p-match-read-wildcards --p-discard-untrimmed

You'll likely not need --p-match-read-wildcards, but it does not hurt to throw it in.

This will allow cutadapt to match the IUPAC ambiguity codes in your primers (e.g. W, V, N) against the reads, and to discard any read pairs in which it could not find both primers. The latter ensures that you keep only sequences that were trimmed.
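
To illustrate what matching IUPAC codes means, here is a rough sketch that expands a degenerate primer into a regular expression and counts the reads that begin with it. The helper names iupac_to_regex and count_primer_reads are hypothetical (not part of QIIME 2 or cutadapt). The check is stricter than cutadapt itself, which also tolerates some mismatches, so treat the count as a lower bound.

```shell
# Hypothetical helper: expand IUPAC ambiguity codes in a primer into
# bracket expressions. Every replacement contains only A, C, G, T, so a
# later substitution never alters text inserted by an earlier one.
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/R/[AG]/g'  -e 's/Y/[CT]/g'  -e 's/S/[CG]/g' \
        -e 's/W/[AT]/g'  -e 's/K/[GT]/g'  -e 's/M/[AC]/g' \
        -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' \
        -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Hypothetical helper: count reads that start with the primer (exact match
# after expanding the ambiguity codes).
# Usage: count_primer_reads <file.fastq|file.fastq.gz> <PRIMER>
count_primer_reads() {
    pattern="^$(iupac_to_regex "$2")"
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac |
        awk 'NR % 4 == 2' |     # keep only the sequence line of each record
        grep -c -E "$pattern"
}
```

Running count_primer_reads on a forward FASTQ file with CCTACGGGNGGCWGCAG, and on a reverse file with GACTACHVGGGTATCTAATCC, gives a rough idea of how many reads still carry the primer. On FASTQ files exported from the trimmed artifact (for instance with qiime tools export), the count should fall to roughly zero.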

So your full command should be:

qiime cutadapt trim-paired \
    --i-demultiplexed-sequences paired-end-demux.qza \
    --p-front-f CCTACGGGNGGCWGCAG \
    --p-front-r GACTACHVGGGTATCTAATCC \
    --p-match-adapter-wildcards \
    --p-match-read-wildcards \
    --p-discard-untrimmed \
    --o-trimmed-sequences paired-end-demux-trimmed.qza

-Cheers!
-Mike

mohsen_ej · November 10, 2020, 3:30pm

Thank you @SoilRotifer for your help, and sorry if I'm taking up your time with simple questions.
But as you can see, the quality scores in the reverse reads are still low. Do you think I can use the reverse reads, or is it better to ignore them and continue with the forward reads only? If I can use both, could I use:
--p-trim-left-f
--p-trim-left-r
--p-trunc-len-f 283
--p-trunc-len-r 256
Does that make sense?
One more thing: I didn't understand how you identified the locus-specific primer (everything after GAGACAG). Why that part?
Really sorry for all the questions.

SoilRotifer · November 10, 2020, 4:26pm

Please search through the forum first. Many users have had similar issues with determining trimming and truncation settings. You may have to iterate through several settings.

As for identifying the primers: that just came from my experience working with a lot of data sets. I've become familiar with a variety of primer constructs and protocols. Ideally, you should always ask your sequencing facility which amplicon / gene region, i.e. which PCR / sequencing primers, were used for your project. They should also be able to provide you with a citation.

-Mike

mohsen_ej · November 10, 2020, 4:33pm

Thank you for your information.
I asked because I found out there is no exact solution for this issue, but I will read more.
Many, many thanks for your guidance.

gnanendra · November 12, 2020, 3:16am

Based on your product size (V3-V4, V3, or V4 region), you can determine --p-trunc-len-f and --p-trunc-len-r. Also, it is quite common for reverse reads to have poorer quality, so just use both reads, with settings such as the following:

qiime dada2 denoise-paired \
    --i-demultiplexed-seqs demux.qza \
    --p-trim-left-f 10 \
    --p-trim-left-r 10 \
    --p-trunc-len-f 280 \
    --p-trunc-len-r 200 \
    --o-table table.qza \
    --o-representative-sequences rep-seqs.qza \
    --o-denoising-stats denoising-stats.qza
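
The link between product size and truncation lengths can be sketched with a little arithmetic: the truncated forward and reverse reads together must be longer than the amplicon by at least the overlap that DADA2 needs to merge them (12 nt by default in its mergePairs step). expected_overlap below is a hypothetical helper, and the 460 nt amplicon length is only an assumed, illustrative value for a V3-V4 product with primers still attached; substitute the length for your own region.

```shell
# Hypothetical helper: expected overlap (nt) between truncated paired reads.
# Usage: expected_overlap <amplicon_len> <trunc_len_f> <trunc_len_r>
# amplicon_len must cover the same span as the reads: include the primers
# if they are still on the reads, exclude them if they were trimmed first.
expected_overlap() {
    amplicon_len=$1
    trunc_len_f=$2
    trunc_len_r=$3
    echo $(( trunc_len_f + trunc_len_r - amplicon_len ))
}

# Assumed 460 nt amplicon with the truncation lengths from the command above.
overlap=$(expected_overlap 460 280 200)
if [ "$overlap" -ge 12 ]; then
    echo "overlap of $overlap nt: reads should be long enough to merge"
else
    echo "overlap of $overlap nt: too short, truncate less"
fi
```

Trimming from the 5' end (--p-trim-left-f / --p-trim-left-r) does not change this number, because the overlap sits at the 3' ends of the reads. A zero or negative result means the truncated reads cannot meet in the middle at all.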

Best
Gnanendra
