Removing Primers: Cutadapt vs Denoising

elsamdea · August 3, 2021, 4:48pm

Hello everyone!!

Let me give you some context: I started working with QIIME 2 (version 2.4) a month ago. My data are 16S paired-end sequences (obtained by Illumina sequencing), and I have to determine the taxonomy and the microbiota diversity of the different samples. However, at the moment I am struggling with the denoising step.

Even though I have read the cutadapt and DADA2 documentation, I still have some doubts:

In the q2-dada2 plugin, there is a parameter, --p-trim-left (--p-trim-left-f and --p-trim-left-r for paired-end data), which removes bases from the 5' end of the reads.
On the other hand, q2-cutadapt provides the trim-paired and trim-single commands (the first for paired-end sequences, the second for single-end sequences), which remove the primers from your data.

So, my question is: if you can use q2-dada2 directly to remove the primer bases, why do most papers suggest using cutadapt before denoising?

I mean, the q2-cutadapt trim commands require the primer sequences, and this reduces errors. But if you can remove the primers at the same time as denoising...

Also, I know cutadapt lets you work with specific hypervariable regions of, for example, the bacterial 16S rRNA gene. So, if you only work with one region, cutadapt is not essential, right?

Perhaps there are some technical specifications that I have overlooked or am unaware of?

Thanks in advance!!

Elsa

SoilRotifer · August 3, 2021, 6:38pm

Hi @elsamdea

These are good questions! You are correct: technically you can go either way, that is, use cutadapt or just trim the primers out directly within deblur or DADA2.

I prefer to use cutadapt to remove primers for the following reasons, which I think I've echoed elsewhere in the forum a few times:

There may be spurious off-target sequences within your data. Simply trimming a fixed number of bases will retain these reads.
PCR / sequencing errors can add or remove bases at the beginning or end of a read. A fixed-length trim then leaves such reads offset from the others, potentially inflating differences between sequences and creating more sequence variants in the output.
Quality at the beginning of the read is somewhat indicative of the quality later in the read. That is, if you can't find the primer, then what else is wrong with the sequence?

In other words, using cutadapt to search through your reads to find and remove the primers is an additional form of quality control. If you are unable to find the primers, then chances are the reads are of low quality anyway, and you might as well discard those sequences (e.g. --discard-untrimmed).
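The difference between the two approaches can be illustrated with a toy comparison. This is a minimal sketch, not how DADA2 or cutadapt are implemented: the primer, reads, offset window and mismatch limit are all hypothetical, `fixed_trim` only mimics the idea of a fixed 5' trim (like --p-trim-left), and `search_trim` is a simplified stand-in for cutadapt's primer search (real cutadapt uses an error-rate-based alignment that also tolerates insertions and deletions).

```python
# Hypothetical, non-degenerate primer and made-up reads, for illustration only.
PRIMER = "GTGCCAGCAGCCGCGGTAA"
INSERT = "TACGGAGGGTGCAAGCGTTAATCGG"


def fixed_trim(read, n):
    """Drop the first n bases (like a fixed trim-left); never rejects a read."""
    return read[n:]


def mismatches(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def search_trim(read, primer, max_offset=3, max_mismatches=2):
    """Look for the primer within the first few positions of the read.

    Returns the read with everything up to and including the primer removed,
    or None (i.e. discard the read, similar in spirit to --discard-untrimmed)
    when no acceptable match is found. Simplified: substitutions only.
    """
    for start in range(max_offset + 1):
        window = read[start:start + len(primer)]
        if len(window) == len(primer) and mismatches(window, primer) <= max_mismatches:
            return read[start + len(primer):]
    return None


reads = {
    "clean": PRIMER + INSERT,
    "extra_5p_base": "A" + PRIMER + INSERT,  # one spurious base before the primer
    "off_target": "ATTATAATTAATATTTAAATATTAATTATATAATA",  # no primer at all
}

for name, read in reads.items():
    fixed = fixed_trim(read, len(PRIMER))
    searched = search_trim(read, PRIMER)
    print(f"{name:14} fixed: {fixed!r:30} search: {searched!r}")
```

In this toy example the fixed trim turns the read with one extra 5' base into a different sequence ("A" followed by the insert) and keeps the off-target read, while the primer search recovers the same insert for both genuine reads and rejects the off-target one.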

For example, here are a couple of forum threads you can read through:

But there is no one "right way" to do things here. I just prefer to be as exacting as possible and retain the best-quality data I can.

elsamdea · August 3, 2021, 7:57pm

Hi again @SoilRotifer,

Thank you so much for responding so quickly and in such detail! It is great to have a forum where you can share your doubts and receive such complete answers. My congratulations to the moderators!

You are right. The quality control process should be as thorough as possible. But I thought that maybe I was adding redundant steps to the pipeline.

I admit I was struggling to wrap my head around that. Thank you for suggesting these posts; I am going to read them!!

Also, I would like to ask a related question. When trimming primers with cutadapt, you add --discard-untrimmed to the command to discard all sequences in which the primers were not found. But how can you be sure this parameter is working well?

I mean, for example, what if your original sample files show a high number of reads and your cutadapt output files:

show the opposite (a low number of reads in the count summary)?
show a read count too similar to the original?

In these two cases, --discard-untrimmed wouldn't be working properly, because the filtering would be too restrictive or too lax, respectively, right? Or am I wrong?
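One practical way to check the filter is to compare per-sample read counts before and after trimming, for example the counts shown in the demux summary visualizations of the input and the trimmed output. The sketch below assumes those counts have been copied into two dictionaries; the sample names, counts and the 50% threshold are all hypothetical placeholders.

```python
def retention_report(before, after, min_fraction=0.5):
    """Return {sample: (fraction_retained, flagged)}.

    flagged is True when fewer than min_fraction of a sample's reads survived
    primer trimming (min_fraction is an arbitrary, hypothetical cut-off).
    """
    report = {}
    for sample, n_in in before.items():
        n_out = after.get(sample, 0)  # a sample missing after trimming kept 0 reads
        fraction = n_out / n_in if n_in else 0.0
        report[sample] = (fraction, fraction < min_fraction)
    return report


# Hypothetical per-sample read counts before and after cutadapt.
before = {"S1": 50000, "S2": 48000, "S3": 52000}
after = {"S1": 47500, "S2": 3100, "S3": 49920}

for sample, (fraction, flagged) in retention_report(before, after).items():
    note = "  <- check primer sequence/orientation" if flagged else ""
    print(f"{sample}: {fraction:.1%} retained{note}")
```

What counts as "too low" depends on the data set, so the threshold is only a starting point; a sample that loses most of its reads is the one worth inspecting.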

I have found this post in the forum: some questions about cutadapt demux-paired - #3 by LiuZjiia

It was really useful, and I will add the extra parameters you suggested there! But I still have the filtering question in mind.

Again, thank you so much for your help!!

Elsa

SoilRotifer · August 3, 2021, 8:24pm

Thank you for the high praise! We do try our best.

Correct, low-quality data will often make some of these parameters too restrictive, especially if you are using the incorrect primer sequence(s) and/or orientation, as outlined in this thread. This is also why I append the following parameters (often only the first one is required, but I use both just because):

--p-match-adapter-wildcards \
--p-match-read-wildcards \
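These two flags control whether IUPAC ambiguity codes are treated as wildcards: --p-match-adapter-wildcards for codes in the primer (e.g. Y, M in a degenerate primer) and --p-match-read-wildcards for codes in the reads (e.g. an N base call). The sketch below illustrates the idea with a simple "sets of bases overlap" rule; it is an assumption for illustration, not cutadapt's actual alignment code. The primer is the commonly used degenerate 515F sequence, and the reads are made up.

```python
# IUPAC nucleotide codes -> the bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def bases_match(p, r, adapter_wildcards=True, read_wildcards=False):
    """Simplified rule: two characters match if their base sets overlap.

    Without wildcard matching on a side, that character only matches itself.
    """
    p_set = set(IUPAC[p]) if adapter_wildcards else {p}
    r_set = set(IUPAC[r]) if read_wildcards else {r}
    return bool(p_set & r_set)


def count_mismatches(primer, read_prefix, **kwargs):
    """Count positions where the primer and the start of the read disagree."""
    return sum(not bases_match(p, r, **kwargs) for p, r in zip(primer, read_prefix))


PRIMER_515F = "GTGYCAGCMGCCGCGGTAA"
read = "GTGCCAGCAGCCGCGGTAA"         # C at the Y position, A at the M position
read_with_n = "GTGCCAGCNGCCGCGGTAA"  # an N base call at the M position

print(count_mismatches(PRIMER_515F, read, adapter_wildcards=False))  # 2 (Y vs C, M vs A)
print(count_mismatches(PRIMER_515F, read, adapter_wildcards=True))   # 0
print(count_mismatches(PRIMER_515F, read_with_n, read_wildcards=False))  # 1 (M vs N)
print(count_mismatches(PRIMER_515F, read_with_n, read_wildcards=True))   # 0
```

With a degenerate primer, leaving adapter wildcards off turns every ambiguity position into a mismatch, which can push otherwise good reads over the error limit and get them discarded.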

You're welcome!
-Mike