Analysis with DADA2

Abul_Bashar · March 17, 2024, 7:37am

Dear All,

I am a beginner on this platform and am running some basic analyses in QIIME 2. I need your help with the following points.

I amplified my target region with the V3-V4 primer pair (341F and 806R) and sequenced the amplicons on an Illumina platform (2x250). All bases have a quality score (at the 25th percentile) above 37 up to position 250. Before denoising with DADA2, I trimmed the primers (forward primer: 17 bases, reverse primer: 20 bases) from both reads.

My target sequence should be 430 bp. If I set the truncation lengths to 230 (forward) and 212 (reverse), the reads should overlap by about 10-12 bp, and I get more than 16,000 features. However, if I increase the overlap to 20-22 bp by setting the truncation lengths to 230 (forward) and 222 (reverse), I get fewer features (13,000). I wonder why allowing more overlap results in fewer features. Is this related to read quality or something else?

Another thing: after merging the sequences with DADA2, the minimum sequence length is 385 nt and the average length is 417 nt, both shorter than my expected length (430). What does this mean, and how can I address it in QIIME 2?

I look forward to hearing from you and thanks in advance.

timanix · March 17, 2024, 8:05am

Hello!
Looks like you are doing great on your own!

Let's try to clarify some things you mentioned above.

I amplified my target region with the V3-V4 primer pair (341F and 806R) and sequenced the amplicons on an Illumina platform (2x250).

The V3-V4 region is one of the longest regions to work with, and although 2x250 is still OK, I prefer to sequence it with 2x300.

Before denoising with DADA2, I trimmed the primers (forward primer: 17 bases, reverse primer: 20 bases) from both reads.

Here I would prefer to remove the primers with cutadapt before running DADA2. Setting the option that discards all reads in which no primer is found is especially useful, since the size of the output file shows immediately if something went wrong. Also, my intuition says that removing primers with cutadapt rather than trimming a fixed number of bases will result in more uniform ASVs after DADA2.

My target sequence should be 430 bp

That's right, but the V3-V4 region is not only one of the longest regions to work with; its length also varies between different bacteria.

If I set the truncation lengths to 230 (forward) and 212 (reverse), the reads should overlap by about 10-12 bp, and I get more than 16,000 features. However, if I increase the overlap to 20-22 bp by setting the truncation lengths to 230 (forward) and 222 (reverse), I get fewer features (13,000). I wonder why allowing more overlap results in fewer features. Is this related to read quality or something else?

I think you are getting fewer features because of a combination of factors.
As you already mentioned, these may include the quality of the reads, both at the filtering step and in the overlapping region, and the overall length of the reverse reads.
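
The overlap arithmetic from the question can be sketched in a few lines of Python. This is only an illustration: it assumes the primers are already removed, that every amplicon is exactly 430 nt (which real V3-V4 data will not be), and a minimum overlap of 12, which is the DADA2 default as far as I know. The function names are made up for this example.

```python
def merge_overlap(trunc_f, trunc_r, amplicon_len):
    """Bases shared by the truncated forward and reverse reads."""
    return trunc_f + trunc_r - amplicon_len


def max_mergeable_len(trunc_f, trunc_r, min_overlap):
    """Longest amplicon that still reaches the required overlap."""
    return trunc_f + trunc_r - min_overlap


AMPLICON_LEN = 430  # expected length from the question (assumption: primers removed)
MIN_OVERLAP = 12    # assumed minimum overlap required for merging

for trunc_f, trunc_r in [(230, 212), (230, 222)]:
    overlap = merge_overlap(trunc_f, trunc_r, AMPLICON_LEN)
    longest = max_mergeable_len(trunc_f, trunc_r, MIN_OVERLAP)
    print(f"trunc {trunc_f}/{trunc_r}: overlap {overlap} nt, "
          f"longest mergeable amplicon {longest} nt")
```

With these numbers the two settings give overlaps of 12 and 22 nt and longest mergeable amplicons of 430 and 440 nt. Keep in mind that a longer truncation length also keeps more of the lower-quality read tail and discards reads shorter than that length, so the feature count does not depend on the overlap alone.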

Another thing: after merging the sequences with DADA2, the minimum sequence length is 385 nt and the average length is 417 nt, both shorter than my expected length (430). What does this mean, and how can I address it in QIIME 2?

As I already wrote, the V3-V4 region varies in length among different bacteria. Also, trimming primers may affect the length of the output.

To conclude, I would suggest the following:

Decrease the minimum overlap parameter in DADA2 to 6, since the V3-V4 region is long.
Remove the primers with cutadapt before DADA2.
Experiment more with the DADA2 truncation lengths and choose the parameters that let the highest percentage of reads pass through, including a run with truncation disabled if the quality is good.
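
The three suggestions above could be put together roughly as in the Python sketch below, which only builds and prints the command lines; it does not run QIIME 2. The flag names are the ones I recall from q2-cutadapt and q2-dada2, so please check them against `--help` for your installed version. All file names, the primer placeholders, and the candidate truncation pairs are hypothetical.

```python
import shlex


def cutadapt_cmd(demux, fwd_primer, rev_primer, out):
    """Argument list for primer removal with q2-cutadapt."""
    return [
        "qiime", "cutadapt", "trim-paired",
        "--i-demultiplexed-sequences", demux,
        "--p-front-f", fwd_primer,
        "--p-front-r", rev_primer,
        "--p-discard-untrimmed",  # drop read pairs in which no primer was found
        "--o-trimmed-sequences", out,
    ]


def dada2_cmd(trimmed, trunc_f, trunc_r, min_overlap, prefix):
    """Argument list for one paired-end DADA2 run."""
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", trimmed,
        "--p-trunc-len-f", str(trunc_f),
        "--p-trunc-len-r", str(trunc_r),
        "--p-min-overlap", str(min_overlap),
        "--o-table", f"{prefix}-table.qza",
        "--o-representative-sequences", f"{prefix}-rep-seqs.qza",
        "--o-denoising-stats", f"{prefix}-stats.qza",
    ]


# Placeholders: substitute the real 341F / 806R primer sequences.
print(shlex.join(cutadapt_cmd("demux.qza", "FWD_PRIMER_SEQ",
                              "REV_PRIMER_SEQ", "trimmed.qza")))

# Candidate truncation pairs to compare; 0/0 means no truncation.
for trunc_f, trunc_r in [(230, 212), (230, 222), (0, 0)]:
    prefix = f"dada2-f{trunc_f}-r{trunc_r}"
    print(shlex.join(dada2_cmd("trimmed.qza", trunc_f, trunc_r, 6, prefix)))
```

After each run, compare the denoising stats and keep the setting with the highest percentage of reads passing all steps. Note that once cutadapt has removed the primers, the truncation lengths refer to the already shortened reads.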

Best,

Abul_Bashar · March 17, 2024, 10:46pm

Thank you very much, @timanix, for your constructive response.

OK, I'll try this and experiment more with the truncation lengths to see what happens.

I wonder: if DADA2 returns sequences that are shorter than expected after merging, to what extent could that affect my results? Currently more than 50% of the sequences are shorter than my expected length (430). In a forum discussion, @Mehrbod_Estaki suggested taxonomy-based filtering to avoid any odd hits generated from shorter sequences. Do you think it could be helpful in my case?

I have also noticed that some papers report varying lengths for the V3-V4 region even when using the same primers. This confused me a bit.

Thanks again!

timanix · March 18, 2024, 7:17am

I will try to calculate a little; please correct me if my assumptions are wrong.
So, you are working with the V3-V4 primers (341F and 806R) and trimmed 17 and 20 bases from your forward and reverse reads. So, your expectation of 430 is based on the math: 806-341-17-20 = 428?
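
Written out as a tiny Python sketch, that estimate looks like this. It treats the numbers in the primer names as start and end coordinates, which is the same rough assumption as in my calculation above; the function name is made up for the example.

```python
def expected_insert_len(fwd_pos, rev_pos, fwd_primer_len, rev_primer_len):
    """Rough amplicon length after removing both primers."""
    return rev_pos - fwd_pos - fwd_primer_len - rev_primer_len


print(expected_insert_len(341, 806, 17, 20))  # prints 428
```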

Here is a link to a comment from the DADA2 developer. According to it, you should expect at least two groups of sequences, one approximately 20 nt shorter than the other.
In addition, any reads that fail to merge will be discarded by DADA2. You can also apply additional filtering based on taxonomy classifications if needed.
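
To illustrate the idea behind taxonomy-based filtering, here is a minimal Python sketch. It assumes a simple rule: keep only features that carry a phylum-level label and are not annotated as mitochondria or chloroplast. The feature IDs and taxonomy strings are toy data, not results from your dataset, and the rule itself is just one possible choice.

```python
def keep_feature(taxon, include="p__", exclude=("mitochondria", "chloroplast")):
    """True if the label has a phylum-level rank and no excluded term."""
    label = taxon.lower()
    return include.lower() in label and not any(term in label for term in exclude)


# Toy taxonomy assignments (hypothetical).
taxonomy = {
    "asv1": "d__Bacteria; p__Firmicutes; c__Bacilli",
    "asv2": "Unassigned",
    "asv3": "d__Bacteria",
    "asv4": "d__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; "
            "o__Rickettsiales; f__Mitochondria",
}

kept = sorted(fid for fid, taxon in taxonomy.items() if keep_feature(taxon))
print(kept)  # prints ['asv1']
```

In QIIME 2 itself the same kind of filter is applied to the feature table and the representative sequences with the q2-taxa plugin rather than by hand.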

I was confused as well with my first dataset, which was also sequenced with V3-V4 primers. Here is a nice graphical representation of the V3-V4 region length.
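
If you want to look at the length distribution of your own merged sequences, a small Python sketch like the one below could help. It assumes the representative sequences have been exported to a FASTA file (for example with `qiime tools export`); here the input is a toy list of lines with made-up sequences, and the expected length of 428 comes from the rough calculation above.

```python
from collections import Counter


def read_fasta_lengths(lines):
    """Map each record name to its sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may span several lines
    return lengths


def summarize(lengths, expected):
    """Basic length statistics plus the share of sequences below `expected`."""
    values = sorted(lengths.values())
    return {
        "min": values[0],
        "max": values[-1],
        "mean": sum(values) / len(values),
        "fraction_shorter": sum(v < expected for v in values) / len(values),
        "histogram": Counter(values),
    }


# Toy FASTA content (hypothetical sequences of 405, 405, 425 and 430 nt).
toy = [">a", "A" * 405, ">b", "A" * 405, ">c", "A" * 425,
       ">d", "A" * 20, "A" * 410]

stats = summarize(read_fasta_lengths(toy), expected=428)
for length, count in sorted(stats["histogram"].items()):
    print(length, count)
print("fraction shorter than expected:", stats["fraction_shorter"])
```

On real data, replace `toy` with the lines of the exported FASTA file, e.g. `open("dna-sequences.fasta")`. Two clusters of lengths about 20 nt apart would match what the DADA2 developer describes.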
