Dada2, trunc len

Linda_Abenaim · November 30, 2023, 4:31pm

Hello everyone!

I followed a lot of previous topics on this forum, but I need some help on my denoising.

Here you can find my demux summary:
demux-summary.qzv (320.5 KB)

My sequencing targets the 16S rRNA gene, with amplification of the V3-V4 region.
First, I obtained these DADA2 stats:
stats-dada2-new.qzv (1.2 MB)
with trim left: f=9 r=2 and trunc len: f=245 r=245

I noticed that I lost a lot of reads by the non-chimeric step; in fact, the percentage is around 20-30%. I think that is too low, right?

I read a lot on this forum, so I tried removing the primers with cutadapt and then denoising with trim left: f=18 r=21 and trunc len: f=233 r=230.
(I think the meaning of the trunc len parameter is still not so clear to me.)
These are my DADA2 stats after cutadapt and the new parameters:
stats-dada2-trimmed.qzv (1.2 MB)

Maybe it is somewhat better, but I don't know if it is still too low to start my analysis (about 50% non-chimeric).

Thank you so much for your help

colinbrislawn · November 30, 2023, 7:33pm

Hello Linda,

Your read quality looks great!

The 16S V3-V4 is a long region, so you will need all the quality you can get.
This is especially true for DADA2 paired, which requires the reads to join.
(DADA2 single only uses one read, so it does not need joining.)

Here are the DADA2 results for trim left: f=9 r=2 and trunc len: f=245 r=245.
I've sorted the table by percentage of input merged. ~40% is okay, though higher is better.

trim left removes bases from the start of each read (right side), while trunc len truncates the input reads at the end (right side). So here, the first 9 bases from R1 and 2 bases from R2 are removed, then both reads are cut off at 245 before joining is attempted.
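
A minimal sketch of what these two parameters do to a single read, assuming the behavior described in the DADA2 filtering documentation: reads shorter than trunc len are discarded, and kept reads end up with length trunc len minus trim left. The function name `trim_and_truncate` and the example read are hypothetical, made up for illustration.

```python
from typing import Optional


def trim_and_truncate(read: str, trim_left: int, trunc_len: int) -> Optional[str]:
    """Illustrate trim left / trunc len on one read (not DADA2's actual code)."""
    # A trunc_len of 0 is treated as "no truncation or length filtering".
    if trunc_len and len(read) < trunc_len:
        return None  # reads shorter than trunc_len are discarded
    end = trunc_len if trunc_len else len(read)
    # Cut at position trunc_len first, then drop trim_left bases from the start.
    return read[trim_left:end]


read = "ACGT" * 75  # a hypothetical 300-base forward read
kept = trim_and_truncate(read, trim_left=9, trunc_len=245)
print(len(kept))  # 236 bases remain (245 - 9)
```

Note that trunc len is a position in the original read, so the retained read is shorter than trunc len whenever trim left is above zero.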

Let's compare to these results: trim left: f=18 r=21 and trunc len: f=233 r=230.

These settings keep more reads, which is a very good sign!

Using shorter trunc len settings should cause more reads to pass the filter, until the reads become too short and fewer of them merge. It's a tradeoff, and I often run this multiple times to find the sweet spot for my data.
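
Running DADA2 several times can be scripted. Here is a small sketch that only builds (and does not run) one `qiime dada2 denoise-paired` command per candidate truncation pair. The input file name `demux-trimmed.qza`, the output directory naming, and the helper name `dada2_commands` are assumptions for illustration.

```python
import shlex


def dada2_commands(demux, trim_left_f, trim_left_r, trunc_pairs):
    """Build one denoise-paired command line per (trunc_f, trunc_r) pair."""
    commands = []
    for trunc_f, trunc_r in trunc_pairs:
        args = [
            "qiime", "dada2", "denoise-paired",
            "--i-demultiplexed-seqs", demux,
            "--p-trim-left-f", str(trim_left_f),
            "--p-trim-left-r", str(trim_left_r),
            "--p-trunc-len-f", str(trunc_f),
            "--p-trunc-len-r", str(trunc_r),
            # Hypothetical naming scheme: one output directory per setting.
            "--output-dir", f"dada2-f{trunc_f}-r{trunc_r}",
        ]
        commands.append(shlex.join(args))
    return commands


# Candidate settings to compare; pick values that suit your own quality plots.
for cmd in dada2_commands("demux-trimmed.qza", 18, 21, [(233, 230), (225, 225), (220, 220)]):
    print(cmd)
```

Each run writes its own denoising stats, so the percentage of input merged can be compared across settings afterwards.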

Linda_Abenaim · November 30, 2023, 8:01pm

Thank you so much @colinbrislawn!

So you suggest continuing with the second approach:
trim left: f=18 r=21 and trunc len: f=233 r=230 (with the trimmed demux.qza), but trying a lower trunc len, right?

What I don’t understand is how to choose trunc len. Should F and R have the same value?

I read a lot of posts where they calculate this parameter from the amplicon, but I don’t understand how to do it!

Could you suggest other parameters? Maybe trunc len f 220 r 220?

Linda_Abenaim · November 30, 2023, 8:04pm

Maybe you meant to say “trim” on the left side and “trunc” on the right side?

colinbrislawn · November 30, 2023, 8:47pm

Good catch! Yes, truncation happens at the end of the read, which is on the right for these graphs.

Here's my best write-up on how to calculate the expected overlap for an amplicon:

The truncation length of forward and reverse do not have to match.
(Because the quality of R2 is often lower than R1, people often truncate more from R2.)

Yeah, try those!

If you go too short then the percentage of input merged will drop after running DADA2. You can see this in the stats.

Linda_Abenaim · November 30, 2023, 8:52pm

Thank you so much!! I will try.

But what is a good percentage of input merged and of input non-chimeric? Over 50%?

If I have a good percentage of input merged but not of input non-chimeric, what happens?

colinbrislawn · November 30, 2023, 9:01pm

Notice how the numbers always decrease as you read across the DADA2 stats table?
Each step removes some reads. Our goal is to find a combination of settings that 1) makes sense biologically and 2) preserves as much data as possible.

Once you decide the settings are 'good enough,' you can move to the next step.

Some of the first amplicon papers in this field only had 100s of reads per sample. That second result had mostly >10k reads per sample, which is good.
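
To see how much each step removes, the counts in one row of the stats table can be turned into percentages of the input. This is a rough sketch; the counts below are hypothetical numbers made up for illustration, not values from the files attached to this thread, and the step names only loosely follow the stats columns.

```python
def retention(counts):
    """Percent of input reads still present after each DADA2 step."""
    total = counts["input"]
    return {step: round(100 * n / total, 1) for step, n in counts.items()}


# Hypothetical counts for a single sample (illustration only).
example = {
    "input": 40000,
    "filtered": 32000,
    "denoised": 31000,
    "merged": 24000,
    "non-chimeric": 20000,
}

for step, pct in retention(example).items():
    print(f"{step:>12}: {pct}%")
```

A large drop between two adjacent steps points to the setting worth revisiting: a drop at filtering suggests quality or truncation issues, and a drop at merging suggests the reads no longer overlap.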

Linda_Abenaim · November 30, 2023, 9:08pm

Thanks a lot

So, if I understood the calculation:

If I use trunc len 225 (f) and 225 (r), the total is 450.
My V3-V4 amplicon is 785 - 341 = 444.

So 450 - 444 = 6 bases of overlap.

Is that low?
Or is the important thing to have some overlap and not go negative (-6, for example)?

colinbrislawn · December 1, 2023, 2:20pm

Perfect! If we trim more and joining drops, this length estimate can explain why.

Correct! This will help us estimate how short is too short.

(having 6 base pairs of overlap may already be too short)

Here's the twist: we don't know the true amplicon length because it varies between species. (It can also depend on sequencing primers.)

We don't know how short is too short until we run it and find out that fewer of our reads joined.
So try it and find out!
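
The overlap estimate from this thread can be written as a small helper. This is a sketch under stated assumptions: the amplicon length of 444 is the rough estimate used above (785 - 341), and the minimum overlap of 12 is assumed to be DADA2's default for merging pairs, so check it against your own version. The function names are hypothetical.

```python
MIN_OVERLAP = 12  # assumed default minimum overlap for merging; verify for your version


def expected_overlap(trunc_f: int, trunc_r: int, amplicon_len: int) -> int:
    """Estimated overlap in bases; a negative value means a gap between the reads."""
    return trunc_f + trunc_r - amplicon_len


def likely_to_merge(trunc_f: int, trunc_r: int, amplicon_len: int,
                    min_overlap: int = MIN_OVERLAP) -> bool:
    """True if the estimated overlap reaches the minimum needed for merging."""
    return expected_overlap(trunc_f, trunc_r, amplicon_len) >= min_overlap


amplicon_len = 785 - 341  # 444, the V3-V4 estimate from this thread
for trunc_f, trunc_r in [(225, 225), (170, 170)]:
    overlap = expected_overlap(trunc_f, trunc_r, amplicon_len)
    ok = likely_to_merge(trunc_f, trunc_r, amplicon_len)
    print(f"f={trunc_f} r={trunc_r}: overlap {overlap}, likely to merge: {ok}")
```

Because the true amplicon length varies between species, treat the result as a guide for where merging might start to fail, not as a hard cutoff.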

Linda_Abenaim · December 1, 2023, 2:42pm

OK, good! I understood the calculation.
Yes, I tried with trim left F: 18 R: 21 and trunc len F: 220 R: 220.
I obtained this file:
stats-dada2-220.qzv (1.2 MB)

It seems to be better than the last one.

Then I tried with trunc len 170 and I obtained very low percentages, so I think that a truncation of 220 is OK.

What do you think?

colinbrislawn · December 1, 2023, 2:46pm

(170+220)-444 = -54
170+170-444 = -104
That's a bigger gap! I understand why it did not merge.

Sounds like 220 may be good enough!

You can keep trying more settings to maximize read preservation if you want.

Linda_Abenaim · December 1, 2023, 2:49pm

No, I used 170 (f) and 170 (r).

But I think that 220 (f) and 220 (r) is the best one! I also tried 225 and 225, but the percentages were a little bit lower.

So after 220 and 220 I tried going lower, with 170 and 170, but I obtained very low percentages. So I think that 220 gives me the maximum.

Do you think that I should try again?

system · January 1, 2024, 8:50pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.