My sequencing targets the 16S rRNA gene, amplifying the V3-V4 region.
First I obtained these DADA2 stats: stats-dada2-new.qzv (1.2 MB)
with trim left: f=9 r=2 and trunc len: f=245 r=245.
I noticed that I lost a lot of reads: the percentage of non-chimeric reads is only around 20-30%. I think that is too low, right?
I read a lot on this forum and tried removing the primers with cutadapt, then denoising with trim left: f=18 r=21 and trunc len: f=233 r=230.
(I think the meaning of the trunc len parameter is still not clear to me.)
These are my DADA2 stats after cutadapt and the new parameters: stats-dada2-trimmed.qzv (1.2 MB)
Maybe it is somewhat better, but I don't know if it is still too low to start my analysis (about 50% non-chimeric).
The 16S V3-V4 is a long region, so you will need all the quality you can get.
This is especially true for DADA2 paired, which requires the reads to join.
(DADA2 single only uses one read, so it does not need joining.)
Here are the DADA2 results for trim left: f=9 r=2 and trunc len: f=245 r=245.
I've sorted the table by percentage of input merged. ~40% is okay, though higher is better.
trim left removes bases from the start of each read (left side), while trunc len truncates the input reads at the end (right side). So here, the first 9 bases from R1 and the first 2 bases from R2 are removed, and both reads are cut off at position 245 before joining is attempted.
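To make the two parameters concrete, here is a minimal Python sketch of what happens to a single read. It is only an illustration, not DADA2's actual code: the function name `trim_and_truncate` and the made-up read are my own, and I am assuming the usual DADA2 behaviour that reads shorter than trunc len are discarded and that the kept read has length trunc len minus trim left.

```python
from typing import Optional


def trim_and_truncate(read: str, trim_left: int, trunc_len: int) -> Optional[str]:
    """Illustrative only: cut the read at trunc_len, then drop trim_left bases from the start."""
    if len(read) < trunc_len:
        # Assumption: reads shorter than trunc_len are thrown away entirely.
        return None
    # Keep positions trim_left .. trunc_len - 1 of the original read.
    return read[trim_left:trunc_len]


# A made-up 250-base forward read (not real data).
fake_read = "A" * 9 + "C" * 236 + "G" * 5

kept = trim_and_truncate(fake_read, trim_left=9, trunc_len=245)
print(len(kept))  # length is trunc_len - trim_left
```

So with f=9 and trunc len 245, each surviving forward read ends up 236 bases long, and anything shorter than 245 bases is dropped before denoising.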
Let's compare to these results: trim left: f=18 r=21 and trunc len: f=233 r=230.
These settings keep more reads, which is a very good sign!
Shorter trunc len settings should let more reads pass the filter, until the reads become too short to overlap and fewer of them merge. It's a tradeoff, and I often run this multiple times to find the sweet spot for my data.
So you are saying I should continue with the second approach:
trim left: f=18 r=21 and trunc len: f=233 r=230 (with the trimmed demux.qza), but try a lower trunc len, right?
What I don’t understand is how to choose trunc len. Should F and R have the same value?
I read a lot of posts where they calculate this parameter from the amplicon length, but I don’t understand how to do it!
Could you suggest other parameters? Maybe trunc len f=220 r=220?
Notice how the numbers always decrease as you read across the DADA2 stats table?
Each step removes some reads. Our goal is to find a combination of settings that 1) makes sense biologically and 2) preserves as much data as possible.
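If it helps to see where the reads go, here is a small Python sketch that turns one row of the stats table into percentages and reports the step with the biggest drop. The counts are invented for illustration (they are not from your files), and the step names are my shorthand for the count columns in the DADA2 stats table.

```python
# Count columns of the stats table, in processing order.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def percent_retained(counts: dict) -> dict:
    """Percentage of the input reads still present after each step."""
    total = counts["input"]
    return {step: round(100 * counts[step] / total, 1) for step in STEPS}


def biggest_loss(counts: dict) -> str:
    """Name of the step that removed the most reads."""
    drops = {
        later: counts[earlier] - counts[later]
        for earlier, later in zip(STEPS, STEPS[1:])
    }
    return max(drops, key=drops.get)


# Hypothetical sample, made up for this example.
example = {
    "input": 50000,
    "filtered": 40000,
    "denoised": 39000,
    "merged": 27000,
    "non-chimeric": 25000,
}

pct = percent_retained(example)
print(pct)
print("largest drop at:", biggest_loss(example))
```

A large drop at the filtering step usually points at quality or trunc len being too long, while a large drop at merging suggests the reads no longer overlap enough.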
Once you decide the settings are 'good enough,' you can move to the next step.
Some of the first amplicon papers in this field only had 100s of reads per sample. That second result had mostly >10k reads per sample, which is good.
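On choosing trunc len from the amplicon: the forward and reverse values do not have to be equal, they only need to leave enough overlap for the reads to join. Here is a rough Python sketch of that arithmetic. The amplicon length of 420 is a placeholder I picked for a primer-trimmed V3-V4 amplicon, so please replace it with the length for your own primers, and the minimum overlap of 12 is what I understand to be DADA2's default. Trim left removes bases from the outer ends of the amplicon, so it does not enter the overlap calculation.

```python
MIN_OVERLAP = 12            # assumed DADA2 default minimum overlap for merging
ASSUMED_AMPLICON_LEN = 420  # hypothetical length; depends on primers and taxa


def expected_overlap(amplicon_len: int, trunc_len_f: int, trunc_len_r: int) -> int:
    """Bases shared by the truncated forward and reverse reads (negative means a gap)."""
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(amplicon_len: int, trunc_len_f: int, trunc_len_r: int,
              min_overlap: int = MIN_OVERLAP) -> bool:
    """True if the truncated reads still overlap by at least min_overlap bases."""
    return expected_overlap(amplicon_len, trunc_len_f, trunc_len_r) >= min_overlap


# Compare a few candidate settings under the assumed amplicon length.
for f, r in [(233, 230), (220, 220), (170, 170)]:
    overlap = expected_overlap(ASSUMED_AMPLICON_LEN, f, r)
    print(f"trunc len f={f} r={r}: overlap={overlap}, "
          f"merges={can_merge(ASSUMED_AMPLICON_LEN, f, r)}")
```

Real amplicons vary in length, so I would leave some margin above the minimum rather than aim for exactly 12 bases, and then check the merged percentage in the stats table.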
Ok, good! I understood the calculation.
Yes, I tried with trim left F: 18 R: 21 and trunc len F: 220 R: 220.
I obtained this file: stats-dada2-220.qzv (1.2 MB)
It seems to be better than the last one.
Then I tried with trunc len 170 and obtained a very low percentage, so I think that truncating at 220 is OK.