The parameters for DADA2 denoise

Charlie · July 6, 2018, 12:14pm

Hello everyone:
The raw qzv for my paired-end reads looks like this:

At the beginning of both reads, the boxplots do not contain a 'blue box'. Could anyone tell me what this means, and should I trim these regions?

My second question:
I ran DADA2 more than once. The parameters for the first run were:
--p-trim-left-f 8
--p-trim-left-r 6
--p-trunc-len-f 170
--p-trunc-len-r 160

If I want to include all the samples, the --p-sampling-depth has to be around 165.

But I think I may have trimmed too much sequence from the right end, so I tried a second time:
--p-trim-left-f 8
--p-trim-left-r 6
--p-trunc-len-f 206
--p-trunc-len-r 188

However, with longer reads, the sampling depth became lower!

Then I tried:
--p-trim-left-f 8
--p-trim-left-r 6
--p-trunc-len-f 226
--p-trunc-len-r 208

This time, the sampling depth is even less than 10.

I guess a bigger sampling depth is better, as long as all samples are included.

Can anyone tell me the appropriate trimming criteria for these samples?
Thank you!

Mehrbod_Estaki · July 6, 2018, 7:51pm

Hi @Charlie

That's actually a good question. I'm not sure what those negative positions are. They may just be there for aesthetics, to make the graphs easier to read, and they have no boxes because they are not true positions and so have no quality scores. Until we get clarification on what they are, let's just ignore them. As for the few positions where there is no 'blue box' but there is a line, that's just because all the quality scores at that position have the same value, so there is no variance to draw in the 'boxplot'.
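To illustrate the "line but no box" case, here is a minimal Python sketch. It assumes the box spans the first to third quartile, as in a conventional boxplot; the helper `box_edges` and the quality scores are made up for illustration and are not taken from the actual data.

```python
from statistics import quantiles


def box_edges(scores):
    """Return (Q1, median, Q3) for the quality scores observed at one read position."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    return q1, median, q3


# Hypothetical quality scores at two read positions.
uniform_position = [38] * 20                        # every read has the same score
mixed_position = [38] * 10 + [30] * 6 + [20] * 4    # scores vary between reads

for name, scores in [("uniform", uniform_position), ("mixed", mixed_position)]:
    q1, median, q3 = box_edges(scores)
    # A box height of zero means the box collapses onto the median line.
    print(f"{name}: Q1={q1} median={median} Q3={q3} box height={q3 - q1}")
```

When every score at a position is identical, Q1, the median and Q3 coincide, so only the median line is left to draw.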

As for your question about sampling depth, choosing an appropriate sampling depth really depends on your data and the question being asked. See this topic discussed here and here, for example. I will say, though, that all 3 sampling depths you proposed above are too low.
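The trade-off behind the sampling depth can be sketched in a few lines of Python. This is only an illustration: the per-sample counts below are hypothetical, and `depth_tradeoff` is a made-up helper, not a QIIME 2 function. It relies on the usual rarefaction behaviour, where samples with fewer sequences than the chosen depth are dropped and the rest are subsampled down to that depth.

```python
def depth_tradeoff(sample_counts, depth):
    """Return (samples kept, total sequences kept) when rarefying to `depth`.

    Samples with fewer than `depth` sequences are dropped; every remaining
    sample contributes exactly `depth` sequences.
    """
    kept = [sample for sample, count in sample_counts.items() if count >= depth]
    return len(kept), len(kept) * depth


# Hypothetical per-sample sequence counts after denoising.
counts = {"S1": 165, "S2": 4200, "S3": 5100, "S4": 3900, "S5": 6100}

for depth in (165, 3900, 5000):
    n_samples, n_seqs = depth_tradeoff(counts, depth)
    print(f"depth={depth}: {n_samples} samples kept, {n_seqs} sequences kept")
```

With numbers like these, keeping the one shallow sample forces a very low depth for everyone, while dropping it keeps far more sequences overall. Which side of that trade-off is right still depends on the data and the question.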

You are right that the ideal is to retain the highest possible sampling depth while including all your samples, but you will not get a true representation of a community with 160 sequences. How do the rest of your samples look in terms of the number of sequences retained after DADA2? Are they all pretty low, or are only a few of them that low? I suspect you may be losing many of your reads in DADA2 because they fail to merge. What is your target region, and what is the expected overlap for your primer pair?
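Once the target region is known, the overlap check is simple arithmetic. Here is a rough Python sketch under stated assumptions: `AMPLICON_LEN` is a hypothetical placeholder (the real amplicon length has not been given in this thread), `expected_overlap` is a made-up helper, and the threshold of 12 is the default `minOverlap` documented for DADA2's `mergePairs`. Whether a given QIIME 2 release uses exactly that value is an assumption here.

```python
MIN_OVERLAP = 12     # documented default of minOverlap in DADA2's mergePairs
AMPLICON_LEN = 300   # hypothetical placeholder; replace with the real amplicon length


def expected_overlap(amplicon_len, trunc_len_f, trunc_len_r):
    """Bases shared by the forward and reverse reads after truncation.

    A value at or below zero means the reads do not reach each other at all.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


# The three truncation settings tried above.
for trunc_f, trunc_r in [(170, 160), (206, 188), (226, 208)]:
    overlap = expected_overlap(AMPLICON_LEN, trunc_f, trunc_r)
    status = "enough to merge" if overlap >= MIN_OVERLAP else "too short to merge"
    print(f"trunc-len-f={trunc_f} trunc-len-r={trunc_r}: overlap={overlap} ({status})")
```

In this simple model the --p-trim-left-f and --p-trim-left-r values do not change the overlap, because they remove bases from the outer ends of the amplicon rather than from the region where the two reads meet.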

thermokarst · July 6, 2018, 8:04pm

Are you talking about the negative values on the x-axis? If so, they can be ignored. They show up when you zoom in on an ROI on the far left side of the plot. Rest assured, though, there are no negative positions!

Mehrbod_Estaki · July 6, 2018, 8:20pm

Thought so, thanks for the clarification @thermokarst!
