Automating denoising step by dada2

uth · October 9, 2019, 4:23am

Hi,

I’m analysing multiple data sets belonging to different bio-projects (the goal is taxonomic assignment), so I need to automate my workflow. I’m stuck at the DADA2 denoising step, where the values for the “trunc length” parameters are chosen by reading the interactive quality plot generated at the demultiplexing step.

Is there a way to pass on the values for the parameters in the denoising step without manually interpreting the interactive quality plot?

Any advice is highly appreciated!

Thanks in advance!

jwdebelius · October 9, 2019, 7:40am

Hi @uth,

You can pick a default trim length and then just pass that parameter. I tend to go project-by-project and pick one I think makes sense, but you can just choose. When you pass the value into the command line, the computer doesn't know whether you checked it manually, always use a standard value, or had this one come to you in a dream. (Although perhaps it's better if not everyone has dreams.)

The reason to be cautious (and maybe a little bit conservative) with this is that if you pick a value you don't like in the end, you will need to re-process.

I tend to select a trim length on a project-by-project basis, so, for instance, for one meta-analysis I'd select one trim length, and for a different one I might select another.

Best,
Justine

the_dummy · October 9, 2019, 11:32am

Well, I’m working on a project where I’m going to need to automate the denoising step some day, too. I have thought about it but I’m stuck at one point.
I’m working with paired-end reads. Let’s say I want to truncate at the position where the quality goes below 20, but what if the reads then don’t overlap? How can I get around this problem? I haven’t figured this out yet.
If someone has an opinion about this, please share. If I come up with a solution, I will definitely share with all of you.

ChrisKeefe · October 9, 2019, 7:36pm

@the_dummy, if your data is consistently exploring the same amplicons, you could probably script a solution that calculates a minimum trim length for forward and reverse reads based on max target amplicon length, read length, and the need for 20+ characters of overlap. Additionally, I think DADA2's filter-and-trim step has a parameter for truncating anything below a given quality score. Not sure how that works under the hood, but it might be worth looking into.
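The calculation described above could be sketched roughly as follows. This is only a minimal illustration: the function names are hypothetical, the 20-base overlap is the figure mentioned in the post, and the amplicon length is assumed to be measured over the same region the reads cover (for example, after any primer removal).

```python
def min_combined_length(max_amplicon_len, min_overlap=20):
    """Smallest forward + reverse truncation total that still leaves
    `min_overlap` bases of overlap on the longest expected amplicon."""
    return max_amplicon_len + min_overlap


def min_reverse_trunc(max_amplicon_len, trunc_len_f, read_len_r, min_overlap=20):
    """Given a forward truncation length, return the shortest reverse
    truncation length that still allows the read pair to be merged."""
    needed = min_combined_length(max_amplicon_len, min_overlap) - trunc_len_f
    if needed > read_len_r:
        # The reverse read is not long enough to reach the required overlap.
        raise ValueError(
            f"need {needed} reverse bases but reads are only {read_len_r} long"
        )
    return max(needed, 0)


# Hypothetical example: 253 bp amplicon, 150 bp reads, forward reads kept whole.
print(min_reverse_trunc(253, trunc_len_f=150, read_len_r=150))
```

Any quality-based truncation length could then be checked against this lower bound before it is passed to the denoising step.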

uth · October 10, 2019, 12:37am

Thank you @jwdebelius for your response. It would be much easier if we could pick the trunc lengths by executing a command that finds the first base with a low quality (e.g. a quality score below 20).
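As a rough illustration of that idea, the sketch below takes a list of per-position quality scores (for instance, the median quality at each cycle) and returns how many leading positions stay at or above a threshold. How those per-position values are obtained is left open here and is an assumption; the function name is hypothetical.

```python
def trunc_length_from_quality(per_position_quality, threshold=20):
    """Return the number of leading positions whose quality is at or above
    `threshold`, i.e. a candidate truncation length."""
    for position, quality in enumerate(per_position_quality):
        if quality < threshold:
            # `position` is 0-based, so it equals the count of bases kept.
            return position
    # Quality never dropped below the threshold: keep the full read.
    return len(per_position_quality)


# Hypothetical per-position median qualities for a short read.
medians = [38, 37, 36, 30, 25, 19, 18]
print(trunc_length_from_quality(medians))
```

A single noisy dip would cut the read early with this rule, so a smoothed summary may be a safer input than raw per-position values.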

Anyways, I can try deciding a default trunc length and see if it works!

Thanks again for your comment!

Nicholas_Bokulich · October 10, 2019, 12:43am

Two options:

1. dada2 can do something like this onboard with the --p-trunc-q option
2. see the q2-quality-filter plugin, which can perform trimming based on q score, then pass to deblur or dada2
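For bulk processing, the first option could be scripted along these lines. This is a sketch, not a tested pipeline: the project names, file paths and helper function are hypothetical, and the flag names should be checked against `qiime dada2 denoise-paired --help` for the installed release. Truncation lengths of 0 are used here on the assumption that positional truncation should be switched off, leaving --p-trunc-q as the only rule.

```python
import shlex


def dada2_paired_command(demux_qza, out_prefix, trunc_q=20,
                         trunc_len_f=0, trunc_len_r=0):
    """Build the argument list for one denoising run (nothing is executed)."""
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux_qza,
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--p-trunc-q", str(trunc_q),
        "--o-table", f"{out_prefix}-table.qza",
        "--o-representative-sequences", f"{out_prefix}-rep-seqs.qza",
        "--o-denoising-stats", f"{out_prefix}-stats.qza",
    ]


# Hypothetical project directories, each holding a demux.qza artifact.
for project in ["projectA", "projectB"]:
    cmd = dada2_paired_command(f"{project}/demux.qza", f"{project}/dada2")
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to execute
```

Building the command as a list keeps the same rule applied to every project and makes it easy to log exactly what was run.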

uth · October 10, 2019, 10:49pm

Thank you for the reply @Nicholas_Bokulich. I’m trying to automate the parameter picking step (for bulk data analysis) to pass on to the denoising step, without visualising the interactive quality plot.

Any advice on that?

Many thanks!

Nicholas_Bokulich · October 11, 2019, 1:34am

So use either of the options I listed above. They let you trim your reads based on a quality threshold instead of manually. You will probably lose some sequences that could have been saved by manually selecting a truncation length, but you will make up for it in the time you save, and you will have a rule that can be applied across runs.