Passing alternative parameters to DADA2 default plugin

jnesme · December 6, 2017, 10:35pm

Hi all,

Are you considering exposing a more exhaustive list of parameters in the current DADA2 denoise-paired plugin implementation?

Specifically, we're using 16S V3-V4 amplicons sequenced with MiSeq v2 2x250bp, so on average the overlap region is rather small (e.g. 35bp in E. coli). After trimming the low-quality last bases, it's quite common to drop below the 20bp minimum overlap.

We have, however, tried modifying the scoring of the merger by strongly penalizing mismatches, and achieved pretty good results: it allows lowering the minimum overlap while using maxMismatch = 0 and still getting a decent number of properly merged reads. We tested this on a previously published dataset and could verify that it indeed yields many more reads, without errors in the overlap.
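To make the idea concrete, here is a minimal sketch in Python of overlap scoring with a heavy mismatch penalty. It is not the DADA2 code: it only tries ungapped overlaps (the real merger uses an alignment that can also open gaps), and the function names and the parameters min_overlap and max_mismatch are made up for this illustration. The -6 penalty is the value mentioned in this post.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def best_overlap(fwd, rev, match=1, mismatch=-6):
    """Score every ungapped overlap between the end of the forward read
    and the start of the reverse-complemented reverse read.

    Returns (score, overlap_length, n_mismatches) for the best one.
    Illustrative simplification: no gaps are considered.
    """
    rc = revcomp(rev)
    best = None
    for n in range(1, min(len(fwd), len(rc)) + 1):
        n_mismatches = sum(a != b for a, b in zip(fwd[-n:], rc[:n]))
        score = match * (n - n_mismatches) + mismatch * n_mismatches
        if best is None or score > best[0]:
            best = (score, n, n_mismatches)
    return best


def merge_pair(fwd, rev, min_overlap, max_mismatch, match=1, mismatch=-6):
    """Merge a read pair, or return None if the best overlap is too short
    or contains too many mismatches (hypothetical acceptance rule)."""
    _, n, n_mismatches = best_overlap(fwd, rev, match, mismatch)
    if n < min_overlap or n_mismatches > max_mismatch:
        return None
    return fwd + revcomp(rev)[n:]


# Toy 45 bp amplicon read as 2x30 bp, so the true overlap is 15 bp.
amplicon = "ACGTTGCAAGCTTAGGCTAACCGGTATCGATCGGATTCAGCTAGT"
fwd = amplicon[:30]
rev = revcomp(amplicon[-30:])

print(merge_pair(fwd, rev, min_overlap=12, max_mismatch=0))  # merges
print(merge_pair(fwd, rev, min_overlap=20, max_mismatch=0))  # None: overlap too short
```

With the heavy penalty, a longer but mismatched overlap cannot outscore the clean 15 bp one, which is why a lower minimum overlap combined with zero allowed mismatches can still be safe in this toy case.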

This, however, requires rewriting the mergePairs function to increase the mismatch penalty to -6. I was wondering the following:
Could you please confirm that there are currently no options for passing more arguments to the DADA2 R functions in the QIIME 2 implementation, especially to mergePairs() and more specifically minOverlap?

Also, how easy would it be to modify the parameters set in getDadaOpt()? I suppose I could also change the "MISMATCH" value there.

Even though I agree that throwing out low-quality reads is the better default, it also removes a bunch of reads that could otherwise be merged correctly, provided mismatches are heavily penalized in nwalign().

But maybe I got all that wrong?

ebolyen · December 8, 2017, 11:34pm

Hey @jnesme!

There's been some talk about this on the q2-dada2 issue tracker, in particular about turning the denoise-* actions into pipelines that use more granular actions (which would ideally have more parameters), and about exposing minOverlap from mergePairs, which has come up on this forum before.

Right now there isn't a way to interact with the DADA2 R functions directly. Instead, QIIME 2 passes the available options to an R script that @benjjneb very kindly wrote for us, which then calls the DADA2 R functions in a pre-determined manner.

Looking at the docs, it sounds like you would use setDadaOpt for anything that isn't specifically provided as an argument. However, since q2-dada2 calls an R script, there's not a great way for you to jump into the R session and override anything. An .Rprofile file might make this possible, but you are really much better off using R and DADA2 directly.

benjjneb · December 9, 2017, 3:52am

This doesn't apply to the Q2 plugin right now, but your post is very timely. 3 days ago we checked in essentially the same change to the devel version of the dada2 R package, i.e. higher mismatch and gap penalties and a shorter minOverlap, for the same reason you are describing!

Those new and better defaults will propagate to the Q2 plugin, but perhaps not until they make it into the release version of the R package.

jnesme · December 11, 2017, 5:24pm

Great to hear!

I tried editing the run_dada_paired.R script and putting the modified function in there, and it seems to do the trick while waiting for the change to be pushed to the Q2 plugin.

Thanks for your help and all the great work.

ebolyen · December 12, 2017, 2:59am

Hey @jnesme,

Glad to hear you were able to work around this! I just want to make a comment as a kind of informational note.

If you installed (and edited) via the git repo and ran make dev you should see a different version number when you type qiime info for q2-dada2. (Ideally something like 2017.11.0.dev0+0.gce22644.dirty.) This is also the version which is recorded in provenance, which could be useful to have long-term, in case you ever forget that you used a modified version of q2-dada2 in your analysis.

If you just edited the file in your environment directly (i.e. some file in site-packages/), then the version will still look like 2017.11.0 (or whatever release you might have) and may cause confusion later on as provenance can no longer "reflect" exactly what happened.
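If you want to check this programmatically, a rough sketch in Python could look like the following. The helper name is made up, and the pattern is inferred only from the two example version strings above, so treat it as an assumption rather than a rule for how QIIME 2 formats every version.

```python
import re


def looks_like_dev_build(version):
    """Return True if a version string has a .devN segment or a trailing
    .dirty marker, as in the example string quoted in this post.

    Hypothetical helper: the format is assumed from that one example.
    """
    return bool(re.search(r"\.dev\d+", version)) or version.endswith(".dirty")


for v in ("2017.11.0.dev0+0.gce22644.dirty", "2017.11.0"):
    print(v, looks_like_dev_build(v))
```

A plain release string returning False does not prove the install is unmodified; as noted above, files edited in place under site-packages/ leave the version unchanged.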