DADA2 pairwise alignments parameter tuning

I'm using the DADA2 plugin on fungal ITS amplicon data. Reading the DADA2 paper, I found that the first part of the algorithm (pairwise alignments) has two options, KDIST_CUTOFF and BAND_SIZE, that control the alignment heuristics. The paper says that "default values should be re-examined if the algorithm is applied to genetic regions with significantly different characteristics, such as the indel-rich ITS region".

My question is: is there any way in QIIME2 to provide DADA2 with my own values of those two parameters?

Hello @salias,

Unfortunately, it doesn't look like those parameters are available through qiime2. If you think there's a strong argument for exposing these parameters, you can open an issue here. The dada2 tool has so many parameters spread across its pipeline that, when it was wrapped in qiime2, I think a balance had to be struck between configurability and ease of use/interpretability. If you are comfortable with R, you can use the package directly and have control over any and all parameters that dada2 provides.

Hi @colinvwood ,

Thank you for your response. Yes, I think the same: there are so many parameters that exposing all of them would be a headache for the user. I really don't think these parameters are "important" enough to be available in q2-dada2 (for 16S, for example, the defaults already seem to be sensible values), so I'll follow your advice, work with DADA2 in R, and then import the results into qiime2.

Hi @salias, hi @colinvwood,

On the other hand, if these parameters are important for ITS or other non-16S targets then it may be important to expose these to users but set them to reasonable defaults. Many users are using QIIME 2 to analyze ITS and other non-16S targets, so we need to consider their needs as well. So let's not rule this out just yet...

This sounds quite hypothetical — they should be re-examined, but it sounds like the appropriate settings are not known and would need to be explored. @salias if you plan to do so with your ITS data, perhaps you could report your findings back here? Then we could decide if these parameters should be exposed in q2-dada2. Ideally, some benchmarking should be done with a ground truth (e.g., mock community or other positive control) to tune these, if possible.
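As a rough illustration of what benchmarking against a ground truth could look like, here is a minimal Python sketch that scores one denoising run against the known members of a mock community. The function name `score_against_mock` and the exact-sequence-match criterion are assumptions made for illustration; a real benchmark might instead compare taxonomic assignments or allow a small number of mismatches.

```python
def score_against_mock(observed, expected):
    """Compare denoised sequences with the expected mock-community sequences.

    observed: iterable of sequences produced by one denoising run
    expected: iterable of the known (ground truth) sequences
    Matching is by exact, case-insensitive sequence identity (a simplification).
    """
    observed_set = {seq.upper() for seq in observed}
    expected_set = {seq.upper() for seq in expected}

    true_positives = len(observed_set & expected_set)

    # Guard against empty inputs so the ratios are always defined.
    precision = true_positives / len(observed_set) if observed_set else 0.0
    recall = true_positives / len(expected_set) if expected_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0

    return {"precision": precision, "recall": recall, "f1": f1}


# Toy sequences, purely illustrative (not real ITS data).
result = score_against_mock(
    observed=["ACGT", "acgt", "TTTT"],
    expected=["ACGT", "GGGG"],
)
print(result)
```

Precision penalises spurious sequences and recall penalises missed mock members, so looking at both helps distinguish over-splitting from over-merging when the alignment heuristics change.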

Thanks!

Hello @Nicholas_Bokulich ,

Thank you for your response.

That's true (I could serve as an example). The thing is that, as you say:

So maybe there is no need to fine-tune these parameters and we are just overthinking it. The only way to know if it is worth it is, as you say:

So what I could do is:

  1. Get mock data, e.g. from mockrobiota, as they do in the Fungal ITS analysis tutorial.
  2. Follow the tutorial until the denoising step.
  3. Export the sequences, then use DADA2 in R and try combinations of a range of values of KDIST_CUTOFF and BAND_SIZE.
  4. Go back to QIIME2 and do taxonomic classification for each test.
  5. Evaluate accuracy and see whether the best combinations are different enough from the default values.
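The bookkeeping for steps 3 and 5 can be sketched in Python as below. This only enumerates the combinations and picks the best-scoring one; the denoising itself would still be run with DADA2 in R. The value ranges are hypothetical placeholders rather than recommendations, the helper names are made up for illustration, and the defaults shown (KDIST_CUTOFF = 0.42, BAND_SIZE = 16) are the ones listed in the DADA2 R documentation, which are worth confirming against the installed version.

```python
from itertools import product

# Defaults as listed in the DADA2 R documentation (assumption: confirm locally,
# e.g. with getDadaOpt() in R, since they may differ between versions).
DEFAULTS = {"KDIST_CUTOFF": 0.42, "BAND_SIZE": 16}

# Hypothetical ranges to explore; these are placeholders, not recommendations.
KDIST_VALUES = [0.30, 0.42, 0.50]
BAND_VALUES = [16, 32, 64]


def parameter_grid(kdist_values, band_values):
    """Return one dict per (KDIST_CUTOFF, BAND_SIZE) combination."""
    return [
        {"KDIST_CUTOFF": kdist, "BAND_SIZE": band}
        for kdist, band in product(kdist_values, band_values)
    ]


def best_combination(scores):
    """Return the (kdist, band) key with the highest score.

    scores maps (KDIST_CUTOFF, BAND_SIZE) tuples to an accuracy metric
    obtained from the mock-community evaluation; ties go to the first entry.
    """
    return max(scores, key=scores.get)


grid = parameter_grid(KDIST_VALUES, BAND_VALUES)
print(f"{len(grid)} combinations to test")
```

Once every combination has an accuracy score, `best_combination` can be compared with the default pair to judge whether the gain is large enough to justify exposing the parameters.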

I'm currently focusing on my ITS QIIME2 Snakemake pipeline, but I can spend some time on the benchmarking and then share my findings here. If we spot some improvements from changing the default values of those parameters, I could even try to open a pull request on the q2-dada2 GitHub repository, although I would need to do some research on plugin creation, structure and philosophy.

Best wishes :cowboy_hat_face:

Sounds great, @salias! When you find the time to test this, please let us know what you find!
