dada2 minFoldParentOverAbundance parameter and sequence yield

mentorwan · January 9, 2020, 6:46pm

A different question, related to the mock-20 data set run.

We ran different settings to check whether the result is optimal. One thing we noticed is that when DADA2 is run through QIIME2, the chimera removal step gives hugely different results depending on one parameter, minFoldParentOverAbundance. Using this parameter at 1.0 (the default), we got 13291 non-chimeric reads. But without this parameter, we got 59097 non-chimeric reads. Do you know why the difference is so large?

Sorry to bother you so many times.

Nicholas_Bokulich · January 9, 2020, 7:17pm

Hi @mentorwan,
I have split this into a new topic, since it is a somewhat distinct question and has more to do with dada2 than with the mock community data you are using.

The minFoldParentOverAbundance parameter can (quite sensibly) impact chimera detection, so changing this setting can be expected to lead to dramatic differences. See these forum topics for more discussion:
https://forum.qiime2.org/search?q=min-fold-parent-over-abundance

I suspect increasing that parameter (as others have advised in the topics linked above) is probably the wisest choice in this scenario, as I doubt such a high proportion of reads are chimeric in that mock community.
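
To illustrate why this setting matters so much, here is a minimal Python sketch. It assumes the documented behaviour that only sequences more than this-fold more abundant than a sequence can be considered its "parents". The function name `candidate_parents` and the abundance values are hypothetical, and this is a toy illustration, not DADA2's actual implementation.

```python
def candidate_parents(abundances, query, min_fold=1.0):
    """Return the sequences abundant enough to be tested as parents of `query`.

    Assumption: a sequence qualifies only if its abundance is strictly
    greater than min_fold times the abundance of the query sequence.
    """
    threshold = min_fold * abundances[query]
    return [seq for seq, count in abundances.items()
            if seq != query and count > threshold]


# Hypothetical read counts for four denoised sequences.
abundances = {"A": 1000, "B": 800, "C": 500, "D": 450}

# With min_fold=1.0, any more abundant sequence can be a parent of C.
parents_low = candidate_parents(abundances, "C", min_fold=1.0)

# With min_fold=2.0, a parent of C needs more than 1000 reads, so none qualify.
parents_high = candidate_parents(abundances, "C", min_fold=2.0)

print(parents_low, parents_high)
```

A sequence with no qualifying parents cannot be flagged as a chimera, so a higher value tends to leave more reads in the non-chimeric output.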

No worries, happy to help!

mentorwan · January 9, 2020, 7:38pm

I just ran a quick test. Changing this from 1.0 to 2.0 changes the result from 12974 to 78692 reads. But without this parameter, we get 59097. Using 1.0 seems too conservative and gives some false negatives. But without this parameter, I think we can get all 20 expected taxa back. Thanks!
