Hoping to clarify that DADA2's chimera filtering strategy is a de novo process; that is, it does not use a set of reference sequences to check for chimeras? This older thread from @benjjneb seems to suggest as much.
Fair to say that both DADA2 and Deblur use de novo approaches for chimera filtering?
I’m curious if users have explored any potential benefit of additional chimera filtering with the vsearch implementation of uchime_ref? Related: what’s the downside to using a reference library over de novo identification?
I usually do uchime de novo first, then uchime ref second. This removes the maximum possible number of chimeras.
The main issue with all forms of chimera checking is “What if my database doesn’t have the parents of my chimeras?” So de novo methods work well because they use themselves as their own reference, and ref methods work well as they use a large database of known real microbes as a reference.
Yes this is correct. It may be worth being aware that the plugin gives you the option of two types of chimera filtering: “consensus” does de novo identification in each sample, takes a vote across samples, and removes all ASVs identified as chimeras in a high enough fraction of the samples in which they were present. “pooled” just lumps all ASVs in the data into one big sample, and identifies and removes chimeras that way.
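To make the “consensus” vote described above concrete, here is a minimal Python sketch. It is not DADA2's code: the function name `consensus_chimeras`, the input dictionaries, and the sample/ASV labels are all made up for illustration, and it assumes the per-sample de novo calls have already been made. The only thing it models is the rule stated above, that an ASV is removed when it is flagged in a high enough fraction of the samples in which it is present (0.9 is used as the default here to match the `minSampleFraction = 0.9` value quoted later in this thread; any other options of the real function are not modeled).

```python
from typing import Dict, Set


def consensus_chimeras(
    present: Dict[str, Set[str]],
    flagged: Dict[str, Set[str]],
    min_sample_fraction: float = 0.9,
) -> Set[str]:
    """Return ASVs flagged as chimeric in at least `min_sample_fraction`
    of the samples in which they are present (simplified illustration)."""
    chimeras = set()
    for asv, samples_present in present.items():
        if not samples_present:
            continue  # ASV absent everywhere: nothing to vote on
        # Only count flags from samples where the ASV was actually observed.
        samples_flagged = flagged.get(asv, set()) & samples_present
        if len(samples_flagged) / len(samples_present) >= min_sample_fraction:
            chimeras.add(asv)
    return chimeras


# Hypothetical data: six samples (s1..s6), three ASVs.
present = {
    "ASV_A": {"s1", "s2", "s3", "s4"},
    "ASV_B": {"s1", "s2", "s3", "s4"},
    "ASV_C": {"s5", "s6"},
}
flagged = {
    "ASV_A": {"s1", "s2", "s3", "s4"},  # flagged in 4 of 4 samples where present
    "ASV_B": {"s1", "s2"},              # flagged in 2 of 4 samples where present
    "ASV_C": {"s5", "s6"},              # flagged in 2 of 2 samples where present
}

result = consensus_chimeras(present, flagged)
print(sorted(result))
```

In this toy input, ASV_C is removed even though it occurs in only two of the six samples, because the vote is taken over the samples where it is present rather than over the whole dataset.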
In our testing “consensus” performs better for typical datasets/workflows.
Also in my testing, additional reference-based chimera removal on top of the de novo removal isn’t a net positive when considering both sensitivity and specificity (i.e. there are some false positive chimera IDs at that step). However, my testing there has not been exhaustive.
Would ASV2 be dropped from the dataset entirely, or retained?
From what I can tell DADA2 retains all ASVs regardless of how many samples they are present in. Just wanted to double check that is the case, and that there isn’t a parameter that allows for this function to be turned on/off like in qiime filter-features.
Quick follow up for @benjjneb. Hopefully I’m interpreting the DADA2 manual section describing a consensus strategy properly: within QIIME2, by default, a consensus chimera process is invoked, and this is triggering the isBimeraDenovoTable function within DADA2.
I’m specifically curious about one line in that function: minSampleFraction = 0.9. Am I correct that by default, this argument is requiring that 90% of all samples in my dataset have that chimera for it to be removed?
So, for example, if I had 200 total samples being processed in a dataset, is a suspected chimera only removed if it is present in 180 samples?
If that’s the case, I’m wondering what led to requiring such a high threshold in your testing/development of the parameter. That seems to me to indicate that chimera formation is quite favorable in PCR? From what R. Edgar’s posted about chimeras, it seems more like 1-5%. Not saying he’s correct, just noticing a huge disparity in the parameter from what he’s stating.
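For reference, the arithmetic behind the two possible readings of the 0.9 threshold can be sketched as below. This is only an illustration under stated assumptions: the helper `min_flagged_samples` is hypothetical, the "present in 20 samples" figure is a made-up number, and neither reading is confirmed here as DADA2's exact rule. The first reading is the one in the question (a fraction of all 200 samples); the second follows the description of “consensus” given earlier in this thread (a fraction of the samples in which the ASV is present).

```python
import math
from fractions import Fraction


def min_flagged_samples(n_samples: int,
                        min_sample_fraction: Fraction = Fraction(9, 10)) -> int:
    """Smallest number of flagged samples that reaches the fraction
    (hypothetical helper; exact rational arithmetic avoids float rounding)."""
    return math.ceil(min_sample_fraction * n_samples)


total_samples = 200        # dataset size from the example above
samples_with_asv = 20      # assumption: the suspected chimera occurs in 20 samples

# Reading 1: threshold taken over every sample in the dataset.
threshold_all_samples = min_flagged_samples(total_samples)

# Reading 2: threshold taken only over samples where the ASV is present.
threshold_where_present = min_flagged_samples(samples_with_asv)

print(threshold_all_samples, threshold_where_present)
```

Under the second reading the threshold scales with how widely the ASV is found, so it says nothing about what share of reads or samples in the whole dataset are chimeric.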