Potential artifacts produced with DADA2 wrapped in qiime2-2018.8 - need advice


(Jesse M) #1

Hi folks,

My name’s Jesse, and I’m working in Jed Fuhrman’s lab at USC. My colleague Yi-Chun (@LivYeh) and I have been developing a protocol for qiime2 for analyzing the mixed 16s/18s amplicons that are produced by the 515Y/926R primer pair. In case anyone’s interested, we’ve got a basic set of wrapper scripts for qiime2 and an associated protocol up online. This protocol includes a step that splits mixed amplicons into 18s and 16s pools using bbsplit.

In any event, we have been testing the performance of deblur/DADA2 on our samples which include 16s and/or 18s mock communities (a recent paper shows a good example of why these are useful for sequencing QC). Because we know the exact sequences of these mocks, we can test the performance of the two different algorithms. In general, both resolve our mocks exactly, and we have found that DADA2 seems to do so while retaining a lot more of the data (which makes sense since it corrects reads, instead of discarding as Deblur does). So, we’re moving towards using DADA2 because it helps us keep more of the small but precious Eukaryotic fraction of our amplicons.

However, we have noticed that DADA2 sometimes produces what we think are artefacts. This doesn’t happen very often, and seems to be mostly restricted to noisy 2x300bp data. What we’re calling “artifacts” are ASVs with a single mismatch to the mock sequence (tested by BLASTing across the entire length). We don’t think that these ASVs can be explained by sample bleedthrough because the rows that they are in are completely empty elsewhere in the OTU table, suggesting they are derived from the “parent” mock sequences. The strange thing is that it doesn’t happen all the time - some runs are good.

Things we’ve tried that were unable to resolve this issue:

-Increasing the number of sequences going into the error model to the maximum number
-Joining the sequences prior to denoising (with VSEARCH)
-Reducing the trim lengths to the minimum 20bp overlap
-Running the same sample on a more recent version of DADA2 in R (1.9.2)

There was one situation where we could actually figure out what was going on. It was a run where we had two extra mock samples from another lane of the HiSeq instrument. So mocks from lane 1 were sequenced at depth x, and then these two extra mocks were sequenced at twice the depth (I believe approximately 100k and 200k, respectively). When analyzing both lanes together, we observed artifacts appearing in the higher-depth mocks. So, I just downsampled those mocks to the approximately same level as everything else and the putative artifacts disappeared.

In comparison, we’ve never observed deblur to make this kind of ASV call - it always captures the mocks exactly and any 1-mismatches to the mocks are probably derived from real seqs that are present in our actual environmental samples. These potentially spurious ASVs are a concern because they can be abundant (~3-5%) of sequences in some affected samples, and suggest maybe DADA2 can over-split on occasion.

Can Ben (@benjjneb) or anyone else offer advice on any other directions we can pursue to try and resolve these potential artifacts? I think the DADA2 version of q2-dada2 is only 1.6.0 in qiime2-2018.8 (the version we’re currently using; I checked by going into the conda env and opening R) but based on our preliminary tests it doesn’t seem to make a difference if we use a newer version. Or is there something that has changed in the latest versions that could solve this? The fact that we get similar results from the R version and qiime2 suggests to us that it might be something inherent to the algorithm/dataset and probably not qiime2 related.

Thanks a lot in advance for any help you can offer!

Jesse McNichol

(Ben Callahan) #2

DADA2 is a bit different from other methods in that it learns its error model from the data you give it, rather than relying on a precomputed error model. This is useful because the amplicon sequencing error rates vary quite a bit between experiments, both at the PCR step and the sequencing step. The advantage of this is that DADA2 can more precisely model errors for any particular experiment, which helps with sensitivity versus the alternative of using overestimated error models that are chosen to be higher than what is seen in any experiment. The disadvantage is that when samples from different processing batches (where the “processing batch” refers to the PCR+sequencing) are mixed, the error model will be fit to the “average” between them and could lead to FPs in the batch with the higher error rates. This is why it is recommended to process different batches separately and then merge them together at the end.

I suspect that this is what you are seeing, and in particular this line suggests that pretty strongly:

That is, its not so much that the depth was different, but that the different processing batch had a somewhat different error model with higher rates than the error model that was learned from the rest of the data. Although fairly rare, I have also seen individual samples in experiments produce FPs in this way due to issues with the PCR in that sample (PCR errors are identifiable based on certain signatures in the errors made, specifically identical error rates between complement errors). In other situations, are the mocks that you add to your experiments generated alongside the rest of the samples at both PCR/sequencing steps? Or are they sometimes (or often) sequenced or PCR-ed separately? If that is the case, it is likely that the FPs you are seeing int he mocks actually won’t be reflected in the true samples provided they were all processed identically.

Yes, while the plugin uses a slightly older version of the package, it is running native package code, so it is not an issue with a different in algorithm. However, there are some options that are available through the native R interface that aren’t through the plugin. One clear option is to simply overestimate the error model somewhat (this is similar to how other methods use over-estimates of the error rates) using the inflateErr function, with the idea being that if samples derived from different batches are combined, its best to prevent FPs by overestimating the error rates.