Potential artifacts produced with DADA2 wrapped in qiime2-2018.8 - need advice

jcmcnch · December 6, 2018, 11:49pm

Hi folks,

My name's Jesse, and I'm working in Jed Fuhrman's lab at USC. My colleague Yi-Chun (@LivYeh) and I have been developing a protocol for qiime2 for analyzing the mixed 16s/18s amplicons that are produced by the 515Y/926R primer pair. In case anyone's interested, we've got a basic set of wrapper scripts for qiime2 and an associated protocol up online. This protocol includes a step that splits mixed amplicons into 18s and 16s pools using bbsplit.

In any event, we have been testing the performance of deblur/DADA2 on our samples which include 16s and/or 18s mock communities (a recent paper shows a good example of why these are useful for sequencing QC). Because we know the exact sequences of these mocks, we can test the performance of the two different algorithms. In general, both resolve our mocks exactly, and we have found that DADA2 seems to do so while retaining a lot more of the data (which makes sense since it corrects reads, instead of discarding as Deblur does). So, we're moving towards using DADA2 because it helps us keep more of the small but precious Eukaryotic fraction of our amplicons.

However, we have noticed that DADA2 sometimes produces what we think are artefacts. This doesn't happen very often, and seems to be mostly restricted to noisy 2x300bp data. What we're calling "artifacts" are ASVs with a single mismatch to the mock sequence (tested by BLASTing across the entire length). We don't think that these ASVs can be explained by sample bleedthrough because the rows that they are in are completely empty elsewhere in the OTU table, suggesting they are derived from the "parent" mock sequences. The strange thing is that it doesn't happen all the time - some runs are good.

Things we've tried that were unable to resolve this issue:

-Increasing the number of sequences going into the error model to the maximum number
-Joining the sequences prior to denoising (with VSEARCH)
-Reducing the trim lengths to the minimum 20bp overlap
-Running the same sample on a more recent version of DADA2 in R (1.9.2)

There was one situation where we could actually figure out what was going on. It was a run where we had two extra mock samples from another lane of the HiSeq instrument. So mocks from lane 1 were sequenced at depth x, and then these two extra mocks were sequenced at twice the depth (I believe approximately 100k and 200k, respectively). When analyzing both lanes together, we observed artifacts appearing in the higher-depth mocks. So, I just downsampled those mocks to the approximately same level as everything else and the putative artifacts disappeared.

In comparison, we've never observed deblur to make this kind of ASV call - it always captures the mocks exactly and any 1-mismatches to the mocks are probably derived from real seqs that are present in our actual environmental samples. These potentially spurious ASVs are a concern because they can be abundant (~3-5%) of sequences in some affected samples, and suggest maybe DADA2 can over-split on occasion.

Can Ben (@benjjneb) or anyone else offer advice on any other directions we can pursue to try and resolve these potential artifacts? I think the DADA2 version of q2-dada2 is only 1.6.0 in qiime2-2018.8 (the version we're currently using; I checked by going into the conda env and opening R) but based on our preliminary tests it doesn't seem to make a difference if we use a newer version. Or is there something that has changed in the latest versions that could solve this? The fact that we get similar results from the R version and qiime2 suggests to us that it might be something inherent to the algorithm/dataset and probably not qiime2 related.

Thanks a lot in advance for any help you can offer!

Jesse McNichol

benjjneb · December 10, 2018, 3:29pm

DADA2 is a bit different from other methods in that it learns its error model from the data you give it, rather than relying on a precomputed error model. This is useful because the amplicon sequencing error rates vary quite a bit between experiments, both at the PCR step and the sequencing step. The advantage of this is that DADA2 can more precisely model errors for any particular experiment, which helps with sensitivity versus the alternative of using overestimated error models that are chosen to be higher than what is seen in any experiment. The disadvantage is that when samples from different processing batches (where the "processing batch" refers to the PCR+sequencing) are mixed, the error model will be fit to the "average" between them and could lead to FPs in the batch with the higher error rates. This is why it is recommended to process different batches separately and then merge them together at the end.

I suspect that this is what you are seeing, and in particular this line suggests that pretty strongly:

That is, its not so much that the depth was different, but that the different processing batch had a somewhat different error model with higher rates than the error model that was learned from the rest of the data. Although fairly rare, I have also seen individual samples in experiments produce FPs in this way due to issues with the PCR in that sample (PCR errors are identifiable based on certain signatures in the errors made, specifically identical error rates between complement errors). In other situations, are the mocks that you add to your experiments generated alongside the rest of the samples at both PCR/sequencing steps? Or are they sometimes (or often) sequenced or PCR-ed separately? If that is the case, it is likely that the FPs you are seeing int he mocks actually won't be reflected in the true samples provided they were all processed identically.

Yes, while the plugin uses a slightly older version of the package, it is running native package code, so it is not an issue with a different in algorithm. However, there are some options that are available through the native R interface that aren't through the plugin. One clear option is to simply overestimate the error model somewhat (this is similar to how other methods use over-estimates of the error rates) using the inflateErr function, with the idea being that if samples derived from different batches are combined, its best to prevent FPs by overestimating the error rates.

jcmcnch · December 17, 2018, 6:12pm

Thanks for answering our questions, and for explaining more about how DADA2 compares to the other methods. To answer your question about the mocks - typically they are PCR'd in the same batch and sequenced alongside the real samples, so we think that real samples could be affected if we see issues with the mocks.

We looked into the runs where we saw these FP crop up, and in general it makes a lot more sense now. Both runs had been PCR'd under multiple conditions, which seems to fit with what you explained about how DADA2 calculates error models. The number of FP also was correlated with this - for example, in one run with 4 or 5 distinct PCR conditions, there were 32 FP sequences (the most we've seen but this run was unusual in that it was testing multiple PCR conditions), whereas another run with only two there were only 5 FP.

There are still a couple of things we can't explain though. For both of these runs, we've tried separating the samples by PCR condition to see if the errors go away. I would expect this to help based on your explanation as the subsamples should have a more similar error profile, but it seems to have little if any effect. For example, separately analyzing just the mocks (excluding environmental samples) that were amplified under the same conditions still yields many FP. Does this indicate that these errors are truly in the amplicons due to some PCR-based artifacts? Could it be due to low DNA input, leading to statistical over-representation of initial PCR amplicon errors? Or could there be something run-specific?

We have been thinking about checking for these errors in the raw sequences but I wanted to see what you thought first. Thanks also for your suggestion about using the inflateErr option. We haven't tested this yet, but can confirm that these FP's do not occur with deblur so it seems that overestimating the error model may be useful for these unusual situations where experimental conditions are not uniform.

Thanks again for your help and advice!

benjjneb · December 19, 2018, 2:30am

Tagging myself @benjjneb
Will get back to this soon.

FYI: I don't regularly monitor this forum directly. But I have "notifications" turned on, so if I'm tagged or replied to in a thread I get an email notification and will see it, and am pretty good about responding in a timely fashion. Otherwise I don't see it, and probably won't see the post (e.g. I wouldn't have seen the previous post if I hadn't randomly checked the forum after Greg's successful submission email).

jcmcnch · January 2, 2019, 10:14pm

Thanks @benjjneb for your response, and for the reminder to tag you from now on!

Looking forward to hearing what you think about these weird results. Cheers,

Jesse

benjjneb · January 7, 2019, 2:42am

Thanks for tagging me again! As I know now, tagging yourself does not generate a notification email, and thus I had forgotten about this over the holidays.

From what I read here you have done some reasonable investigation, and the obvious things I would also try (e.g. separating based on PCR treatment) did not resolve the problem.

At this point, would it be possible for you to share a subset of your data with me that I can look at myself? We are always interested in well-defined situations in which results are not as expected, which it sounds like you are seeing, and there are some investigations I can do via dada2/R that just aren't possible to (easily) describe via forum posts.

If you are able to share some example data, my email is benjamin DOT j DOT callahan AT gmail DOT com, and the ideal type of data would be a small-to-minimal set of samples (e.g. the mocks from one or two PCR/sequencing batches in which you are seeing these spurious samples) plus a (perhaps largely repeated) description of the anomaly you are seeing. My goal would be to reproduce this on my end and to interrogate further on what those anomalies could be (PCR artefacts? Unremoved chimeras? Misclassified sequencing errors? etc) using additional functionality available through the dada2 R package. Please let me know here if anything is unclear!

ps: @ebolyen @Nicholas_Bokulich I'm not sure if you have set policies about going outside (e.g. to email) for exchanging information on Q2 problems, but please let me/us know if there is a different mechanism that is preferred. I'll make any meaningful information available here, but will probably also open an issue at the dada2 GitHub site as well.

Nicholas_Bokulich · January 7, 2019, 3:14am

Thanks for asking. We do not have set policies for external support — plugin developers can provide support by whatever mechanism they prefer, though we always appreciate having answers posted on the forum when things go "off forum". @thermokarst @ebolyen anything to add?

Nicholas_Bokulich · June 29, 2024, 5:30am

3 off-topic replies have been split into a new topic: question about ASVs

Please keep replies on-topic in the future.