Found contaminated negative controls - am I denoising correctly with DADA2?

Nicholas_Bokulich · October 8, 2018, 5:48pm

Hi @jnie93,
Welcome to the community!

Let's start with this question, and some explanation:

Yes, these look fine to me.

As far as I know, that would be without primers included. It looks like you are using EMP format, so that should be correct, but it is a caveat to keep in mind.

Yeah, those data look very noisy, but I think what you did is fine:

Doing so means that many more noisy reads are included, but DADA2 is filtering these out. This is why you are seeing very high proportions (~75-90% of reads) being dropped at the filtering step (see the stats summary). However, you have very high sequence coverage for most samples, so this is not a problem. You are getting very good merging rates and usable sequence counts in the output.
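To make the arithmetic behind those percentages concrete, here is a minimal sketch in Python. It assumes the stats summary gives per-sample counts under the column names `input`, `filtered`, `denoised`, `merged`, and `non-chimeric`; check your own table, since the names may differ between versions. The sample IDs and counts are invented for illustration and are not taken from the data discussed here.

```python
# Hypothetical per-sample rows from a DADA2 denoising stats summary.
# Column names are an assumption; the counts are invented for illustration.
STATS = [
    {"sample-id": "sampleA", "input": 200000, "filtered": 40000,
     "denoised": 39000, "merged": 36000, "non-chimeric": 35000},
    {"sample-id": "sampleB", "input": 150000, "filtered": 20000,
     "denoised": 19500, "merged": 18000, "non-chimeric": 17500},
    {"sample-id": "negCtrl", "input": 90000, "filtered": 15000,
     "denoised": 14800, "merged": 14000, "non-chimeric": 13900},
]


def retention_report(rows):
    """Map each sample ID to (fraction dropped at filtering,
    fraction of denoised reads that merged, final read count)."""
    report = {}
    for row in rows:
        n_input = row["input"]
        n_denoised = row["denoised"]
        # Fraction of input reads removed by quality filtering.
        dropped = 1 - row["filtered"] / n_input if n_input else 0.0
        # Merging rate, relative to reads that survived denoising.
        merged = row["merged"] / n_denoised if n_denoised else 0.0
        report[row["sample-id"]] = (dropped, merged, row["non-chimeric"])
    return report


REPORT = retention_report(STATS)
for sample, (dropped, merged, final) in REPORT.items():
    print(f"{sample}: {dropped:.0%} dropped at filtering, "
          f"{merged:.0%} merged, {final} reads kept")
```

A high fraction dropped at filtering is tolerable as long as the final count per sample is still large and the merging rate stays high, which is the pattern described above.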

Yes, I am afraid so, but do not despair:

That's in an ideal world... but there are other reasons for reads appearing in the negative controls (e.g., index hopping, cross-talk). However, given the number of reads in the negative controls I would bet that this is mostly due to cross-contamination.

Cross-contamination is a common issue, but how to fix it is an open area of research. This discussion may be helpful for you, and may give you some ideas for strategies to address this issue in your data, rather than tossing out your data and starting again. It takes a very small amount of cross-contamination to mar a negative control: levels far below what it would take to impact the composition of a real sample (unless it is a low-biomass sample). There are some tools out there for addressing contamination, but no easy solutions.

Discuss with your PI. I think that tossing out your data would be rash (unless you are sequencing low-biomass samples), but with read counts on par with your real samples you should look very carefully to see what contaminants are present, whether they are cross-contaminants or reagent contaminants, and what next steps you want to take. I don't have easy suggestions, but will say that you are definitely not alone in this problem, and (unless you have low-biomass samples!) you should not just toss the data.
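As a starting point for that careful look, here is a rough sketch in Python of one way to compare what is in the negative controls against the real samples. It is a crude heuristic under stated assumptions, not a validated decontamination method. It assumes that a feature abundant in the controls but rare or absent in the real samples is more likely a reagent contaminant, while a feature that is also abundant in the real samples is more likely cross-contamination. The function names, the `ratio` threshold, the taxon names, and all counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert {feature: count} into {feature: fraction of total reads}."""
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()} if total else {}


def mean_abundance(rel_tables, feature):
    """Mean relative abundance of a feature across a list of tables."""
    if not rel_tables:
        return 0.0
    return sum(t.get(feature, 0.0) for t in rel_tables) / len(rel_tables)


def classify_control_features(samples, controls, ratio=10.0):
    """Label every feature observed in the negative controls.

    `ratio` is an arbitrary illustrative threshold: a feature is called a
    possible reagent contaminant when its mean relative abundance in the
    controls exceeds `ratio` times its mean in the real samples.
    """
    sample_rel = [relative_abundance(s) for s in samples]
    control_rel = [relative_abundance(c) for c in controls]
    control_features = {f for c in controls for f, n in c.items() if n > 0}
    labels = {}
    for feature in sorted(control_features):
        in_controls = mean_abundance(control_rel, feature)
        in_samples = mean_abundance(sample_rel, feature)
        if in_controls > ratio * in_samples:
            labels[feature] = "possible reagent contaminant"
        else:
            labels[feature] = "possible cross-contaminant"
    return labels


# Invented counts for illustration only.
SAMPLES = [
    {"Bacteroides": 900, "Faecalibacterium": 100},
    {"Bacteroides": 700, "Faecalibacterium": 300},
]
CONTROLS = [{"Bacteroides": 80, "Ralstonia": 20}]

LABELS = classify_control_features(SAMPLES, CONTROLS)
for feature, label in LABELS.items():
    print(f"{feature}: {label}")
```

Labels like these are only a prompt for manual inspection; the threshold and the two-way split are simplifications, and low-biomass samples would need much more care.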

If you find other strategies for contaminant removal, please feel free to add to that discussion!

Good luck!