Found contaminated negative controls - am I denoising correctly with DADA2?

Hope you’re all enjoying the weekend! I have a few QIIME2 questions - please forgive me if they are naïve/dumb – I’m very new to all of this. I just ran the denoising step with DADA2 and I’m worried that I have contaminants in my negative controls. I want to make sure that I used the right parameters in truncating and trimming before telling my PI we have contamination.

Process
We are using a 534R / 27F primer combo (V1-V3 primer). From my research online, I took this to mean that our amplicon is 507 bp long (534 - 27). With 300 bp forward and reverse reads, this should give us a 93 bp overlap (600 - 507). I read that we should aim for an overlap of 20 nt plus the natural length variation in our target, and that we should truncate where the median quality score drops below 20.

However, when I looked at our interactive quality plot (demux.qzv), our median quality score dropped below 20 at position 263 for the forward reads and position 202 for the reverse reads.
• 263 + 202 - 507 = -42, which means we get no overlap
• To get some overlap, I relaxed the median quality score cutoff until the truncated reads could overlap, and ended up running the following command:

    qiime dada2 denoise-paired \
      --i-demultiplexed-seqs demux.qza \
      --p-trunc-len-f 300 \
      --p-trunc-len-r 245 \
      --o-table table.qza \
      --o-representative-sequences rep-seqs.qza \
      --o-denoising-stats denoising-stats.qza
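
As a quick sanity check on the overlap arithmetic above, here is a minimal Python sketch. It assumes, as the post does, that the amplicon length can be approximated as the difference between the primer positions (534 - 27 = 507 bp); real amplicons vary in length, so treat the results as rough estimates. The function names are made up for illustration and are not part of QIIME 2 or DADA2.

```python
def amplicon_length(forward_pos: int, reverse_pos: int) -> int:
    """Approximate amplicon length from primer positions (the post's estimate)."""
    return reverse_pos - forward_pos


def merge_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the truncated forward and reverse reads (negative means a gap)."""
    return trunc_len_f + trunc_len_r - amplicon_len


amplicon = amplicon_length(27, 534)  # 507 bp, an approximation

# Truncation settings discussed in the post: (forward, reverse)
candidates = {
    "untruncated 2x300": (300, 300),
    "truncate where median Q < 20": (263, 202),
    "values used in the command": (300, 245),
}

overlaps = {
    name: merge_overlap(f, r, amplicon) for name, (f, r) in candidates.items()
}

for name, overlap in overlaps.items():
    status = "overlap" if overlap > 0 else "no overlap"
    print(f"{name}: {overlap:+d} nt ({status})")
```

With these assumptions, truncating at 300 and 245 leaves an estimated 38 nt of overlap, which clears the 20 nt rule of thumb mentioned above, while truncating at 263 and 202 leaves a 42 nt gap.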

When I looked at the table.qzv output, I saw that our negative controls (NC.Kit4, NC.Kit5) had sequence counts greater than 0. Someone in my lab told me that negative controls should have sequence counts of 0 after denoising; otherwise, we have contamination.

My questions are:

  1. Am I using the right parameters for denoising?
  2. Do I have contamination in my negative controls?

Thanks!

demux[1].qzv (295.2 KB)
table[1].qzv (536.9 KB)
rep-seqs[1].qzv (1.2 MB)
denoising-stats[1].qzv (1.2 MB)

Hi @jnie93,
Welcome to the community!

Let's start with your first question (whether you are using the right parameters for denoising), and some explanation:

Yes, these look fine to me.

The 507 bp amplicon length would be without primers included (as far as I know). It looks like you are using EMP format, so that should be correct, but it is a caveat to keep in mind.

Yeah, those data look very noisy, but I think what you did with the truncation lengths is fine:

Relaxing the quality cutoff means that many more noisy reads are included, but DADA2 is filtering these out. This is why you are seeing very high proportions (~75-90% of reads) being dropped at the filtering step (see the stats summary). However, you have very high sequence coverage for most samples, so this is not a problem. You are getting very good merging rates and usable sequence counts in the output.

As for contamination in your negative controls: yes, I am afraid so, but do not despair:

Zero reads in the negative controls is what you would see in an ideal world... but there are other reasons for reads appearing in the negative controls (e.g., index hopping, cross-talk). However, given the number of reads in the negative controls, I would bet that this is mostly due to cross-contamination.

Cross-contamination is a common issue, but how to fix it is an open area of research. This discussion may be helpful for you and give you some ideas for addressing this issue in your data, rather than tossing out your data and starting again. It takes a very small amount of cross-contamination to mar a negative control: levels far below what it would take to impact the composition of a real sample (unless it is a low-biomass sample). There are some tools out there for addressing contamination, but no easy solutions.

Discuss with your PI. I think that tossing out your data would be rash (unless you are sequencing low-biomass samples), but with read counts on par with your real samples you should look very carefully to see what contaminants are present, whether they are cross-contaminants or reagent contaminants, and what next steps you want to take. I don't have easy suggestions, but you are definitely not alone in this problem, and (unless you have low-biomass samples!) you should not just toss the data.

If you find other strategies for contaminant removal, please feel free to add to that discussion!

Good luck!

Thank you so much @Nicholas_Bokulich!!! This is very helpful!

The problem is that we are analyzing low-biomass samples - we’re analyzing swabs of the skin microbiome. So in that case, would it make the most sense to start over with better technique?

My knowledge of skin microbiota is limited, but my understanding is that skin is not really a low-biomass environment (e.g. I see reports of ~10^4 bacterial cells/cm^2).

This is distinct from, e.g., healthy lung microbiota, which is what I would call low-biomass.

So obviously skin samples will be more sensitive to contamination than, e.g., stool samples, but I would not necessarily worry.

It may be worth assessing what types of contaminants are present, and whether you see possible reagent contaminants vs. cross-contamination from other samples. This could, at the very least, direct if/how you implement better techniques for preventing contamination — or help decide if you can salvage your current data.
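
To make that kind of assessment concrete, here is a rough Python sketch of one way to start. Everything in it is an assumption for illustration: the feature table is hypothetical toy data (only the control names NC.Kit4 and NC.Kit5 come from this thread), and the rule used to label features is a crude heuristic, not a validated decontamination method. The idea is simply that a feature seen mostly in the controls hints at a reagent contaminant, while a feature that is abundant in real samples and also shows up in a control hints at cross-contamination.

```python
# Hypothetical toy feature table: feature -> {sample ID -> read count}.
# None of these counts come from the data in this thread.
table = {
    "feature_A": {"S1": 5000, "S2": 4200, "NC.Kit4": 300, "NC.Kit5": 0},
    "feature_B": {"S1": 0, "S2": 3, "NC.Kit4": 900, "NC.Kit5": 1100},
    "feature_C": {"S1": 800, "S2": 650, "NC.Kit4": 0, "NC.Kit5": 0},
}
negative_controls = {"NC.Kit4", "NC.Kit5"}


def summarize_control_features(table, negative_controls):
    """For each feature seen in a negative control, compare control vs. sample reads."""
    summary = {}
    for feature, counts in table.items():
        control_reads = sum(n for s, n in counts.items() if s in negative_controls)
        sample_reads = sum(n for s, n in counts.items() if s not in negative_controls)
        if control_reads == 0:
            continue  # never observed in a negative control
        # Crude heuristic (an assumption, not an established rule):
        # more reads in controls than in samples -> possible reagent contaminant,
        # otherwise -> possible cross-contamination from real samples.
        if control_reads > sample_reads:
            guess = "possible reagent contaminant"
        else:
            guess = "possible cross-contamination"
        summary[feature] = (control_reads, sample_reads, guess)
    return summary


summary = summarize_control_features(table, negative_controls)
for feature, (ctrl, samp, guess) in summary.items():
    print(f"{feature}: {ctrl} control reads, {samp} sample reads -> {guess}")
```

A table like this is only a starting point for discussion; the taxonomy of the flagged features and the layout of samples on the plate would also matter when deciding what they are.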

Good luck!
