question about ASVs

Encountering this after many years as I'm having similar issues with false positive ASVs with few mismatches. I was wondering if there's a GitHub thread where this was resolved, and if someone can please point me to it! Thanks :slightly_smiling_face:

Hi @pkalvap1 ,
Could you please clarify: what issues are you having? Could you please share some specific results to demonstrate what issues you see?

Hi Nicholas, thanks a lot for the response! I have a defined community of 6 organisms, and I ran DADA2 on their 16S amplicons (2 x 250, paired-end). I see 700 ASVs in the result where I expect only 6. These seem to be false positives, but I want to double-check and understand which DADA2 parameters to use to prevent them from occurring.

I took this data and aligned it to a custom BLAST database containing only my 6 organisms. Some ASVs are 1-5 mismatches away from the correct 16S, but there are some ASVs with 40-60 mismatches that have significant counts (<10 K) in multiple samples. The read quality seemed OK, and I used max_EE of 4 in DADA2.
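For reference, the kind of tally described above can be sketched roughly as follows. This is only a minimal sketch: it assumes the BLAST results were saved in tabular form (`-outfmt 6`, default column order), and the function names, the example rows, and the `near_max=5` cutoff are hypothetical choices for illustration. It also looks only at the `mismatch` column, so gaps and partial alignments are not accounted for.

```python
from collections import Counter

# Default column order of BLAST tabular output (-outfmt 6).
BLAST6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit_mismatches(blast6_lines):
    """Return {ASV id: mismatch count} using the highest-bitscore hit per ASV."""
    best = {}
    for line in blast6_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(BLAST6_COLUMNS, line.split("\t")))
        score = float(row["bitscore"])
        asv = row["qseqid"]
        if asv not in best or score > best[asv][0]:
            best[asv] = (score, int(row["mismatch"]))
    return {asv: mismatches for asv, (_, mismatches) in best.items()}


def bin_asvs(mismatches_by_asv, near_max=5):
    """Count ASVs that are exact (0), near (1..near_max), or distant (> near_max)."""
    bins = Counter()
    for mismatches in mismatches_by_asv.values():
        if mismatches == 0:
            bins["exact"] += 1
        elif mismatches <= near_max:
            bins["near"] += 1
        else:
            bins["distant"] += 1
    return bins


# Made-up rows for illustration only (not real results).
example_rows = [
    "ASV1\torgA\t100.000\t253\t0\t0\t1\t253\t500\t752\t1e-130\t468",
    "ASV2\torgA\t98.814\t253\t3\t0\t1\t253\t500\t752\t1e-122\t451",
    "ASV2\torgB\t95.257\t253\t12\t0\t1\t253\t480\t732\t1e-105\t401",
    "ASV3\torgC\t81.028\t253\t48\t0\t1\t253\t510\t762\t2e-60\t235",
]
example_bins = bin_asvs(best_hit_mismatches(example_rows))
print(dict(example_bins))
```

Splitting the ASVs this way makes it easier to discuss the near and distant groups separately, since they may have different causes.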

Any insights would be helpful please!

Hi @pkalvap1 ,

Why are you changing max_EE from the default value of 2? By raising this setting you are permitting more erroneous reads to pass the initial filter.
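To make the effect of this threshold concrete, here is a minimal sketch of an expected-error filter. It assumes the usual definition of expected errors as the sum of per-base error probabilities, 10^(-Q/10), over the read; the function names and the example read are hypothetical and this is not DADA2's own code.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities implied by Phred quality scores."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filter(phred_scores, max_ee=2.0):
    """A read is kept only if its expected errors do not exceed max_ee."""
    return expected_errors(phred_scores) <= max_ee


# Hypothetical 250 bp read: 200 bases at Q30 followed by a 50-base tail at Q12.
example_read = [30] * 200 + [12] * 50

print(round(expected_errors(example_read), 2))
print(passes_filter(example_read, max_ee=2.0))
print(passes_filter(example_read, max_ee=4.0))
```

In this made-up case the low-quality tail alone contributes more than 3 expected errors, so the read is discarded at a threshold of 2 but kept at a threshold of 4.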

40-60 mismatches is much higher than would be expected from sequencing or PCR errors, so I suspect that these ASVs are laboratory and/or reagent contaminants.

In that sense, they would not be false positives, and they are probably not related to DADA2 processing.

Did you sequence your positive and negative controls? These will allow you to identify suspected reagent contaminants.
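As a rough illustration of how negative controls can be used, the sketch below flags ASVs that have reads in any negative control. It is a simple presence-based heuristic, not a statistical method; the table layout, sample names, counts, and the `min_control_reads` parameter are all hypothetical. Keep in mind that cross-talk can also put true strains into a negative control, so flagged ASVs still need a manual look.

```python
def flag_control_asvs(counts, negative_controls, min_control_reads=1):
    """Return {ASV id: total reads in negative controls} for flagged ASVs.

    counts: {asv_id: {sample_id: read_count}}
    negative_controls: iterable of sample ids that are negative controls
    """
    flagged = {}
    for asv, per_sample in counts.items():
        control_reads = sum(per_sample.get(s, 0) for s in negative_controls)
        if control_reads >= min_control_reads:
            flagged[asv] = control_reads
    return flagged


# Made-up feature table for illustration only.
example_counts = {
    "ASV1": {"mock_1": 52000, "mock_2": 48000, "neg_1": 0},
    "ASV2": {"mock_1": 900, "mock_2": 1100, "neg_1": 750},
    "ASV3": {"mock_1": 0, "mock_2": 40, "neg_1": 3},
}

suspects = flag_control_asvs(example_counts, negative_controls=["neg_1"])
print(suspects)
```

Raising `min_control_reads` makes the flagging less sensitive to a handful of stray reads in the controls.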

Hello! Thank you for pointing out that the ASVs with many mismatches could be contaminants; that seems very likely. When I BLAST them using the new 16S ribosomal RNA sequences database option, the closest matches are very different genera of organisms.

That is also a great point about max_EE. We had increased it in the past when we were losing too many reads from poor-quality sequencing data, and we haven't changed it since.

I had misunderstood maxEE as the maximum number of errors that would be merged to make ASVs, but it seems this parameter is about discarding reads whose expected errors exceed maxEE. I will try setting it to the default of 2 and report back on this thread with what I observe.

But I am still conceptually trying to imagine how this would help merge my ASVs with 1-5 mismatches to the known sequence.

It probably would not help with these. These might be genuine variants that are not reflected in your reference database, other contaminants that happen to be close to your true strains, or frequent errors that DADA2 cannot denoise because they are common (e.g., a PCR error that has been amplified many times).

Good luck!