High chimera rate in DADA2

Hi Bod @Mehrbod_Estaki,

Thanks a lot for the explanation of denoising. Following on from that, I have a question about chimeras.

In my dataset, denoised with DADA2, over 90% of the merged ASVs are inferred to be chimeras, and these account for around 19% of the merged reads. I did check the reads, and they don’t contain any primer sequences.

In the DADA2 tutorial, the developers mention that it’s reasonable for the majority of ASVs to be inferred as chimeras as long as <25% of reads are chimeric. However, they don’t expand much on this point, e.g. why it is reasonable for the majority of ASVs to be chimeras. May I have your comments on this?

Hi @Claire010,
90% is indeed high! You should check out these related forum topics for more discussion on how to troubleshoot and adjust:



Good luck!

Hi @Nicholas_Bokulich,

Thanks a lot for the posts. I actually went through all of the chimera-related posts on the forum, and none of them seems to match my situation.

In my case, the majority of merged ASVs are inferred to be chimeras, but they account for only 19% of the merged reads. In terms of the proportion of chimeric reads, that is not extremely high, but it is definitely not low either.
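To make the difference between the two percentages concrete, here is a minimal Python sketch with made-up numbers. The ASV names, the counts, and the `chimera_fractions` helper are all hypothetical and are not taken from this dataset or from DADA2 itself. It only shows how many low-abundance chimeric ASVs can dominate the ASV count while holding a small share of the reads.

```python
def chimera_fractions(counts, chimeric):
    """Return (fraction of ASVs flagged, fraction of reads flagged).

    counts:   dict mapping ASV id -> total read count (hypothetical table)
    chimeric: set of ASV ids flagged as chimeras by some chimera checker
    """
    total_asvs = len(counts)
    total_reads = sum(counts.values())
    flagged_asvs = sum(1 for asv in counts if asv in chimeric)
    flagged_reads = sum(n for asv, n in counts.items() if asv in chimeric)
    return flagged_asvs / total_asvs, flagged_reads / total_reads


# Toy data (invented): 2 abundant non-chimeric ASVs and
# 18 low-abundance ASVs flagged as chimeric.
counts = {"real_1": 5000, "real_2": 3100}
counts.update({f"chim_{i}": 100 for i in range(18)})
chimeric = {asv for asv in counts if asv.startswith("chim_")}

asv_frac, read_frac = chimera_fractions(counts, chimeric)
print(f"{asv_frac:.0%} of ASVs, {read_frac:.0%} of reads flagged as chimeric")
# prints: 90% of ASVs, 18% of reads flagged as chimeric
```

With real data, the same two ratios could be computed from the feature table before and after chimera removal.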

I can also confirm that all non-biological sequences, such as primers and adapters, have already been removed.

Most likely the library preparation produced a relatively high proportion of chimeras, rather than the data processing being inappropriate.

However, in my opinion, as long as the chimeras don’t affect the microbial composition and the sequencing is deep enough to cover most of the diversity (Chao1 and rarefaction curves reached a plateau), we can still go on with the analysis after removing the chimeras.

Besides, this data comes from deep sequencing of the 16S V4 region, with an average of 86,000±25,000 reads per sample even after removing chimeric reads. I suspect the chimera signal was amplified considerably at such a deep sequencing depth.

That’s my opinion. I would be happy to have your comments.

PS: my only concern is whether a relatively high proportion of chimeras (such as 20% of reads being chimeric) would change the bacterial community profile. I could not find a paper discussing this. If there is one, please correct me.

3 Likes

Oh I see, I think I misread this above… 19% is not super high, I agree you can probably just “move on” especially since you have very high read depth.

I agree, with one significant caveat: a chimera is a chimera, and (assuming the chimera checker is not flagging false positives) it should be removed whether or not it changes the results! Of course there’s the valid question of whether the chimera checker is accurate, and I cannot help there. I recommend looking at the benchmarks in the literature for that method if you want to assess overall accuracy…

Removing chimeras would probably change the taxonomic results, but this is good and intentional if the chimeras really are PCR artifacts and do not represent true diversity in your samples. So I recommend just “moving on” since your chimera rate is not all that high.
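As a rough illustration of why the profile shifts, here is a small Python sketch with invented counts. The feature names and numbers are hypothetical, and this is not how any particular tool computes its output. Dropping a chimeric feature renormalizes the relative abundances of everything that remains.

```python
def relative_abundance(counts):
    """Convert a dict of feature -> read count into feature -> proportion."""
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


# Invented sample: two real taxa plus one chimeric feature.
sample = {"taxon_A": 600, "taxon_B": 200, "chimera_AB": 200}

before = relative_abundance(sample)
after = relative_abundance(
    {feature: n for feature, n in sample.items() if feature != "chimera_AB"}
)

for feature in ("taxon_A", "taxon_B"):
    print(f"{feature}: {before[feature]:.2f} -> {after[feature]:.2f}")
# prints:
# taxon_A: 0.60 -> 0.75
# taxon_B: 0.20 -> 0.25
```

The ratio between the two real taxa stays the same here. What changes is their share of the total once the artifact is no longer counted.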

Good luck!