Many chimeric reads after dada2, but only in some samples

Hi @alfanon,

Thanks for the follow-up.

A couple of things come to mind, though none of them is particularly worrying in my opinion. First, the tail ends of the reads might just have been low enough in quality (as is expected with Illumina) that the sequencer automatically excluded them; this wouldn't even need manual QC. I've also seen this happen with host contamination of the reads. Once the reads are merged, you can BLAST a few of the shorter ones to see what they are hitting. If they are host, or some unknown target, we can filter them out pretty easily if needed.
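If it helps, here is a minimal sketch for pulling a handful of the shorter sequences out of a FASTA export so you can paste them into BLAST. It is plain Python with no dependencies; the function names and the length cutoff are my own placeholders, not part of QIIME 2 or DADA2, so adjust them to your data.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pick_short_reads(text, max_len, n=5):
    """Return up to n (header, sequence) records no longer than max_len."""
    picked = []
    for header, seq in parse_fasta(text):
        if len(seq) <= max_len:
            picked.append((header, seq))
            if len(picked) == n:
                break
    return picked
```

You would read your exported sequences file into a string, call something like `pick_short_reads(fasta_text, max_len=200)` (the 200 is just an example cutoff), and BLAST whatever comes back.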

I think that, based on your trim/truncation parameters, the minimum overlap DADA2 needs to merge the paired reads is preserved, so that shouldn't be a problem. If you are worried about this though, you can always just use your forward reads, which seem to be in great shape!
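For a quick sanity check of the overlap, the arithmetic is simply forward length plus reverse length minus amplicon length. Below is a rough sketch; the numbers in the example are hypothetical (I don't know your exact amplicon length), and the 12-nt threshold is the default minimum overlap of DADA2's `mergePairs`, which you should confirm for the version you are running.

```python
def expected_overlap(amplicon_len, trunc_len_f, trunc_len_r):
    """Overlap (in nt) left between forward and reverse reads after truncation.

    amplicon_len must be measured over the same span the reads cover
    (i.e. including primers only if the primers are still in the reads).
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def overlap_ok(amplicon_len, trunc_len_f, trunc_len_r, min_overlap=12):
    """True if the expected overlap meets the minimum needed for merging."""
    return expected_overlap(amplicon_len, trunc_len_f, trunc_len_r) >= min_overlap


# Hypothetical example: a 253-nt amplicon with both reads truncated at 150.
print(expected_overlap(253, 150, 150))  # 47
print(overlap_ok(253, 150, 150))        # True
```

Keep in mind that amplicon length varies naturally between taxa, so it is worth leaving some margin above the minimum rather than aiming for exactly 12.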

Aha! I think your deduction is spot on here. If I had to guess, I'd say this is it. The probability of chimeras forming in a pool of nucleotides and primers is certainly higher when there is no real target. I also vaguely recall that this is especially true when a high-fidelity polymerase wasn't used for the PCR.
Overall, I don't think this is an issue with the DADA2 algorithm, but rather something from your sample preparation. Even then, I wouldn't worry about it; just carry on with your analyses, taking care to filter out those spurious reads. I'd be interested to hear @Nicholas_Bokulich's thoughts on this though.
