Too many chimeras in my samples?

Hi all

I have been looking at the first sequencing run of my data, which comes from chronic wound infection swabs. The data are Illumina 2x300 paired-end reads, and samples were prepared following the Human Microbiome Project protocol using primers 515F and 806R. As far as I know there has been no QC performed by the sequencing center, but I did receive the reads demultiplexed. Based on the summary statistics I used the following command with DADA2:

qiime dada2 denoise-paired --i-demultiplexed-seqs FUMID-demux-1.qza --p-trim-left-f 19 --p-trim-left-r 20 --p-trunc-len-f 290 --p-trunc-len-r 255 --o-table FUMID-table-1.qza --o-representative-sequences FUMID-rep-seqs-1.qza --o-denoising-stats FUMID-denoising-stats-1.qza --p-n-threads 20 --verbose

FUMID-demux-summary-1.qzv (304.9 KB)

From the denoising stats I noticed that I was losing a lot of reads during the removal of chimeras.

Denoising stats pool 1 255bp.tsv (7.7 KB)

Initially I thought this might be down to not trimming my reverse reads enough to remove all the poor-quality bases, so I repeated the denoising with the following command:

qiime dada2 denoise-paired --i-demultiplexed-seqs FUMID-demux-1.qza --p-trim-left-f 19 --p-trim-left-r 20 --p-trunc-len-f 290 --p-trunc-len-r 200 --o-table FUMID-table-1.qza --o-representative-sequences FUMID-rep-seqs-1.qza --o-denoising-stats FUMID-denoising-stats-1.qza --p-n-threads 20 --verbose

And got this out:

Denoising stats pool 1 200bp.tsv (7.7 KB)
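
One rough way to put a number on this is the share of merged reads that do not survive chimera filtering, per sample. Below is a minimal sketch, assuming the exported stats TSV has a header row with columns named `merged` and `non-chimeric` and possibly a `#q2:types` row underneath (that is my understanding of the usual layout, so check your own file). The function name `chimera_loss` is made up for illustration.

```shell
# Per-sample share of merged reads removed by the chimera step.
# Assumption: tab-separated file whose header names include "merged"
# and "non-chimeric"; any line starting with "#" is skipped.
chimera_loss() {
  awk -F'\t' '
    NR == 1 {
      # Locate the two columns by name rather than by position.
      for (i = 1; i <= NF; i++) {
        if ($i == "merged") m = i
        if ($i == "non-chimeric") n = i
      }
      next
    }
    /^#/ { next }
    m && n && $m > 0 {
      # sample, merged reads, non-chimeric reads, percent lost
      printf "%s\t%d\t%d\t%.1f%%\n", $1, $m, $n, 100 * ($m - $n) / $m
    }
  ' "$1"
}
```

Running `chimera_loss "Denoising stats pool 1 255bp.tsv"` and then the same on the 200bp file would give two lists that can be compared side by side.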

So I guess my question is, do you think that I am losing too many reads during the chimera removal stage? If so, is there anything I can do to improve on this?
Let me know if I have left out some vital information :slight_smile:


This is more than I typically see, but chimeras can be prevalent, and how many you get depends largely on the PCR amplification protocol, etc. Notably, you still have plenty of good-quality merged sequences after chimera filtering, so I would personally just move on and accept that dada2 is probably correct. Your trimming parameters look appropriate, and I don't think they would be likely to produce chimeras, so I'm inclined to agree with dada2 on this one.

You can get a second opinion by using a different chimera filter (e.g., export your data and use another tool, or use q2-vsearch to dereplicate your data and then remove chimeras).
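
As one possible shape for that second opinion, here is a sketch of a slightly different route from the dereplication one: rerun dada2 with its own chimera filter switched off, then let the de novo UCHIME check in q2-vsearch flag chimeras among the denoised sequences. The function name and output file names are hypothetical, and the flag names are as I understand them from recent QIIME 2 releases, so please confirm with `--help` on your version before relying on it.

```shell
# Hypothetical helper, not part of QIIME 2.
# $1 = demultiplexed paired-end artifact, $2 = prefix for output files.
second_opinion_chimeras() {
  demux=$1
  prefix=$2

  # Same trimming as the first run in this thread, but with dada2's
  # chimera removal disabled (assumed flag: --p-chimera-method none).
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "$demux" \
    --p-trim-left-f 19 --p-trim-left-r 20 \
    --p-trunc-len-f 290 --p-trunc-len-r 255 \
    --p-chimera-method none \
    --o-table "$prefix-table-unfiltered.qza" \
    --o-representative-sequences "$prefix-rep-seqs-unfiltered.qza" \
    --o-denoising-stats "$prefix-stats-unfiltered.qza" &&

  # Independent de novo chimera detection on the unfiltered features.
  qiime vsearch uchime-denovo \
    --i-table "$prefix-table-unfiltered.qza" \
    --i-sequences "$prefix-rep-seqs-unfiltered.qza" \
    --o-chimeras "$prefix-chimeras.qza" \
    --o-nonchimeras "$prefix-nonchimeras.qza" \
    --o-stats "$prefix-uchime-stats.qza"
}
```

Called as `second_opinion_chimeras FUMID-demux-1.qza FUMID-check`, this would leave the chimeric and non-chimeric sequences in separate artifacts, which you could then compare against what dada2 removed on its own.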

Good luck!

Hi @Nicholas_Bokulich

Thanks for your reply. Sorry I was called away to another project. I thought it was a bit strange as we are basically following the HMP protocol. The only difference is that we are using a touchdown PCR for the first 10 cycles as we were getting some non-specific banding, but only 25 cycles in total.

I now have all my data for this project, so I reassessed the truncation parameters based on all four sequencing runs, which means I am now running:

qiime dada2 denoise-paired --i-demultiplexed-seqs FUMID-demux-1.qza --p-trim-left-f 19 --p-trim-left-r 20 --p-trunc-len-f 250 --p-trunc-len-r 220 --o-table FUMID-table-1.qza --o-representative-sequences FUMID-rep-seqs-1.qza --o-denoising-stats FUMID-denoising-stats-1.qza --p-n-threads 20 --verbose

Looking at the statistics now, I am losing hardly any reads during the chimera step, literally <1000 reads for most of the samples (at least in the first pool).

Fingers crossed that the other 3 pools look the same :smile:

Not sure why there is such a dramatic difference though :thinking:

