High chimera percentage on certain samples

Peripatus · January 17, 2025, 2:14pm

Dear all,

I recently received data from a NextSeq1000 sequencing run that included 16S-V3V4 amplicon samples from either human feces or whole insect guts. All libraries were prepared using the same lab protocols (primers, enzyme, PCR conditions, etc.), although the extraction method differed: a soil kit for the insect guts and a fecal kit for the fecal samples.

I removed adapters with cutadapt, and the sequencing quality looks good. However, while analyzing the data in QIIME 2, I noticed a marked difference in the number of chimeras detected by DADA2. Fecal microbiome samples had a lot of chimeras (up to 50% of the reads), while invertebrate gut samples had almost none.

Fecal samples:

Insect gut:

Does anyone have a good explanation for that? As I said, library preparation and bioinformatic processing were identical. Could it be related to the extraction method? The amount of template DNA (i.e., bacteria) in the samples? Bacterial diversity?

It would be great to understand this a bit better!

Thank you very much!
Best,
Gui

SoilRotifer · January 17, 2025, 2:16pm

Hi @Peripatus,

I'd take a look at the --p-min-fold-parent-over-abundance parameter as outlined here. I would not suggest setting it higher than 8 or 16.
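To make the effect of this parameter concrete, here is a minimal sketch of the rule as I understand it from the DADA2 documentation: a sequence can only serve as a "parent" of a candidate chimera if it is more than N-fold more abundant than the candidate. The function name and the read counts below are made up for illustration; this is not DADA2's actual code.

```python
def eligible_parents(candidate_count, other_counts, min_fold=1.0):
    """Return the abundances that could act as chimera parents.

    Assumption: a sequence qualifies as a parent only if its abundance is
    strictly greater than min_fold times the candidate's abundance.
    """
    return [c for c in other_counts if c > min_fold * candidate_count]


# Hypothetical read counts: one candidate ASV and four other ASVs.
candidate = 100
others = [150, 500, 1000, 90]

for fold in (1, 8, 16):
    parents = eligible_parents(candidate, others, min_fold=fold)
    print(f"min_fold={fold}: eligible parents {parents}")
```

With a higher fold value, fewer sequences qualify as parents, so fewer candidates can be flagged as chimeric. That is also why very high values risk letting real chimeras through.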

colinbrislawn · January 17, 2025, 4:39pm

Hello Gui,

Welcome to the forums!

The other option is to simply leave it!

The argument here is that the fecal samples may simply contain more chimeric reads than the insect gut samples, and DADA2 is finding and removing them. I find that keeping settings consistent is usually defensible, as long as you have 'enough' reads in both cohorts.

I don't have a definitive explanation. The consensus, as I understand it, is that chimeric reads are a product of PCR amplification (8704952), so more PCR cycles lead to higher chimera levels (PMC6531881).

PMC3044863 claims "More similar 16S genes clearly form chimeras more readily," which makes sense. So I guess the question is not the total number of different microbes but how different these microbes are from each other. If your fecal samples have many highly similar microbes, they are more likely to form chimeras.
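One way to check whether both cohorts still have 'enough' reads is to compare the per-sample counts before and after chimera removal. Below is a minimal sketch, assuming the DADA2 denoising stats have been exported into a dict with `merged` and `non-chimeric` counts per sample. The sample names, the counts, and the `min_reads` threshold are all hypothetical.

```python
def chimera_summary(stats, min_reads=1000):
    """Per-sample chimeric fraction and a simple 'enough reads' flag.

    stats maps sample ID -> {"merged": int, "non-chimeric": int}.
    min_reads is an arbitrary illustrative cutoff, not a recommendation.
    """
    summary = {}
    for sample, row in stats.items():
        merged = row["merged"]
        kept = row["non-chimeric"]
        # Fraction of merged reads that were removed as chimeric.
        fraction = (merged - kept) / merged if merged else 0.0
        summary[sample] = {
            "chimeric_fraction": fraction,
            "enough_reads": kept >= min_reads,
        }
    return summary


# Hypothetical counts for one sample from each cohort.
example = {
    "fecal_01": {"merged": 40000, "non-chimeric": 20000},
    "insect_01": {"merged": 30000, "non-chimeric": 29400},
}

summary = chimera_summary(example)
for sample, info in summary.items():
    print(sample, f"{info['chimeric_fraction']:.1%}", info["enough_reads"])
```

If every sample stays above whatever depth you plan to rarefy or normalize to, the consistent-settings argument gets easier to defend.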

Peripatus · January 17, 2025, 7:50pm

Dear Mike and Colin,

Thank you so much for the quick reply!

@colinbrislawn I suppose that's indeed a possibility, especially since in most samples I still have a good number of reads left.

@SoilRotifer I tried changing --p-min-fold-parent-over-abundance to 8 and indeed got a massive decrease in chimeras!! Numbers became close to those in the insect gut dataset. Even using 4 already made a significant difference. Of course, I also got a strong increase in the number of generated ASVs, which makes sense, I guess, since we're leaving more reads in. My whole dataset is around 150 samples and ASV number went from ~2500 to ~14k when I tested this on subsampled data with 5k read-pairs per sample. This of course raises the question of whether those are real biological ASVs or undetected chimeras... but the links you showed suggest that it might be fine to use --p-min-fold-parent-over-abundance = 8, right?
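For anyone wanting to repeat this kind of comparison, here is a rough sketch that only builds (and prints) the `qiime dada2 denoise-paired` command lines for a few fold values. The file names are hypothetical, and the truncation lengths are placeholders (0 means no truncation) that would need to be set from your own quality plots.

```python
def denoise_command(min_fold, trunc_len_f=0, trunc_len_r=0):
    """Build one denoise-paired command as an argument list.

    demux.qza and the output names are hypothetical placeholders.
    """
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", "demux.qza",
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--p-min-fold-parent-over-abundance", str(min_fold),
        "--o-table", f"table-fold{min_fold}.qza",
        "--o-representative-sequences", f"rep-seqs-fold{min_fold}.qza",
        "--o-denoising-stats", f"stats-fold{min_fold}.qza",
    ]


# Fold values to compare; 4 and 8 are the ones tried in this thread.
commands = [denoise_command(fold) for fold in (1, 4, 8)]

for cmd in commands:
    print(" ".join(cmd))
```

Keeping every other parameter fixed across runs means any change in ASV count or chimera percentage can be attributed to the fold value alone.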

Not easy to make these decisions, huh?

timanix · January 17, 2025, 8:03pm

Hello!
Please allow me to qiime in as well.
I would follow the recommendation of @SoilRotifer and then filter features based on prevalence and abundance. Usually, I remove features that are found in fewer than 3 samples or that have an overall count of less than 10. That will decrease the number of unique features. If the features recovered by tweaking DADA2 are indeed chimeras, I would expect both the number of unique features and the total feature count to decrease drastically after filtering. If the recovered ASVs are biological sequences, then the number of unique features should decrease while the total feature count should decrease only slightly.

Peripatus · January 19, 2025, 7:46am

Hi @timanix ,
Thanks for the suggestion! Just to check that I got it right: are you suggesting that I remove features that are simultaneously found in fewer than 3 samples AND have an overall count of less than 10? Or do I remove everything that fits one condition OR the other? That is, should features that are found in a single sample be removed even if they have very high abundance in that sample?

timanix · January 19, 2025, 8:58am

These conditions are independent, so "OR".
For example, if I have 10 samples from one group, and all the lab work was done in the same way for all the samples, I would be very suspicious about the ASVs that were found in only one sample. Are they really biological sequences? Or some contamination?
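The "OR" filter and the before/after comparison can be sketched in plain Python as follows. This is only an illustration on a toy table with made-up counts; the function names are hypothetical, and the thresholds (3 samples, total count 10) are the ones mentioned in this thread.

```python
def filter_features(table, min_samples=3, min_total=10):
    """Drop a feature if it fails EITHER the prevalence OR the abundance cutoff.

    table maps feature ID -> {sample ID: count}.
    """
    kept = {}
    for feature, counts in table.items():
        prevalence = sum(1 for c in counts.values() if c > 0)
        total = sum(counts.values())
        if prevalence < min_samples or total < min_total:
            continue  # removed: too rare across samples or too few reads
        kept[feature] = counts
    return kept


def retained_fractions(before, after):
    """Fraction of unique features and of total reads surviving the filter."""
    reads_before = sum(sum(c.values()) for c in before.values())
    reads_after = sum(sum(c.values()) for c in after.values())
    return len(after) / len(before), reads_after / reads_before


# Toy feature table with made-up counts.
table = {
    "ASV_a": {"s1": 50, "s2": 40, "s3": 30},           # kept
    "ASV_b": {"s1": 500},                              # one sample only: removed
    "ASV_c": {"s1": 2, "s2": 3, "s3": 2},              # total 7: removed
    "ASV_d": {"s1": 5, "s2": 5, "s3": 0, "s4": 5},     # kept
}

filtered = filter_features(table)
feature_frac, read_frac = retained_fractions(table, filtered)
print(sorted(filtered), feature_frac, round(read_frac, 3))
```

Note that ASV_b is removed despite its 500 reads, because it appears in a single sample. In QIIME 2 itself, I believe the same filter can be applied with `qiime feature-table filter-features` using `--p-min-samples 3` and `--p-min-frequency 10`, but please check the plugin documentation for your version.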
