DADA2 filtering out >80% of reads as chimeras!

I don’t remember seeing a table or anything, but it appeared to run to completion (no error messages, etc.) and took a few hours. Other than that I am not entirely sure… sorry! It was a while ago and I am afraid I did not save it.

The output from this plugin (when run with --verbose) is quite lengthy. Maybe you could re-run the command and take a peek?

Hi Ryan,

No problem, I will when I get a chance. I’m currently annotating the forward reads and attempting to train a classifier using the SILVA database, which is proving challenging! I’ll report back after this is done.

Thanks again for all the help

Hi Sam,

In DADA2, it is recommended to use a value greater than or equal to 1 for the option “--p-min-fold-parent-over-abundance FLOAT”. May I ask why you used 0.75 for your data analysis?

I do not 100% understand the option, but please check this out: https://github.com/benjjneb/dada2/issues/602
Ben (@benjjneb) mentioned that a higher value (e.g. 4 or 8) could prevent false-positive chimera calls.

Please also check my post: The meaning of DADA2 command "--p-min-fold-parent-over-abundance FLOAT"

I recently ran into a similar issue: the read counts after merging were still good, but ~80% were lost at the chimera check. After trimming the sequences further and using “--p-min-fold-parent-over-abundance 8”, ~80% of the sequences were retained. I am now doing the taxonomy assignment and will keep you updated.
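
For anyone unsure what “--p-min-fold-parent-over-abundance” controls, here is a minimal Python sketch of the abundance rule as described in the DADA2 documentation: a sequence is only considered a possible "parent" of a chimera if it is more than the given fold more abundant than the sequence being checked. The function name, sequence labels, and read counts are hypothetical, and the strict "greater than" comparison is an assumption. This is not DADA2's actual implementation, which also aligns each sequence against its candidate parents.

```python
def candidate_parents(abundances, query, min_fold):
    """Return sequences abundant enough to be considered parents of `query`.

    Assumption: a parent must be strictly more than `min_fold` times as
    abundant as the query sequence.
    """
    threshold = min_fold * abundances[query]
    return sorted(
        seq for seq, count in abundances.items()
        if seq != query and count > threshold
    )


# Hypothetical read counts for four denoised sequences.
abundances = {"seqA": 1000, "seqB": 400, "seqC": 90, "seqD": 100}

for min_fold in (0.75, 1.0, 8.0):
    print(min_fold, candidate_parents(abundances, "seqD", min_fold))
# 0.75 ['seqA', 'seqB', 'seqC']
# 1.0 ['seqA', 'seqB']
# 8.0 ['seqA']
```

In this toy example, a value of 8 leaves seqD with a single candidate parent, so it could no longer be called a two-parent chimera. A value below 1 lets even less abundant sequences count as parents, which is presumably why values of 1 or more are recommended.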

Hi Yaochun Yu,

I have the same issue. I am trying to trim the sequences further and use “--p-min-fold-parent-over-abundance 8”. My question is: does this increase the time required to get the denoising stats to more than one day?

Regards,
Maysa

Hi Yaochun,

Thanks for flagging this up. The main reason I altered “--p-min-fold-parent-over-abundance” was to test, by lowering the threshold, whether this parameter was responsible for my reads being discarded. In the end, I did not use this setting for the final analysis. We have since repeated the experiment and have much higher-quality sequencing coverage on some independent replicates of these samples, and I did not appear to have the same chimera issue in the second dataset.

Hello Sam,

I am afraid what you are looking at is probably what you think: PCR artifacts caused by a high number of cycles, low starting material, and possibly contamination.

Some chimeras inevitably arise during PCR, and they compete with your target sequences for amplification. Hence the advice to use high starting material and a low number of cycles.

The fact that this scheme worked the first time also points to possible contamination of the working environment. I have also had times when a PCR worked once, but when I repeated the same PCR I did not get what I needed, only strange short fragments. This can be prevented by using filter tips and sticking to general PCR guidelines, but completely removing contamination is not always possible. Again, high starting material and a low number of cycles reduce the effect of contamination.
