Trimming the first 5 low(er) quality bases from forward and reverse reads results in 25% more sequences removed as chimeras

ngarcia · February 28, 2020, 3:18pm

Thanks for your help!

I believe all the non-biological sequence has been removed, but I left the biological portion of the primers in place. The forward reads start with 'CCTACGGGNGGCWGCAG' and the reverse reads start with 'GACTACHVGGGTATCTAATCC', which, as I understand it, are the biological portions of the Illumina MiSeq V3V4 primers. What's odd is that while other people report that removing the primers improves results, in my case it makes chimera filtering worse.
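In case it helps anyone reproduce this, here is a minimal Python sketch of one way to check that reads really do start with those primers, including the degenerate IUPAC positions (N, W, H, V). The function names are hypothetical and it assumes the reads are already loaded as plain strings; it is not part of QIIME 2 or DADA2.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Primer sequences as quoted in this post.
FWD = "CCTACGGGNGGCWGCAG"
REV = "GACTACHVGGGTATCTAATCC"


def primer_regex(primer):
    """Compile a degenerate primer into a regex (hypothetical helper)."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def starts_with_primer(read, primer):
    """True if the read begins with a sequence compatible with the primer."""
    return primer_regex(primer).match(read.upper()) is not None


def fraction_with_primer(reads, primer):
    """Fraction of reads that start with the primer (0.0 for no reads)."""
    reads = list(reads)
    if not reads:
        return 0.0
    pattern = primer_regex(primer)
    hits = sum(1 for read in reads if pattern.match(read.upper()))
    return hits / len(reads)
```

Note that this requires an exact match at every non-degenerate position and treats an uncalled base (N) in the read itself as a mismatch, so it will slightly undercount reads with sequencing errors in the primer region.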

Is there a good way to determine whether the default min-fold-parent-over-abundance setting is suboptimal for a sample? Is standalone DADA2 the only way to manually inspect chimera detection? How do you determine whether the sequences being filtered aren't actually chimeras? Recovering more reads isn't worth introducing a bunch of chimeras.
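For context, this is my rough understanding of what that setting controls, written as a small Python sketch. It is an assumption on my part and not DADA2's actual code: the function name is hypothetical, the strict greater-than comparison is assumed, and the real chimera check additionally tests whether the sequence can be reconstructed from two of the candidate parents.

```python
def eligible_parents(abundances, query, min_fold):
    """Sequences abundant enough to be considered as parents of `query`.

    Assumption: a sequence can only be a parent if its abundance exceeds
    `min_fold` times the abundance of the query sequence.
    """
    query_abundance = abundances[query]
    return [
        seq
        for seq, abundance in abundances.items()
        if seq != query and abundance > min_fold * query_abundance
    ]


# Toy abundance table (made-up numbers, for illustration only).
toy = {"seqA": 100, "seqB": 60, "seqC": 40}

print(eligible_parents(toy, "seqC", min_fold=1.0))  # ['seqA', 'seqB']
print(eligible_parents(toy, "seqC", min_fold=2.0))  # ['seqA']
```

If that reading is right, raising the value shrinks the pool of candidate parents, so fewer sequences can be flagged as chimeras.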