Merging across multiple batches giving exorbitant feature counts

The vsearch / deblur approach is simply another denoising pipeline, like DADA2. How it denoises is different, though: deblur uses a premade (static) error profile rather than learning an error model from your data.

My understanding is that DADA2 denoises the forward and reverse reads separately prior to merging them, meaning that if one of the two reads is considered poor quality, the pair is discarded before merging. With the vsearch / deblur approach you have a chance of "rescuing" those pairs, because the poor-quality portion of one read can be corrected by the better-quality bases of the opposite read during merging. You can read up on how vsearch merging works; you'll find threads on the forum where visualizations of vsearch-merged reads show that quality can actually increase in the region of overlap. Deblur then denoises the merged reads. You cannot do this with DADA2, as merging first would violate its error model.
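If you want to try the merge-then-denoise route, it looks roughly like the sketch below in QIIME 2 (the artifact filenames and the `--p-trim-length` value are placeholders; pick a trim length based on your own quality plots):

```
# Merge paired-end reads with vsearch; quality can improve in the overlap region
qiime vsearch merge-pairs \
  --i-demultiplexed-seqs demux-paired.qza \
  --o-merged-sequences demux-merged.qza

# Basic quality filtering prior to deblur
qiime quality-filter q-score \
  --i-demux demux-merged.qza \
  --o-filtered-sequences demux-filtered.qza \
  --o-filter-stats filter-stats.qza

# Denoise the merged reads; deblur requires a fixed truncation length
qiime deblur denoise-16S \
  --i-demultiplexed-seqs demux-filtered.qza \
  --p-trim-length 250 \
  --p-sample-stats \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-stats deblur-stats.qza
```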

Depending on the quality of the forward and reverse reads, sometimes deblur provides better results, other times DADA2 does. So yeah, it may or may not be better. :man_shrugging:

Also, with deblur, you have to truncate to a fixed length, whereas DADA2 allows length variation.

On another note, I often set the following for DADA2:

```
--p-pooling-method 'pseudo' \
--p-chimera-method 'pooled'
```
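For context, those two flags slot into a full `denoise-paired` call something like this (filenames and truncation lengths are placeholders; choose truncation values from your quality plots):

```
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired.qza \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 200 \
  --p-pooling-method 'pseudo' \
  --p-chimera-method 'pooled' \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
```

Pseudo-pooling shares information across samples when calling rare variants, and pooled chimera detection runs on the full set of sequences rather than per sample, which I find works well together.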

-Mike