Benchmarking alternative methods of read-joining in QIIME 2

Mehrbod_Estaki · January 24, 2022, 6:22pm

Can you clarify what you mean by

Deblur operates on single-end reads using a pre-trained error model, so if you merge your reads before providing them as input, Deblur will just treat each merged read as a regular "longer" sequence and denoise accordingly. In theory, merging can improve quality scores in the overlap region, and in fact some tools re-calculate the quality scores when this happens. However, this doesn't matter with Deblur, because quality scores are not used in the denoising step.

The other consideration is that Deblur tends to become more conservative as read length increases (see example calculation here), so you will actually end up with fewer reads than if you were to use the forward reads by themselves (assuming they are a bit shorter than the merged reads).

Intuition says you may gain slight improvements in taxonomic resolution as read length increases, but to be honest I'm not sure I've actually seen this benchmarked anywhere (aside from this old paper from 2007, Fig 1). When we're discussing short reads, I really don't think the difference between 150 and 180 nt is going to have any noticeable effect. At that point I think I'd prefer having more reads over longer reads, but that is also very much data-dependent.
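To give a rough feel for why longer reads tend to leave you with fewer sequences after denoising, here is a back-of-the-envelope sketch. It is not Deblur's actual error model. It only assumes independent errors at a single uniform per-base rate, and both the function name `error_free_fraction` and the rate `P_ERR` are made up for illustration.

```python
def error_free_fraction(read_length: int, per_base_error: float) -> float:
    """Expected fraction of reads with zero errors.

    Assumes errors are independent and uniform across positions,
    which is a simplification and NOT how Deblur models errors.
    """
    return (1.0 - per_base_error) ** read_length


# Hypothetical per-base error rate, chosen only for illustration.
P_ERR = 0.005

for length in (150, 180, 250):
    frac = error_free_fraction(length, P_ERR)
    print(f"{length} nt: {frac:.3f} of reads expected to be error-free")
```

Under these assumptions the error-free fraction shrinks with every extra base, so a 180 nt merged read is less likely to be error-free than a 150 nt forward read. How that plays out with real data depends on the actual error profile of the run.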