Error rates could not be estimated (Novaseq dataset)

Mehrbod_Estaki · November 26, 2021, 11:09pm

Hi @ju4n_dc,
This is a great question, and one we are seeing more often on the forum. I suspect we'll see it even more as NovaSeq and binned quality scores become more common.
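
If you want to confirm that your reads actually carry binned quality scores, a quick check is to count how many distinct Phred values appear in a FASTQ file. Below is a minimal sketch of that idea, not an official tool. It assumes Phred+33 encoding and standard four-line FASTQ records. The function name `quality_score_counts`, its parameters, and the toy reads are all made up for illustration.

```python
import io
from collections import Counter


def quality_score_counts(fastq_handle, phred_offset=33, max_reads=10000):
    """Count how often each Phred score occurs in the first `max_reads` records.

    Assumes four-line FASTQ records and Phred+33 encoding (both assumptions).
    """
    counts = Counter()
    for i, line in enumerate(fastq_handle):
        if i // 4 >= max_reads:
            break
        if i % 4 == 3:  # the fourth line of each record is the quality string
            counts.update(ord(ch) - phred_offset for ch in line.rstrip("\n"))
    return counts


# Toy input: two hypothetical reads whose quality strings use only four symbols.
toy_fastq = io.StringIO(
    "@read1\nACGT\n+\nFF:F\n"
    "@read2\nACGT\n+\nF,F#\n"
)

counts = quality_score_counts(toy_fastq)
distinct_scores = sorted(counts)
print(distinct_scores)  # a short list here suggests binned quality scores
```

For a real file you could pass an open handle instead of the toy string, for example `gzip.open(path, "rt")` for a gzipped FASTQ. A handful of distinct values across thousands of reads points to binned scores, whereas unbinned data typically shows dozens of distinct values.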

q2-dada2 is just a wrapper around DADA2, so it faces the same issues as the native version. Using DADA2 with NovaSeq data is doable with some modifications (see this discussion here), but that requires you either to run it natively in R or to create your own custom branch of q2-dada2 with the modifications mentioned in that post. That said, the developer of DADA2 mentions in that thread that while things look OK, DADA2 has not been thoroughly tested with NovaSeq data, so be mindful of that if you choose to go that way.
Using Deblur instead is a bit different, because Deblur uses a pre-trained error model for denoising, so it does not rely on the quality scores at all. Technically, then, you will have no problem running this data through Deblur. That said, Deblur's error model was based on MiSeq and HiSeq machines, and I'm not aware of any benchmarking of its use with NovaSeq (or other) sequencers. Given that NovaSeq technology is claimed to have lower error rates than MiSeq and HiSeq, I would speculate that applying the more conservative MiSeq/HiSeq error model is OK here, though a more customized model might retain a higher number of reads.

The results from the two denoisers are comparable; however, using the same bioinformatics pipeline throughout would be ideal, and I would personally recommend that. I'll also add that as read lengths get longer (beyond 100-150 nts), DADA2 tends to retain more reads than Deblur due to the nature of the denoising algorithms (relevant explanation here). So consider sequencing depth as an important factor as well if you are going to mix and match denoisers.

I'd love to hear what others think on this topic, since I'm sure we'll be facing it more often on the forum soon.