Error rates could not be estimated (Novaseq dataset)

Mehrbod_Estaki · December 2, 2021, 7:42pm

Hi @ju4n_dc,
Having had a second look at your error, I wanted to clarify a couple of additional things.
Your original error message:

This suggests that you don't have enough reads for DADA2 to build a reliable error model.

But this line suggests that DADA2 did in fact have access to enough reads, since by default it only requires 1 million reads. The fact that it wasn't able to move past the error-model-building step is what makes me think the quality scores are not playing nicely with the model building. But this is speculation on my part! @timanix also raised the possibility of running out of memory because NovaSeq datasets are massive, so that is certainly something to consider even once you sort out this problem. My hunch is that at that step it is not a memory issue (yet), but it is something to keep in mind later on.

The FASTQ format looks the same, just with only 4 distinct characters in the quality lines, but I'm not sure what this looks like in the raw data on the sequencer itself. I have to think they use a different format for saving the raw binary files on the sequencer, which are then converted to regular FASTQ. Otherwise, I don't really see where the "space saving" aspect comes into play.
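
If you want to check the binning yourself, here is a minimal Python sketch that tallies the distinct characters in the quality lines of a FASTQ file and counts how many reads it looked at. The function name `summarize_fastq_quality` and the `max_reads` parameter are made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequences), optionally gzipped.

```python
import gzip
from collections import Counter
from itertools import islice


def summarize_fastq_quality(path, max_reads=100_000):
    """Return (reads_seen, Counter of quality characters) for a FASTQ file.

    Assumes 4-line records; files ending in .gz are read through gzip.
    Only the first `max_reads` records are inspected.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    reads_seen = 0
    with opener(path, "rt") as handle:
        # The quality string is the 4th line of each record (index 3, 7, 11, ...).
        for qual in islice(handle, 3, max_reads * 4, 4):
            counts.update(qual.rstrip("\r\n"))
            reads_seen += 1
    return reads_seen, counts
```

Calling it on one of your demultiplexed files, e.g. `summarize_fastq_quality("sample_R1.fastq.gz")` (a placeholder filename), should give back only a handful of distinct characters if the quality scores are binned. Under the usual Phred+33 encoding, `ord(char) - 33` converts each character to its quality score.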