NovaSeq 6000 with new RTA3 quality scores and DADA2

mfbeuq · July 27, 2021, 7:02am

Hi guys,

I've run into problems using reads from a NovaSeq 6000 with DADA2. The issue is that this system uses a different quality-score scheme (RTA3), which assigns only a handful of distinct scores:

To generate the Q-table for the NovaSeq System, three groups of base calls were determined, based on the clustering of these specific predictive features. Following grouping of the base calls, the mean error rate was empirically calculated for each of the three groups and the corresponding Q-scores were recorded in the Q-table alongside the predictive features correlating to that group. As such, only three Q-scores are possible with RTA3 and these Q-scores represent the average error rate of the group (Figure 1). Overall this results in simplified, yet highly accurate quality scoring. The three groups in the quality table correspond to marginal (< Q15), medium (~Q20), and high-quality (> Q30) base calls, and are assigned the specific scores of 12, 23, and 37 respectively. Additionally, a null score of 2 is assigned to any no-calls. (source: Illumina)

This means my reads look like this:
@A01426:28:H7CTKDRXY:2:2101:4607:1047 1:N:0:CAGATCTG+GGAATCCG
ACGTCATCCCCACCTTCCTCCGAGTTGACCCCGGCAGTCTCCCACGAGTCCCCGCCATAACGCGCTGGCAACGTAGGATAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAGCCATGCACCACCTGTACACCAACCACAAGGGAAGCACAATCTCTGGGGCTGTCTGGCGCAAGGCAGGCCAAGGGAAGAGTCTGCGCGGGGGGGCGAGGGGATGGACGGGGGGA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFF:FFFF,:F::FFFFFFFFFFFFF,F,,FFFFF,FFFFF,FFFFFFFF,F,FF:,,F,:,:,F,F,::,:,FF,FFF,,,,F,:,,,,F,:,F,:F:,FFF,:,F,F,F,,,,,FF,,,,,,,,F::,F,,,,
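Decoding that quality line makes the binning visible. The sketch below assumes the standard Phred+33 encoding used by current Illumina FASTQ files; under that assumption only three distinct values occur in this read: ',' = Q11, ':' = Q25 and 'F' = Q37 (a no-call would appear as '#' = Q2). These are close to, but not exactly, the 12/23/37 quoted from Illumina above.

```python
from collections import Counter

# Quality line of the example read above, copied verbatim.
QUAL = "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFF:FFFF,:F::FFFFFFFFFFFFF,F,,FFFFF,FFFFF,FFFFFFFF,F,FF:,,F,:,:,F,F,::,:,FF,FFF,,,,F,:,,,,F,:,F,:F:,FFF,:,F,F,F,,,,,FF,,,,,,,,F::,F,,,,"

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ encoding


def decode_phred(qual: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Convert an ASCII quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


scores = decode_phred(QUAL)
counts = Counter(scores)

# Print how many bases fall into each quality bin.
for q in sorted(counts):
    print(f"Q{q}: {counts[q]} bases")
```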

This causes serious problems with DADA2, which apparently does not handle this kind of quality-score scheme well: it filters out around 99% of my reads right away.
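One plausible mechanism (not confirmed for this dataset): DADA2 discards reads whose expected number of errors, EE = Σ 10^(−Q/10) over all bases, exceeds a maxEE threshold. Each ',' (Q11) adds about 0.08 expected errors, so a 3' end dominated by ',' can push a read over the limit. The sketch computes EE for the example read, once in full and once truncated to a hypothetical 120 bases; the threshold of 2.0 and the truncation length are illustrative assumptions, not settings from this thread.

```python
# Quality line of the example read, copied verbatim.
QUAL = "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFF:FFFF,:F::FFFFFFFFFFFFF,F,,FFFFF,FFFFF,FFFFFFFF,F,FF:,,F,:,:,F,F,::,:,FF,FFF,,,,F,:,,,,F,:,F,:F:,FFF,:,F,F,F,,,,,FF,,,,,,,,F::,F,,,,"


def expected_errors(qual: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities: EE = sum(10 ** (-Q / 10))."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in qual)


MAX_EE = 2.0     # illustrative threshold (assumption)
TRUNC_LEN = 120  # hypothetical truncation length

full_ee = expected_errors(QUAL)
trunc_ee = expected_errors(QUAL[:TRUNC_LEN])

# A read is kept only if its EE does not exceed MAX_EE.
print(f"full read:        EE = {full_ee:.2f}, kept = {full_ee <= MAX_EE}")
print(f"first {TRUNC_LEN} bases: EE = {trunc_ee:.2f}, kept = {trunc_ee <= MAX_EE}")
```

If this is what is happening, truncating reads before the low-quality tail (in q2-dada2, via `--p-trunc-len-f`/`--p-trunc-len-r`) should keep far more of them, since the expected-error filter is applied after truncation.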

Any help on how to process these quality scores within QIIME 2/DADA2?

Cheers,
Max

timanix · July 27, 2021, 7:44am

Hi @mfbeuq,
Recently I ran a pipeline on our server to process several dozen 16S libraries sequenced on a NovaSeq 6000. The only dataset in which we lost a lot of reads after DADA2 was one with poor quality in the middle of each read pair, and we tend to think those errors were introduced during either amplification or sequencing. With the other datasets we did not lose many reads, and we successfully merged 75-90% of the raw input reads.
Could you share your quality plots and the commands you are running?
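A quality dip in the middle of the reads, like the one described above, can also be spotted without a plotting tool. Below is a simplified, mean-only version of a per-position quality profile: it averages Phred scores (Phred+33 assumed) at each position across reads and flags positions below a threshold. The threshold of 20 and the helper names are hypothetical choices for illustration.

```python
from io import StringIO
from typing import Iterable, Iterator, TextIO


def read_fastq_quals(handle: TextIO) -> Iterator[str]:
    """Yield the quality line of each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return
        handle.readline()  # sequence line
        handle.readline()  # '+' separator line
        yield handle.readline().rstrip("\n")


def mean_quality_by_position(quals: Iterable[str], offset: int = 33) -> list[float]:
    """Average Phred score at each read position (handles variable read lengths)."""
    totals: list[int] = []
    counts: list[int] = []
    for qual in quals:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def low_quality_positions(means: list[float], threshold: float = 20.0) -> list[int]:
    """0-based positions whose mean quality falls below the threshold."""
    return [i for i, m in enumerate(means) if m < threshold]


# Tiny synthetic example with a dip in the middle of both reads.
demo = "@r1\nACGTACGT\n+\nFFF,,FFF\n@r2\nACGTACGT\n+\nFF:,,:FF\n"
means = mean_quality_by_position(read_fastq_quals(StringIO(demo)))
print(means)
print(low_quality_positions(means))
```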

BTW, as a workaround you can try merging your reads with the VSEARCH plugin in QIIME 2 and then denoising them with the Deblur plugin, which is less sensitive to quality scores.
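For intuition about what the merge step does, here is a toy sketch of paired-end merging: the forward read is overlapped with the reverse complement of the reverse read, and the longest overlap with few mismatches is accepted. This is a deliberate simplification, not VSEARCH's algorithm; real mergers align the reads and use the quality scores to resolve disagreements and to compute merged qualities. The `min_overlap` and `max_mismatch_frac` parameters are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Toy merge: return the merged sequence, or None if no acceptable overlap exists."""
    r2rc = revcomp(r2)
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for ov in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        tail, head = r1[-ov:], r2rc[:ov]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * ov:
            # Keep r1's bases in the overlap; a real merger would weigh qualities.
            return r1 + r2rc[ov:]
    return None


amplicon = "AAACCCGGGTTTACGTAGCT"
r1 = amplicon[:14]            # forward read covers the 5' part
r2 = revcomp(amplicon[6:])    # reverse read comes from the opposite strand
print(merge_pair(r1, r2, min_overlap=5))  # → AAACCCGGGTTTACGTAGCT
```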

system · August 27, 2021, 1:44pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.