Hi guys,
I've run into problems using reads from a NovaSeq 6000 with dada2. The issue is that this instrument uses a different quality-scoring scheme. From Illumina:
"To generate the Q-table for the NovaSeq System, three groups of base calls were determined, based on the clustering of these specific predictive features. Following grouping of the base calls, the mean error rate was empirically calculated for each of the three groups and the corresponding Q-scores were recorded in the Q-table alongside the predictive features correlating to that group. As such, only three Q-scores are possible with RTA3 and these Q-scores represent the average error rate of the group (Figure 1). Overall this results in simplified, yet highly accurate quality scoring. The three groups in the quality table correspond to marginal (< Q15), medium (~Q20), and high-quality (> Q30) base calls, and are assigned the specific scores of 12, 23, and 37 respectively. Additionally, a null score of 2 is assigned to any no-calls." (source: Illumina)
This means my reads look like this:
@A01426:28:H7CTKDRXY:2:2101:4607:1047 1:N:0:CAGATCTG+GGAATCCG
ACGTCATCCCCACCTTCCTCCGAGTTGACCCCGGCAGTCTCCCACGAGTCCCCGCCATAACGCGCTGGCAACGTAGGATAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAGCCATGCACCACCTGTACACCAACCACAAGGGAAGCACAATCTCTGGGGCTGTCTGGCGCAAGGCAGGCCAAGGGAAGAGTCTGCGCGGGGGGGCGAGGGGATGGACGGGGGGA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFF:FFFF,:F::FFFFFFFFFFFFF,F,,FFFFF,FFFFF,FFFFFFFF,F,FF:,,F,:,:,F,F,::,:,FF,FFF,,,,F,:,,,,F,:,F,:F:,FFF,:,F,F,F,,,,,FF,,,,,,,,F::,F,,,,
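For reference, here is a minimal Python sketch of how the quality line decodes, assuming the standard Phred+33 ASCII offset (the offset and the helper name `decode_quality` are my assumptions, not something dada2 reports). Under that assumption the three characters in the read above, `F`, `:` and `,`, decode to 37, 25 and 11, which is close to but not exactly the 37/23/12 in the Illumina text.

```python
from collections import Counter

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def decode_quality(qual: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Convert a FASTQ quality string into a list of integer Q-scores."""
    return [ord(ch) - offset for ch in qual]


# Short excerpt copied from the quality line of the read above
excerpt = ",FFFF:FFFF,:F::FFFF"
scores = decode_quality(excerpt)

# Tally how many bases fall into each of the binned Q-scores
counts = Counter(scores)
for q in sorted(counts):
    print(f"Q{q}: {counts[q]} bases")
```

Running the same function over the full quality line would show how many bases of the read sit in each bin.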
This causes serious problems with dada2: it apparently does not cope well with this quality-scoring scheme and filters out around 99% of my reads right away.
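A rough way to check whether an expected-error filter could explain this is sketched below in Python. It assumes Phred+33 encoding and assumes the filtering step applies a maximum-expected-errors cutoff (dada2's filterAndTrim has a maxEE parameter for this, but I have not confirmed it is what removes my reads). The function name, the synthetic quality strings and the cutoff of 2.0 are illustrative only.

```python
def expected_errors(qual: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10).

    Assumes Phred+33 encoding of the quality string.
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in qual)


MAX_EE = 2.0  # illustrative cutoff, not necessarily the one in use

# Synthetic 100-base quality strings (made up for illustration)
high = "F" * 100             # every base at Q37
mixed = "F" * 60 + "," * 40  # 40 bases in the lowest bin (Q11)

for label, qual in (("all Q37", high), ("40 bases at Q11", mixed)):
    ee = expected_errors(qual)
    print(f"{label}: EE = {ee:.2f}, passes cutoff: {ee <= MAX_EE}")
```

Each `,` base (Q11) contributes about 0.08 expected errors, so under these assumptions a read with a few dozen of them in its tail would exceed a cutoff of 2 even if every other base is Q37.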
Any advice on how to process these quality scores within QIIME/dada2?
Cheers,
Max