I’m pushing some data through the qiime2 pipeline and am met with an odd demux plot, one that makes it difficult to choose parameters for the dada2 step. I am missing most of the box part of the boxplot that every other demux.qza that I’ve seen has. The plot is generated using random sampling of 1000000 out of 19442585 sequences, and looks exactly the same as when I use 10000 sequences. I wonder if it might be due to Illumina’s quality score binning?
Indeed these quality scores look a but odd… I have seen this in a few different scenarios:
most commonly, if the data have already been quality filtered. Ideally, raw data should be passed to QIIME 2. I don’t think that’s the case with your data, though, since it looks like the reads are all the same expected length… I have seen this, e.g., when reads are trimmed where quality begins to drop.
I have seen just 1 or 2 datasets from Illumina’s never iSeq 100 system and it looked like this… is that by any chance what you are using?
Only a few QC steps were performed by the sequencing company, including throwing reads with too much adapter contamination. I don’t think this is the issue.
I’ve confirmed that our reads were sequenced with a Novaseq and indeed have only 4 quality scores, 2, 11, 25, 37. I’ve come across this issue raised on GitHub. @benjjneb recommends enforcing monotonicity in the fitted error model by changing the error model matrix. I can comprehend how to do this in R, but I am running dada2 in the qiime framework on the command line and don’t know how to intercept this matrix when running the qiime dada2 denoise-paired step. Perhaps the solution is to export the qiime object and and run the dada2 steps independently, then import the data once denoised.
Thanks for digging up the issue associated with NovaSeq from the dada2 github page!
QIIME 2 is just running a dada2 R script under the hood, so if you want to keep it all within QIIME 2 you could make a development branch of q2-dada2 and alter the R script to adjust the error model matrix inside that R script. Let me know if you want to go that route and have trouble locating the files that need to be modified and I can help you with that.
Otherwise it may just be simplest to process your data in dada2 directly in R, then import to QIIME 2 (see tutorial to do this below).
@benjjneb any other ideas? If this will become a common problem it might be neat to expose this functionality in q2-dada2.
Here is a tutorial for importing dada2 data from R into QIIME 2:
While we have the enforced monotonicity suggestion on the dada2 R package issues forum, we don’t have any official “production” solution for NovaSeq data yet, due to a lack of suitable test data from that platform.
DADA2 seem to work pretty well on NovaSeq data as is, but I am loathe to make official statements until I can more rigorously test (and potentially tweak) things.