I'm getting some strange errors when running the qiime dada2 denoise-paired pipeline on a NovaSeq dataset. As mentioned in this post, the quality plots display a very unusual pattern with quantized values, and the same is evident in the quality lines of the fastq files (every quality score is "F", ":" or ",").
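To make clear what those three characters mean, here is a minimal sketch that decodes quality lines into Phred scores and counts them. It assumes the standard Phred+33 encoding; the function name `phred_counts` and the example strings are made up for illustration and are not taken from my files.

```python
from collections import Counter
from typing import Dict, Iterable


def phred_counts(quality_lines: Iterable[str], offset: int = 33) -> Dict[int, int]:
    """Count how often each Phred score occurs across FASTQ quality lines.

    Assumes Phred+33 encoding (score = ASCII code - 33) unless another
    offset is given.
    """
    counts: Counter = Counter()
    for line in quality_lines:
        # Strip the trailing newline, then decode every quality character.
        counts.update(ord(ch) - offset for ch in line.strip())
    return dict(sorted(counts.items()))


# Hypothetical quality strings using only the three characters I see.
example = ["FFFF:FFF,FFF", "FF:FFFFFFF,:"]
print(phred_counts(example))  # {11: 2, 25: 3, 37: 19}
```

Under that assumption, "," is Q11, ":" is Q25 and "F" is Q37, so each base can only take one of three quality values.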
About my dataset:
- 16S V3-V4 amplicon
- 250 bp paired-end reads (NovaSeq platform)
- 38 gut microbiota samples
- 23 M reads in total (about 600k per sample)
Here are my command and the resulting errors:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs i04-demux.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-n-threads 6 \
  --o-table i04-table-dada2.qza \
  --o-representative-sequences i04-rep-seqs-dada2.qza \
  --o-denoising-stats i04-stats-dada2.qza \
  --verbose

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmp7lorj9n3/forward /tmp/tmp7lorj9n3/reverse /tmp/tmp7lorj9n3/output.tsv.biom /tmp/tmp7lorj9n3/track.tsv /tmp/tmp7lorj9n3/filt_f /tmp/tmp7lorj9n3/filt_r 0 0 0 0 2.0 2.0 2 12 independent consensus 1.0 6 1000000

R version 4.0.3 (2020-10-10)
Loading required package: Rcpp
DADA2: 1.18.0 / Rcpp: 1.0.6 / RcppParallel: 5.1.2
1) Filtering ......................................
2) Learning Error Rates
301902067 total bases in 1204373 reads from 2 samples will be used for learning the error rates.
301166572 total bases in 1204373 reads from 2 samples will be used for learning the error rates.
Error rates could not be estimated (this is usually because of very few reads).
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.
Execution halted
Traceback (most recent call last):
  File "/home/jdc/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 266, in denoise_paired
    run_commands([cmd])
  File "/home/jdc/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
    subprocess.run(cmd, check=True)
  File "/home/jdc/miniconda3/envs/qiime2-2021.4/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/tmp/tmp7lorj9n3/forward', '/tmp/tmp7lorj9n3/reverse', '/tmp/tmp7lorj9n3/output.tsv.biom', '/tmp/tmp7lorj9n3/track.tsv', '/tmp/tmp7lorj9n3/filt_f', '/tmp/tmp7lorj9n3/filt_r', '0', '0', '0', '0', '2.0', '2.0', '2', '12', 'independent', 'consensus', '1.0', '6', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jdc/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-514>", line 2, in denoise_paired
  File "/home/jdc/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/action.py", line 244, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/home/jdc/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/jdc/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 279, in denoise_paired
    raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Plugin error from dada2:

  An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

See above for debug info.
I suspect DADA2 cannot handle this kind of "flat quality" dataset, where the scores are binned into only a few values. Any suggestions on how to proceed with the analysis would be helpful.
Would it be better to use Deblur on this dataset? I have always used DADA2 in my previous analyses (data sequenced on MiSeq).
Are the ASVs obtained with DADA2 and Deblur comparable? I would like to be able to do a longitudinal analysis of the microbiota by combining the previous MiSeq data with this NovaSeq dataset.
Please find the demux.qzv visualization attached:
i04-demux.qzv (307.0 KB)
Thank you very much in advance.