An error was encountered while running DADA2 in R (return code 1), denoise-single

wei_wei · May 28, 2021, 8:00pm

Hi,

I'm running dada2 denoise-single on my simulated reads and received this error.
This is what I ran:
qiime dada2 denoise-single --i-demultiplexed-seqs demux.qza --p-trunc-len 0 --output-dir qiimeoutput --verbose

Here is the error message:

Command: run_dada_single.R /tmp/qiime2-archive-45am8816/aa5c4cd2-2cd0-4fff-ba1e-eceaa3bcf57c/data /tmp/tmpavmzk2yk/output.tsv.biom /tmp/tmpavmzk2yk/track.tsv /tmp/tmpavmzk2yk 0 0 2.0 2 Inf independent consensus 1.0 1 1000000 NULL 16

R version 4.0.2 (2020-06-22)
Loading required package: Rcpp
DADA2: 1.18.0 / Rcpp: 1.0.5 / RcppParallel: 5.0.2

Filtering ..................................................

Learning Error Rates
12554918 total bases in 63847 reads from 50 samples will be used for learning the error rates.
Error rates could not be estimated (this is usually because of very few reads).
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.
Execution halted
Traceback (most recent call last):
File "/data/wjw5274/anaconda3/envs/qiime2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 181, in _denoise_single
run_commands([cmd])
File "/data/wjw5274/anaconda3/envs/qiime2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
subprocess.run(cmd, check=True)
File "/data/wjw5274/anaconda3/envs/qiime2/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_single.R', '/tmp/qiime2-archive-45am8816/aa5c4cd2-2cd0-4fff-ba1e-eceaa3bcf57c/data', '/tmp/tmpavmzk2yk/output.tsv.biom', '/tmp/tmpavmzk2yk/track.tsv', '/tmp/tmpavmzk2yk', '0', '0', '2.0', '2', 'Inf', 'independent', 'consensus', '1.0', '1', '1000000', 'NULL', '16']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data/wjw5274/anaconda3/envs/qiime2/lib/python3.6/site-packages/q2cli/commands.py", line 329, in call
results = action(**arguments)
File "", line 2, in denoise_single
File "/data/wjw5274/anaconda3/envs/qiime2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
output_types, provenance)
File "/data/wjw5274/anaconda3/envs/qiime2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
output_views = self._callable(**view_args)
File "/data/wjw5274/anaconda3/envs/qiime2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 218, in denoise_single
band_size='16')
File "/data/wjw5274/anaconda3/envs/qiime2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 192, in _denoise_single
" and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

See above for debug info.

I looked at posts with similar issues but could not find a suitable solution. Most of the posts I found involve problems merging forward and reverse reads, which is not the case here. I also tried adjusting --p-trunc-len and --p-trim-left, to no avail.
My question is: is "Error rates could not be estimated (this is usually because of very few reads)." the key issue here? Should I perhaps rerun my read simulation with a longer read length (if so, what's a suitable read length)? Or is it something I can fix by adjusting dada2 options?
Thank you so much! Appreciate your help.

lizgehret · May 28, 2021, 9:16pm

Hi @wei_wei,

Thanks for reaching out! Happy to provide some guidance here.

You're exactly correct - that is the primary error you're running into. This is because you are attempting to run dada2 denoise-single on simulated reads. DADA2 is designed to correct for Illumina-sequenced amplicon errors (which shouldn't be present in your simulated reads).

If you haven't read the DADA2 paper by Callahan et al, I'd highly recommend it for some additional background/context on how to best utilize this plugin.

Hope this helps! Please feel free to reach back out with any further questions.

Cheers,
Liz

wei_wei · May 28, 2021, 9:48pm

Hi,

Thank you for your reply! I will look into the paper. However, I simulated reads with some errors (using grinder) and I remember that I was able to use dada2 on simulated reads before. Are simulated errors inadequate?

Wei Wei

ChrisKeefe · June 7, 2021, 4:10pm

We haven’t forgotten about you, @wei_wei. Thanks for your patience!

lizgehret · June 7, 2021, 6:08pm

Hi @wei_wei, thanks for your patience here! Looping in Ben Callahan (the author of DADA2) on this.

@benjjneb, can you confirm whether dada2 can be used on simulated reads if they contain simulated errors - or is this no longer supported?

benjjneb · June 7, 2021, 6:37pm

Simulated reads/errors can be used with DADA2... if the simulated errors are reasonably close to what might come off a real sequencing instrument. In particular, there should be a range of quality scores and not just one quality score for correct bases, and another for incorrect bases. This post from the DADA2 issues tracker might help:

github.com/benjjneb/dada2, issue: Error in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : span is too small
opened by shrukane 06:54AM - 22 Dec 16 UTC; closed 05:58PM - 10 Jan 17 UTC

Hi, Thanks for writing this package. I am getting the following error after de…replication

Command - dadaFs.lrn <- dada(derepFs, err=NULL, selfConsist = TRUE, multithread=TRUE)

Error -
Initial error matrix unspecified. Error rates will be initialized to the maximum possible estimate from this data.
Initializing error rates to maximum possible estimate.
Error in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : span is too small
In addition: Warning message:
In dada(derepFs, err = NULL, selfConsist = TRUE, multithread = TRUE) : multithread is not a valid DADA option.

The dataset I am using is simulated manually from varying the proportion of 5 distinctly related bacterial genera. Is the error model not suited for sparsely related species? Is there any other error model that I must use instead?

Thanks for looking into this.
Shruti

There's no issue with the number of bacterial strains, however there may be an issue with the noise you are adding. The default error estimation function uses a loess fit, and basically assumes that errors are "normal" in the sense that they occur over a range of quality scores and in sufficient numbers for the loess smoothing to work.

I don't know how you are adding errors by hand, but if it's pathological, let's say all the errors are of type T->C, or all the errors are quality score 22, the default error estimator will perform poorly.

That can be fixed by giving the algorithm the correct error model. Perhaps easier, and more to your goals, it might make more sense to just use a method that adds errors that look like those the sequencing machines make. See for example the ART program ("ART: a next-generation sequencing read simulator", on PMC).
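
To make the point about quality scores concrete, here is a minimal Python sketch. It is not DADA2's estimator, which infers errors without knowing the true sequences. It assumes the reads were simulated, so each read's source sequence is known. The function name `error_rate_by_quality`, the Phred+33 offset, and the toy reads are illustrative assumptions, not anything from the thread. It tallies the observed mismatch rate at each quality score.

```python
from collections import defaultdict


def error_rate_by_quality(records, offset=33):
    """Observed mismatch rate per Phred quality score.

    records: iterable of (read_seq, quality_string, true_seq) tuples,
    where the three strings have equal length (assumption: no indels).
    offset: ASCII offset of the quality encoding (assumed Phred+33).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for read, qual, truth in records:
        for base, qchar, true_base in zip(read, qual, truth):
            q = ord(qchar) - offset
            totals[q] += 1
            if base != true_base:
                errors[q] += 1
    return {q: errors[q] / totals[q] for q in sorted(totals)}


# Toy data: 'I' encodes Q40 and '+' encodes Q10 under Phred+33.
# Every wrong base gets the "bad" score, every correct base the "good" one.
toy_records = [
    ("ACGT", "III+", "ACGA"),
    ("ACGA", "II+I", "ACTA"),
]
rates = error_rate_by_quality(toy_records)
print(rates)  # {10: 1.0, 40: 0.0}
```

With only two quality scores in the data, there are just two points on the quality axis, which is the kind of input the loess-based default described above is not designed for.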

wei_wei · June 8, 2021, 6:43pm

Hi team,

Thank you! I really appreciate your help. I also discovered on my side that when I reduced the read length from 200 to 100, I was able to get dada2 to work. I'm not sure what the exact limiting factor was, though. Maybe it is due to fewer error occurrences?
Unfortunately, as far as I know, grinder only allows two quality scores, one good one and one bad one. But I will look into ART. Thank you so much!

Wei Wei
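
One quick way to confirm how many distinct quality scores a simulated FASTQ file contains is sketched below in Python. The function name `quality_score_histogram` and the toy records are hypothetical. The sketch assumes plain four-line FASTQ records (no wrapped sequence lines) and Phred+33 encoding.

```python
from collections import Counter


def quality_score_histogram(fastq_text, offset=33):
    """Count how often each Phred quality score appears in FASTQ text.

    Assumes four lines per record: header, sequence, '+', quality string.
    """
    counts = Counter()
    lines = fastq_text.strip().splitlines()
    # The quality string is the fourth line of every record.
    for i in range(3, len(lines), 4):
        for qchar in lines[i].strip():
            counts[ord(qchar) - offset] += 1
    return counts


# Toy input: 'I' is Q40 and '+' is Q10 under Phred+33.
example_fastq = "@r1\nACGT\n+\nIII+\n@r2\nACGA\n+\nII+I\n"
hist = quality_score_histogram(example_fastq)
print(sorted(hist.items()))  # [(10, 2), (40, 6)]
print(len(hist), "distinct quality scores")
```

If the count of distinct scores is only two, as in the toy input, the reads match the "one good, one bad" pattern described above. For a real file, the text could be read with `open(path).read()` after decompressing if needed.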
