DADA2 Denoise Single vs Paired Requirements

Hello!

I recently completed an analysis of a large sample set with QIIME2. I received guidance on this run in a separate forum thread: DADA2 Filtering Error - Failed to Write Record - #21 by megan.justice

Unfortunately, I realized that I used dada2 denoise-single instead of dada2 denoise-paired, which is what I should have used.

The initial code was:

#qiime dada2 denoise-single\
#	--p-trim-left 0\
#	--p-trunc-len 268\
#	--i-demultiplexed-seqs paired-end-demux.qza\
#	--o-representative-sequences rep-seqs-1.qza\
#	--o-table table-1.qza\
#	--o-denoising-stats stats-1.qza\

To remedy this, I copied the paired-end-demux.qza object from the successful 'single' run to a new directory and edited my dada2 denoise command to the following:

## Denoise
qiime dada2 denoise-paired\
	--p-trunc-len-f 268\
	--p-trunc-len-r 268\
	--i-demultiplexed-seqs paired-end-demux.qza\
	--o-representative-sequences rep-seqs-1.qza\
	--o-table table-1.qza\
	--verbose\
	--o-denoising-stats stats-1.qza

When I try this command on the same data as the single run, I get the following message:

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /tmp/tmp39j03iev/forward --input_directory_reverse /tmp/tmp39j03iev/reverse --output_path /tmp/tmp39j03iev/output.tsv.biom --output_track /tmp/tmp39j03iev/track.tsv --filtered_directory /tmp/tmp39j03iev/filt_f --filtered_directory_reverse /tmp/tmp39j03iev/filt_r --truncation_length 268 --truncation_length_reverse 268 --trim_left 0 --trim_left_reverse 0 --max_expected_errors 2.0 --max_expected_errors_reverse 2.0 --truncation_quality_score 2 --min_overlap 12 --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 10 --learn_min_reads 1000000

R version 4.3.3 (2024-02-29)
Loading required package: Rcpp
DADA2: 1.30.0 / Rcpp: 1.0.13.1 / RcppParallel: 5.1.9
2) Filtering ........................................................................................................................................................................................
3) Learning Error Rates
385065348 total bases in 1436811 reads from 4 samples will be used for learning the error rates.
385065348 total bases in 1436811 reads from 4 samples will be used for learning the error rates.
Error rates could not be estimated (this is usually because of very few reads).
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.
6: stop("Error matrix is NULL.")
5: getErrors(err, enforce = TRUE)
4: dada(drps, err = NULL, errorEstimationFunction = errorEstimationFunction,
selfConsist = TRUE, multithread = multithread, verbose = verbose,
MAX_CONSIST = MAX_CONSIST, OMEGA_C = OMEGA_C, ...)
3: learnErrors(filtsR, nreads = nreads.learn, multithread = multithread)
2: withCallingHandlers(expr, warning = function(w) if (inherits(w,
classes)) tryInvokeRestart("muffleWarning"))
1: suppressWarnings(learnErrors(filtsR, nreads = nreads.learn, multithread = multithread))
Traceback (most recent call last):
File "/home/ec2-user/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2_dada2/_denoise.py", line 353, in denoise_paired
run_commands([cmd])
File "/home/ec2-user/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2_dada2/_denoise.py", line 38, in run_commands
subprocess.run(cmd, check=True)
File "/home/ec2-user/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada.R', '--input_directory', '/tmp/tmp39j03iev/forward', '--input_directory_reverse', '/tmp/tmp39j03iev/reverse', '--output_path', '/tmp/tmp39j03iev/output.tsv.biom', '--output_track', '/tmp/tmp39j03iev/track.tsv', '--filtered_directory', '/tmp/tmp39j03iev/filt_f', '--filtered_directory_reverse', '/tmp/tmp39j03iev/filt_r', '--truncation_length', '268', '--truncation_length_reverse', '268', '--trim_left', '0', '--trim_left_reverse', '0', '--max_expected_errors', '2.0', '--max_expected_errors_reverse', '2.0', '--truncation_quality_score', '2', '--min_overlap', '12', '--pooling_method', 'independent', '--chimera_method', 'consensus', '--min_parental_fold', '1.0', '--allow_one_off', 'False', '--num_threads', '10', '--learn_min_reads', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ec2-user/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2cli/commands.py", line 530, in call
results = self._execute_action(
File "/home/ec2-user/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2cli/commands.py", line 602, in _execute_action
results = action(**arguments)
File "", line 2, in denoise_paired
File "/home/ec2-user/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/qiime2/sdk/action.py", line 299, in bound_callable
outputs = self.callable_executor(
File "/home/ec2-user/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/qiime2/sdk/action.py", line 570, in callable_executor
output_views = self._callable(**view_args)
File "/home/ec2-user/miniconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2_dada2/_denoise.py", line 366, in denoise_paired
raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

See above for debug info.

No output files were generated, so this is the only guidance I have regarding what went wrong.

I am working in an AWS EC2 instance with 60 GB RAM and 2 TB disk space.

I have since tried running the command with 10 threads, but it still failed.

What could be happening here?
Does denoise-paired require significantly more RAM than denoise-single?
Any suggestions on a remedy?

Thanks!

Hello @megan.justice,

If you feel comfortable doing so, could you attach the demux visualization for your demux artifact? It's possible that your truncation parameters are causing too many reads to be filtered.

Increasing the number of threads will increase memory usage, not decrease it. That said, I see no indication that this is a memory-related issue.
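
One way to sanity-check the filtering concern is to work through the figures already printed in the "Learning Error Rates" step of the log. The sketch below only does arithmetic on numbers copied from that log; the variable names are illustrative, and the comparison against --learn_min_reads assumes that value acts as the read count DADA2 tries to reach before it stops adding samples.

```shell
# Figures copied from the "Learning Error Rates" lines of the log above.
total_bases=385065348
total_reads=1436811
learn_min_reads=1000000   # value of --learn_min_reads in the logged command

# Integer division gives the per-read length; a zero remainder means the
# total is an exact multiple of that length.
bases_per_read=$((total_bases / total_reads))
remainder=$((total_bases % total_reads))
echo "bases per read: $bases_per_read (remainder $remainder)"

# Compare the reads that reached error learning with the requested count.
if [ "$total_reads" -ge "$learn_min_reads" ]; then
    echo "reads available for error learning meet the requested count"
fi
```

The result of 268 bases per read with no remainder matches the truncation length, and more than a million reads from the first 4 samples reached the error-learning step. That suggests, without proving it, that the failure is not simply a case of too few reads surviving the filter.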

Absolutely! I'll attach the demux summary here.

It may be worth noting that I am using the same paired-end-demux.qza object for this run as my previous successful run (in which I used single erroneously).

That's why I assumed that paired-end-demux.qza was not the issue.
demux-summary (4).qzv (317.5 KB)

Hello @megan.justice,

I'm not sure what the problem is. Reads with binned quality scores like yours, which some newer Illumina platforms output, can sometimes cause issues with DADA2 because the scores do not vary enough. However, your forward reads worked fine and show less quality variance than your reverse reads. The truncation lengths also look fine. I would suggest opening an issue on DADA2's GitHub repository; the developer is responsive to questions and will be able to give you more detailed insight.
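
If it helps when writing up that issue, a rough way to show how binned the quality scores are is to count the distinct quality symbols in a FASTQ file. The sketch below runs on a tiny made-up FASTQ (the reads and the temporary file are hypothetical, not taken from this dataset). For real data you would first need to export the demultiplexed artifact and decompress one of the reverse-read files, then point the same pipeline at it.

```shell
# Toy FASTQ standing in for one reverse-read file (hypothetical data).
fastq=$(mktemp)
cat > "$fastq" <<'EOF'
@read1
ACGTACGT
+
FFFF8F-#
@read2
TTGACCAG
+
FF8FFFFF
EOF

# Every 4th line of a FASTQ record is the quality string. Split those
# lines into single characters and count the distinct symbols.
distinct_scores=$(awk 'NR % 4 == 0' "$fastq" | fold -w 1 | sort -u | wc -l | tr -d ' ')
echo "distinct quality symbols: $distinct_scores"

rm -f "$fastq"
```

A count in the low single digits is consistent with binned scores, while unbinned data usually shows many more distinct symbols. Running the same count on a forward and a reverse file gives a concrete comparison to include in the report.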