Error in DADA denoising

anndonxlow · July 29, 2024, 5:38am

Hello. I am getting the following error when I run QIIME2 dada2 denoise-paired on an Azure virtual machine.

qiime dada2 denoise-paired \
    --i-demultiplexed-seqs /home/azureuser/DataAnalysis/demux.qz.qza \
    --p-trunc-len-f 250 \
    --p-trunc-len-r 190 \
    --o-table /home/azureuser/DataAnalysis/table.qza \
    --o-representative-sequences /home/azureuser/DataAnalysis/rep-seqs.qza \
    --o-denoising-stats /home/azureuser/DataAnalysis/denoising-stats.qza

Here is the error message:
Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Debug info has been saved to /tmp/qiime2-q2cli-err-kd_x0n7f.log.

The error log is here:
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no>
Command: run_dada.R --input_directory /tmp/tmpd5rl3hg6/forward --input_directory_reverse /tmp/tmpd5rl3hg6/reverse --output_>
R version 4.3.3 (2024-02-29)
Loading required package: Rcpp
DADA2: 1.30.0 / Rcpp: 1.0.12 / RcppParallel: 5.1.6
2) Filtering .
3) Learning Error Rates
78516500 total bases in 314066 reads from 1 samples will be used for learning the error rates.
59672540 total bases in 314066 reads from 1 samples will be used for learning the error rates.
3) Denoise samples .
.
5) Remove chimeras (method = consensus)
Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) :
Input must be a valid sequence table.
Calls: removeBimeraDenovo -> isBimeraDenovoTable
3: stop("Input must be a valid sequence table.")
2: isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose)
1: removeBimeraDenovo(seqtab, method = chimeraMethod, minFoldParentOverAbundance = minParentFold,
allowOneOff = allowOneOff, multithread = multithread)
Traceback (most recent call last):
File "/home/azureuser/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 350,> run_commands([cmd])
File "/home/azureuser/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 37, > subprocess.run(cmd, check=True)
File "/home/azureuser/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada.R', '--input_directory', '/tmp/tmpd5rl3hg6/forward', '--input_directory>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/azureuser/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 520, in> results = self._execute_action(
File "/home/azureuser/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 581, in> results = action(**arguments)
File "", line 2, in denoise_paired
File "/home/azureuser/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342,> outputs = self.callable_executor(
File "/home/azureuser/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576,> output_views = self._callable(**view_args)
File "/home/azureuser/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 363,> raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Mike_Stevenson · July 29, 2024, 6:16am

Hi @anndonxlow

I think the problem is hinted at in this error.

The input file must have a .qza extension and I think it is reading it as .qz.qza.

Simply change this and you should be good to go. Here’s the full command:

qiime dada2 denoise-paired \
    --i-demultiplexed-seqs /home/azureuser/DataAnalysis/demux.qza \
    --p-trunc-len-f 250 \
    --p-trunc-len-r 190 \
    --o-table /home/azureuser/DataAnalysis/table.qza \
    --o-representative-sequences /home/azureuser/DataAnalysis/rep-seqs.qza \
    --o-denoising-stats /home/azureuser/DataAnalysis/denoising-stats.qza

-Mike

salias · July 29, 2024, 10:50am

Hi @anndonxlow and @Mike_Stevenson ,

I don't think the file extension is the problem here. demux.qz.qza has the .qza extension, regardless of how many dots (.) come before the final one.¹ Moreover, the error message shows that DADA2 runs fine up to the chimera removal step.

We addressed this error recently in the forum. In my first reply to this post I linked some similar questions and outlined two possible reasons for this error. Long story short, it is quite likely that either:

Your paired-end reads are not merging because the filtering parameters are too aggressive.
You are using the same reads as forward and reverse.

Best,

Sergio

--

¹ If you are curious about this, you can read more about why file extensions are not that important here.
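
The second possibility listed above (the same reads used as forward and reverse) can be checked quickly on the raw FASTQ files. The following is only a minimal sketch: the function names are made up for illustration, it assumes standard 4-line FASTQ records (plain or gzipped), and it only compares the first reads of each file. If the reads are already inside a .qza, they would first need to be exported (for example with qiime tools export).

```python
import gzip
import itertools
from pathlib import Path


def first_sequences(fastq_path, n_records=1000):
    """Return the sequence lines of the first n_records reads in a FASTQ file."""
    path = Path(fastq_path)
    # Choose the opener from the file name: gzip for *.gz, plain text otherwise.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        # Assumes 4 lines per record; the sequence is the 2nd line of each record.
        lines = itertools.islice(handle, 4 * n_records)
        return [line.strip() for i, line in enumerate(lines) if i % 4 == 1]


def looks_like_same_reads(forward_path, reverse_path, n_records=1000):
    """True if the first reads of both files carry identical sequences."""
    fwd = first_sequences(forward_path, n_records)
    rev = first_sequences(reverse_path, n_records)
    # An empty file is not treated as a match.
    return bool(fwd) and fwd == rev
```

Calling looks_like_same_reads() on the forward and reverse file of one sample should return False for genuine paired-end data; True suggests the same file was imported twice.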

salias · July 30, 2024, 8:21am

Hello @anndonxlow ,

I'm happy you got it working!

As you request, I'm going to try to explain why DADA2 works with those parameters.

DADA2 is, in fact, a whole workflow that allows you to obtain ASVs. It has several steps: filtering, dereplication, merging of paired-end reads and chimera identification. Here we will focus on filtering.

Inside filtering, there are 2 steps we are interested in:

Truncate sequences to a specified length. This is where the --p-trunc-len-f and --p-trunc-len-r parameters are used.
Filter based on the number of ambiguous bases, a minimum quality score, and the expected errors in a read. This is where the --p-max-ee-f, --p-max-ee-r and --p-trunc-q parameters are used. These are optional parameters that already have a default value (2) that works fine in most cases.

In your first attempt to make DADA2 work you used --p-trunc-len-f 250 and --p-trunc-len-r 190. That means you are truncating forward reads at position 250 and reverse reads at position 190. The problem here appears to be that, since you are truncating close to the end of the sequences, you are keeping a lot of their final part, which seems to have very low quality. Therefore, the second step of the filtering I outlined above is discarding them, leaving very few (or no) sequences for the next steps of DADA2.

When you used --p-trunc-len-f 100 and --p-trunc-len-r 100 you cut off those final portions of the sequences with really low quality, so the quality filtering performed by DADA2 does not "see" them and therefore keeps your sequences for the rest of the DADA2 pipeline.
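
To make the reasoning above a bit more concrete, here is a rough sketch of the expected-error filter. It uses the usual Phred relation p = 10^(-Q/10) and sums the per-base error probabilities of the truncated read. The quality profile is invented for illustration (it is not taken from this dataset), the function names are hypothetical, and the real filter is applied inside DADA2 in R, so treat this only as an approximation of the idea.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities, with p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_max_ee(phred_scores, trunc_len, max_ee=2.0):
    """Truncate a read to trunc_len, then apply the expected-error cutoff."""
    if len(phred_scores) < trunc_len:
        # Reads shorter than the truncation length are discarded.
        return False
    return expected_errors(phred_scores[:trunc_len]) <= max_ee


# Hypothetical 300-base read: Q35 for the first 200 bases, Q12 afterwards.
read_quality = [35] * 200 + [12] * 100

for trunc_len in (100, 250):
    ee = expected_errors(read_quality[:trunc_len])
    print(trunc_len, round(ee, 2), passes_max_ee(read_quality, trunc_len))
```

With this made-up profile, the read truncated at 100 stays far below the default cutoff of 2 expected errors, while the read truncated at 250 includes 50 low-quality bases and exceeds it.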

I'm not 100% sure this explanation is entirely correct, so if any senior member has something to correct / clarify it would be great!

Cheers,

Sergio

anndonxlow · July 30, 2024, 2:52pm

Hi @salias

Thank you for your help. I have tried changing both p-trunc-len-f and p-trunc-len-r to 300 and the code works. However, I do not understand why the code does not work with values between 120 and 280. In fact, setting p-trunc-len-f and p-trunc-len-r to 250 and 190 respectively would be ideal, as I would be able to get rid of the reads with low quality scores. Would you be able to explain it? Thanks.

salias · July 30, 2024, 6:01pm

Oh, I see. In my previous post, I was trying to build a possible explanation for the scenario of your deleted post. But if the situation is as you describe now, the most likely explanation is that truncating reads at 250/190 leaves them too short for DADA2 to merge forward and reverse reads (they cannot overlap, so they are discarded). Using less aggressive parameters allows DADA2 to merge pairs, so you no longer end up with an empty table and the error is not raised.
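
The merging argument above comes down to simple arithmetic, sketched below. The amplicon length of 460 bases is purely an assumption for illustration (the thread never states the target region), the function names are hypothetical, and the minimum overlap of 12 bases is, as far as I know, the DADA2 default. Primers, length variation between taxa and mismatches in the overlap are ignored.

```python
def merge_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Number of bases shared by the truncated forward and reverse reads."""
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f, trunc_len_r, amplicon_len, min_overlap=12):
    """True if the truncated reads still overlap by at least min_overlap bases."""
    return merge_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# ASSUMPTION: a 460-base amplicon, chosen only to illustrate the calculation.
AMPLICON_LEN = 460

for trunc_f, trunc_r in ((250, 190), (300, 300)):
    overlap = merge_overlap(trunc_f, trunc_r, AMPLICON_LEN)
    print(trunc_f, trunc_r, overlap, can_merge(trunc_f, trunc_r, AMPLICON_LEN))
```

Under that assumed length, 250/190 gives a negative overlap (the reads do not even meet), whereas 300/300 leaves plenty of overlap, which would match the behaviour described in this thread.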
