An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more

kkcool · November 21, 2017, 2:08pm

Hello,
I have been getting the following error after the denoising step:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

I gave the following command

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --o-table table \
  --o-representative-sequences rep-seqs \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150

I was not encountering this problem before; it appeared suddenly. It would be great if anyone could guide me.
The QIIME 2 version that I am using is 2017.10 and my R version is 3.3.

thermokarst · November 21, 2017, 2:10pm

Hi @kkcool! In order for us to assist you, we will need the detailed error --- can you please rerun with --verbose and copy-and-paste those results here, or, attach the error log file listed at the end of the error message? Thanks!

kkcool · November 22, 2017, 12:33pm

Plugin error from dada2:

  An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Debug info has been saved to /tmp/qiime2-q2cli-err-246fgnfr.log
The following is in my log file:
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmp0lee7zg3/forward /tmp/tmp0lee7zg3/reverse /tmp/tmp0lee7zg3/output.tsv.biom /tmp/tmp0lee7zg3/filt_f /tmp/tmp0lee7zg3/filt_r 150 150 13 13 2.0 2 consensus 1.0 1 1000000

R version 3.3.2 (2016-10-31) 
Loading required package: Rcpp
There were 50 or more warnings (use warnings() to see the first 50)
DADA2 R package version: 1.4.0 
1) Filtering ....
2) Learning Error Rates
2a) Forward Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 47247 reads in 7216 unique sequences.
Sample 2 - 64347 reads in 8628 unique sequences.
Sample 3 - 5292 reads in 1591 unique sequences.
Sample 4 - 48223 reads in 5545 unique sequences.
   selfConsist step 2 
   selfConsist step 3 
   selfConsist step 4 
   selfConsist step 5 
   selfConsist step 6 


Convergence after  6  rounds.
2b) Reverse Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 47247 reads in 23249 unique sequences.
Sample 2 - 64347 reads in 28165 unique sequences.
Sample 3 - 5292 reads in 3718 unique sequences.
Sample 4 - 48223 reads in 21266 unique sequences.
   selfConsist step 2 
   selfConsist step 3 
   selfConsist step 4 


Convergence after  4  rounds.

3) Denoise remaining samples 
4) Remove chimeras (method = consensus)
Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) : 
  Input must be a valid sequence table.
Calls: removeBimeraDenovo -> isBimeraDenovoTable
In addition: Warning message:
In is.na(colnames(unqs[[i]])) :
  is.na() applied to non-(list or vector) of type 'NULL'
Execution halted
Traceback (most recent call last):
  File "/home/mim/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/q2_dada2/_denoise.py", line 179, in denoise_paired
    run_commands([cmd])
  File "/home/mim/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/q2_dada2/_denoise.py", line 35, in run_commands
    subprocess.run(cmd, check=True)
  File "/home/mim/miniconda3/envs/qiime2-2017.10/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/tmp/tmp0lee7zg3/forward', '/tmp/tmp0lee7zg3/reverse', '/tmp/tmp0lee7zg3/output.tsv.biom', '/tmp/tmp0lee7zg3/filt_f', '/tmp/tmp0lee7zg3/filt_r', '150', '150', '13', '13', '2.0', '2', 'consensus', '1.0', '1', '1000000']' returned non-zero exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mim/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/q2cli/commands.py", line 218, in __call__
    results = action(**arguments)
  File "<decorator-gen-338>", line 2, in denoise_paired
  File "/home/mim/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/qiime2/sdk/action.py", line 220, in bound_callable
    output_types, provenance)
  File "/home/mim/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/qiime2/sdk/action.py", line 355, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/mim/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/q2_dada2/_denoise.py", line 194, in denoise_paired
    " and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.


Kindly guide me.

thermokarst · November 22, 2017, 3:00pm

Hi @kkcool! It looks like your forward and reverse reads might be identical - please see this thread, which was a pretty similar situation. Can you provide some more details about your source data:

How did you import it? Copy-and-paste your import command, please.
How is the data formatted before import? Is it demultiplexed? Can you screenshot the folder with your sequences please.
Have the primers been removed, etc?

Thanks!

Cheng50373640 · November 24, 2017, 9:30pm

Hi @thermokarst, I actually have the same error. Here you go:

Importing data:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path tim2-modify-manifest \
  --output-path demux.qza \
  --source-format PairedEndFastqManifestPhred33

qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv

I don't know whether the primers have been removed.

thank you!

thermokarst · November 27, 2017, 10:16pm

Hi @Cheng50373640, can you please post the following:

The release of QIIME 2 you are using.
The exact error message, either upload the log file, or rerun with --verbose and copy-and-paste the full error.

Also, you will need to determine if your primers are removed or not --- QIIME 2 is not able to determine that on its own.
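
One rough way to check this yourself is to see how many reads begin with your primer. The following is only a minimal sketch: the primer and the three reads are made-up illustrations, and `fraction_starting_with` is a hypothetical helper, not part of QIIME 2 or DADA2. Substitute the primer your sequencing center actually used.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a regex that matches the (possibly degenerate) primer."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def fraction_starting_with(reads, primer):
    """Return the fraction of reads that begin with the primer."""
    pattern = primer_regex(primer)
    reads = list(reads)
    if not reads:
        return 0.0
    # re.match anchors at the start of the string, i.e. the 5' end of the read.
    hits = sum(1 for read in reads if pattern.match(read.upper()))
    return hits / len(reads)


# Hypothetical example data (assumption, not taken from this thread).
example_primer = "GTGYCAGCMGCC"
example_reads = [
    "GTGCCAGCAGCCGCGGTAATAC",  # starts with the primer (Y=C, M=A)
    "GTGTCAGCCGCCGCGGTAATAC",  # starts with the primer (Y=T, M=C)
    "TACGGAGGGTGCAAGCGTTAAT",  # does not start with the primer
]
primer_fraction = fraction_starting_with(example_reads, example_primer)
print(f"{primer_fraction:.2f} of reads start with the primer")
```

If nearly all reads start with the primer, it is most likely still present; if almost none do, it has probably been removed already (or the reads start somewhere else, so treat the result as a hint only).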

Thanks!

hhftang · December 7, 2017, 6:29pm

Hi, I've been having virtually the same problem. Attached is the qiime2 log file which shows the same error message as above:

Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) : 
  Input must be a valid sequence table.

Curiously, it only happens to one of my runs. At a superficial glance, there don't appear to be any major differences between the input .qza for this run, and .qza for the other runs. The input .qza was of data type SampleData[PairedEndSequencesWithQuality]; and data format SingleLanePerSamplePairedEndFastqDirFmt. See attached for .qzv visualisations of each of these runs.

qiime2-q2cli-err-q3sug9zw.txt (4.5 KB)

Problematic run:
140212_M00267_0101_000000000-A8DL5-demux-paired-end.qzv (287.0 KB)

Good run example:
131017_M00267_0071_000000000-A5GJA-demux-paired-end.qzv (284.7 KB)

Script used to run analysis:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs $data/$line-demux-paired-end.qza \
  --o-table $data/$line-dada-table \
  --o-representative-sequences $data/$line-rep-seqs \
  --p-trim-left-f 10 \
  --p-trim-left-r 10 \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150

qiime feature-table summarize \
  --i-table $data/$line-dada-table.qza \
  --o-visualization $data/$line-dada-table.qzv \
  --m-sample-metadata-file $data/$line.tsv

qiime feature-table tabulate-seqs \
  --i-data $data/$line-rep-seqs.qza \
  --o-visualization $data/$line-rep-seqs.qzv

Looking forward to hearing from you soon - H

thermokarst · December 8, 2017, 11:24pm

Hi @hhftang, thanks for all the detailed info!

Have you seen @benjjneb's comment here, regarding a similar error? Looking at your error log, I am seeing the same thing - your read counts are identical forward and reverse.

hhftang · December 11, 2017, 5:25pm

Hi @thermokarst, thanks for your reply.

My understanding is that there should be a reverse read for each forward read, so shouldn't forward and reverse reads have the same number anyway? (though number of unique sequences may be different). I'm still starting out in this research space, so apologies if I'm mistaken.

If I view the fastq files, the forward and reverse reads do not appear identical.
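
For anyone who wants to check this more systematically than by eye, here is a minimal sketch that counts reads and unique sequences in a forward and a reverse FASTQ and tests whether the two sets of sequences are identical. The records are made up for illustration, and the helper names are hypothetical; it assumes plain 4-line FASTQ records.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip()


def summarize(sequences):
    """Return (number of reads, number of unique sequences)."""
    counts = Counter(sequences)
    return sum(counts.values()), len(counts)


# Made-up records, for illustration only.
forward_fastq = """@r1
ACGTACGT
+
IIIIIIII
@r2
ACGTACGT
+
IIIIIIII
@r3
ACGTTCGT
+
IIIIIIII
""".splitlines()

reverse_fastq = """@r1
TTGACCAA
+
IIIIIIII
@r2
TTGACCAT
+
IIIIIIII
@r3
TTGACCAG
+
IIIIIIII
""".splitlines()

forward_seqs = list(read_fastq_sequences(forward_fastq))
reverse_seqs = list(read_fastq_sequences(reverse_fastq))

forward_summary = summarize(forward_seqs)  # same read count as reverse...
reverse_summary = summarize(reverse_seqs)  # ...but a different unique count
files_identical = forward_seqs == reverse_seqs

print("forward (reads, uniques):", forward_summary)
print("reverse (reads, uniques):", reverse_summary)
print("identical sequences:", files_identical)
```

For real data the lines could come from `gzip.open(path, "rt")` instead of the inline strings. Equal read counts with different unique counts, as in this toy example, is what properly paired data normally looks like.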

That said, I have located some clues as to why there is an error. A similar problem was reported previously on the qiime2 forums, and the cause appeared to be related to issues with merging/joining forward and reverse reads (Qiime dada2 denoise-paired end sequences - #9 by benjjneb). When I attempted to join the reads in the problematic run using another utility (fastq-join from ea-utils), the "success rate" was much lower than what I see in other runs (<0.1%). So it may be a problem with the actual data, although I'm still at a loss as to what exactly is wrong with it. As far as I could tell, primer sequences were removed prior to analysis.

I will continue looking into this

H

thermokarst · December 12, 2017, 3:03am

Hi @hhftang!

This should be the case for the multiplexed data, hot off the instrument, but is likely not the case for your demultiplexed data (this is the process of separating out your reads into individual samples). During demultiplexing, it is possible for reads to be discarded for a handful of reasons (and they vary from tool to tool): sequencing error (your barcodes might not be identifiable as the original barcode sequence in the read), read quality (some demultiplexing tools will filter low quality reads for you, that way you don't have to "waste" computational energy performing QA/QC steps like DADA2 or deblur on reads that you have already identified as being below some threshold of quality), and probably all kinds of other things I am not considering right now.

If you are able to share these data, I am happy to take a look.

EDIT

I was mistaken here (sorry @hhftang) --- based on the log, it looks like I was barking up the wrong tree. I will follow up in a separate reply.

wangj50 · December 14, 2017, 10:42pm

Hi @thermokarst @benjjneb and all, I came across this thread because I have exactly the same issue. First of all, my data do not have identical forward and reverse reads. So the issue may not be directly due to that. Here is my output:

qiime dada2 denoise-paired --i-demultiplexed-seqs Amplicon_import.qza --p-trunc-len-f 0 --p-trunc-len-r 25 --p-trim-left-f 5 --p-trim-left-r 5 --p-n-threads 0 --output-dir dada2_out --verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /var/folders/yt/7p79svg5287g19gfd60ph7ddv8yrx7/T/tmpyqw8dbj_/forward /var/folders/yt/7p79svg5287g19gfd60ph7ddv8yrx7/T/tmpyqw8dbj_/reverse /var/folders/yt/7p79svg5287g19gfd60ph7ddv8yrx7/T/tmpyqw8dbj_/output.tsv.biom /var/folders/yt/7p79svg5287g19gfd60ph7ddv8yrx7/T/tmpyqw8dbj_/filt_f /var/folders/yt/7p79svg5287g19gfd60ph7ddv8yrx7/T/tmpyqw8dbj_/filt_r 0 25 5 5 2.0 2 consensus 1.0 0 1000000

R version 3.3.2 (2016-10-31)
Loading required package: Rcpp
There were 50 or more warnings (use warnings() to see the first 50)
DADA2 R package version: 1.4.0

1) Filtering ...............
2) Learning Error Rates
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
2a) Forward Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 84485 reads in 19644 unique sequences.
Sample 2 - 20861 reads in 5755 unique sequences.
Sample 3 - 23795 reads in 6522 unique sequences.
Sample 4 - 21276 reads in 6179 unique sequences.
Sample 5 - 19828 reads in 5306 unique sequences.
Sample 6 - 101160 reads in 23174 unique sequences.
Sample 7 - 84890 reads in 17739 unique sequences.
Sample 8 - 75500 reads in 17985 unique sequences.
Sample 9 - 84367 reads in 29759 unique sequences.
Sample 10 - 87923 reads in 22841 unique sequences.
Sample 11 - 43356 reads in 15911 unique sequences.
Sample 12 - 47705 reads in 14234 unique sequences.
Sample 13 - 57324 reads in 16407 unique sequences.
Sample 14 - 35573 reads in 11830 unique sequences.
Sample 15 - 58332 reads in 17638 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5
selfConsist step 6
selfConsist step 7

Convergence after 7 rounds.
2b) Reverse Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 84485 reads in 1506 unique sequences.
Sample 2 - 20861 reads in 639 unique sequences.
Sample 3 - 23795 reads in 563 unique sequences.
Sample 4 - 21276 reads in 452 unique sequences.
Sample 5 - 19828 reads in 482 unique sequences.
Sample 6 - 101160 reads in 1589 unique sequences.
Sample 7 - 84890 reads in 1596 unique sequences.
Sample 8 - 75500 reads in 1245 unique sequences.
Sample 9 - 84367 reads in 5664 unique sequences.
Sample 10 - 87923 reads in 5745 unique sequences.
Sample 11 - 43356 reads in 1204 unique sequences.
Sample 12 - 47705 reads in 1255 unique sequences.
Sample 13 - 57324 reads in 1515 unique sequences.
Sample 14 - 35573 reads in 1029 unique sequences.
Sample 15 - 58332 reads in 1427 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5

Convergence after 5 rounds.

3) Denoise remaining samples
4) Remove chimeras (method = consensus)
Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) :
Input must be a valid sequence table.
Calls: removeBimeraDenovo -> isBimeraDenovoTable
In addition: Warning message:
In is.na(colnames(unqs[[i]])) :
is.na() applied to non-(list or vector) of type 'NULL'
Execution halted
Traceback (most recent call last):
  File "/Users/wangj50/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/q2_dada2/_denoise.py", line 179, in denoise_paired
    run_commands([cmd])
  File "/Users/wangj50/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/q2_dada2/_denoise.py", line 35, in run_commands
    subprocess.run(cmd, check=True)
  File "/Users/wangj50/miniconda2/envs/qiime2-2017.10/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/var/folders/yt/7p79svg5287g19gfd60ph7ddv8yrx7/T/tmpyqw8dbj_/forward', '/var/folders/yt/7p79svg5287g19gfd60ph7ddv8yrx7/T/tmpyqw8dbj_/reverse', '/var/folders/yt/7p79svg5287g19gfd60ph7ddv8yrx7/T/tmpyqw8dbj_/output.tsv.biom', '/var/folders/yt/7p79svg5287g19gfd60ph7ddv8yrx7/T/tmpyqw8dbj_/filt_f', '/var/folders/yt/7p79svg5287g19gfd60ph7ddv8yrx7/T/tmpyqw8dbj_/filt_r', '0', '25', '5', '5', '2.0', '2', 'consensus', '1.0', '0', '1000000']' returned non-zero exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/wangj50/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/q2cli/commands.py", line 218, in __call__
    results = action(**arguments)
  File "", line 2, in denoise_paired
  File "/Users/wangj50/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/qiime2/sdk/action.py", line 220, in bound_callable
    output_types, provenance)
  File "/Users/wangj50/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/qiime2/sdk/action.py", line 355, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/Users/wangj50/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/q2_dada2/_denoise.py", line 194, in denoise_paired
    " and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

In addition, regarding the following quote:

In my experience, in all the cases I have encountered, the demultiplexed data have the same read count in the forward and reverse files, although that is probably due to the tool used by Illumina or the sequencing center technicians. I don't disagree with what you are trying to say, but I think it may be misleading to say "is likely not the case for demultiplexed data".

Thank you!

thermokarst · December 14, 2017, 11:09pm

Thanks @wangj50!

I agree! I updated that to clarify, "is likely not the case for your demultiplexed data".

With that said, looking back at that log, I made a mistake and must've misread something: I thought that there were the same number of unique sequences in each sample, fwd and reverse (that would likely be an issue, because then everything truly would be identical between fwd and rev). Given that mistake, probably none of what I wrote above is valid!
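
To avoid misreading a log like that again, the per-sample lines can be compared mechanically. This is a minimal sketch, assuming the `--verbose` output has the same shape as the first log posted in this thread (the excerpt below is copied from it); `parse_sample_counts` is a hypothetical helper, not part of q2-dada2.

```python
import re

SAMPLE_LINE = re.compile(
    r"Sample (\d+) - (\d+) reads in (\d+) unique sequences\."
)


def parse_sample_counts(log_text):
    """Return (forward, reverse) lists of (sample, reads, uniques) tuples.

    The log is split at the '2b) Reverse Reads' header, so everything
    before it is treated as forward and everything after it as reverse.
    """
    forward_part, _, reverse_part = log_text.partition("2b) Reverse Reads")

    def extract(part):
        return [
            tuple(int(value) for value in match.groups())
            for match in SAMPLE_LINE.finditer(part)
        ]

    return extract(forward_part), extract(reverse_part)


# Excerpt from the first log in this thread (first two samples only).
log_excerpt = """2a) Forward Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 47247 reads in 7216 unique sequences.
Sample 2 - 64347 reads in 8628 unique sequences.
2b) Reverse Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 47247 reads in 23249 unique sequences.
Sample 2 - 64347 reads in 28165 unique sequences.
"""

forward, reverse = parse_sample_counts(log_excerpt)
same_read_counts = all(f[1] == r[1] for f, r in zip(forward, reverse))
same_unique_counts = all(f[2] == r[2] for f, r in zip(forward, reverse))

print("read counts match:", same_read_counts)
print("unique counts match:", same_unique_counts)
```

Matching read counts are expected for paired data; only matching unique counts in every sample would hint that forward and reverse files are truly identical.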

Thanks! That is an important distinction (and one I should've made above) - I think most tools should handle making sure read-pairs are binned or dropped together (as a pair), preventing a mismatch from happening, but it really depends on the tool (I suspect there are some homegrown scripts floating around online that are responsible for creating these mismatched data!).

Going back to @hhftang's issue, and the similar issue you just reported, it sounds like this error message (Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) : Input must be a valid sequence table.) is caused when none of the sequences are actually merged. This might be happening because --p-trunc-len-r is set really low here (25). As noted in the docs, there needs to be at least a 20nt overlap between fwd and rev reads. Maybe take another look at your demux summarize viz and make sure that your trunc-len-r value is right. Keep us posted, and thanks for setting me straight!
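
As a back-of-the-envelope check, the overlap left after truncation can be estimated from the amplicon length and the two truncation lengths. This is a sketch under stated assumptions: the amplicon length of 250 is hypothetical (nobody in this thread has given theirs), each truncated read is assumed to be shorter than the amplicon, the 5' trimming from --p-trim-left-f/-r is assumed not to reach into the overlap region, and the 20 nt threshold is simply the figure quoted above.

```python
MIN_OVERLAP = 20  # figure quoted in this thread; treat it as a rule of thumb


def expected_overlap(amplicon_length, trunc_len_f, trunc_len_r):
    """Estimate how many bases are covered by both reads after truncation.

    The forward read covers amplicon positions [0, trunc_len_f) and the
    reverse read covers [amplicon_length - trunc_len_r, amplicon_length).
    A negative result means there is a gap between the two reads.
    """
    return trunc_len_f + trunc_len_r - amplicon_length


def can_merge(amplicon_length, trunc_len_f, trunc_len_r):
    """True if the estimated overlap reaches MIN_OVERLAP."""
    overlap = expected_overlap(amplicon_length, trunc_len_f, trunc_len_r)
    return overlap >= MIN_OVERLAP


# Hypothetical amplicon length, chosen only for illustration.
amplicon_length = 250

for trunc_f, trunc_r in [(150, 150), (150, 140), (150, 25)]:
    overlap = expected_overlap(amplicon_length, trunc_f, trunc_r)
    verdict = "ok" if can_merge(amplicon_length, trunc_f, trunc_r) else "too short"
    print(f"trunc-len-f={trunc_f} trunc-len-r={trunc_r}: overlap {overlap} ({verdict})")
```

With these made-up numbers a reverse truncation of 25 leaves a gap rather than an overlap, so no read pair could merge. Note that a positive estimated overlap is necessary but not sufficient: low-quality bases inside the overlap can still prevent merging.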

thermokarst · December 14, 2017, 11:18pm

@hhftang, I think your joining hypothesis is spot on here. Looking back at your "problematic run", do you get better results if you truncate any of those low quality positions off of your reverse reads (e.g. --p-trunc-len-r 140)? Thanks for bearing with me here!

wangj50 · December 14, 2017, 11:26pm

Ah, sorry. I made a mistake. You are right, that should be it. I meant 275 instead of 25 there. I'll rerun to see if everything is right.
Thank you for your time.

hhftang · January 4, 2018, 4:06pm

@thermokarst, I've tried your recommendation --p-trunc-len-r 140 and it appears to have fixed it. When I look at the individual fastq files for the problematic run, it does seem that the last 10 bases of the reverse strand are of poor quality. Things tend to align after removing those 10, at least from visual inspection (with about 40nt overlap). So in my case the original --p-trunc-len-r 150 (i.e. almost no truncation) was set too high. It's interesting though that it had previously worked for all other runs.
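
For reference, the kind of inspection described above can be roughed out in a few lines. This is only an illustrative heuristic and not what DADA2 or QIIME 2 do internally: it assumes Phred+33 quality strings of equal length, the two quality lines are made up, and the threshold of 20 is an arbitrary choice.

```python
def phred33_scores(quality_line):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(char) - 33 for char in quality_line]


def mean_quality_by_position(quality_lines):
    """Mean Phred score at each position across equal-length reads."""
    score_rows = [phred33_scores(line) for line in quality_lines]
    return [sum(column) / len(column) for column in zip(*score_rows)]


def suggest_truncation(mean_scores, threshold):
    """Return the index of the first position whose mean falls below threshold.

    The returned value is the number of leading bases to keep; if no
    position falls below the threshold, the full read length is returned.
    """
    for position, score in enumerate(mean_scores):
        if score < threshold:
            return position
    return len(mean_scores)


# Made-up quality lines: 'I' decodes to 40 and '#' decodes to 2.
example_quality_lines = ["IIIIII##", "IIIII###"]

mean_scores = mean_quality_by_position(example_quality_lines)
keep_bases = suggest_truncation(mean_scores, threshold=20)

print("mean quality per position:", mean_scores)
print("suggested truncation length:", keep_bases)
```

The interactive quality plot from demux summarize shows the same information with proper distributions, so this is mainly useful for scripting a quick comparison across many runs.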

I agree that the ultimate source of the error is probably failure of merging / joining of the paired-end reads, which means that the next step of the dada2 pipeline has nothing to work with.

I'll let you know if any new problems emerge.