DADA2 execution halted (Error in table: attempt to make a table with >= 2^31 elements)

potatoo · October 6, 2019, 2:32pm

I am using DADA2 (Qiime2 2019.7, installed in a conda environment on our lab server) to denoise an adapter-trimmed 16S rRNA sequence dataset. However, execution halted after four days of computation with this error message:

Plugin error from dada2: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.
Debug info has been saved to /home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/qiime2-q2cli-err-hpcrjy3c.log

I checked the debug info in the temporary directory, which said (see the entire output below):

Error in table(pairdf$forward, pairdf$reverse) :
attempt to make a table with >= 2^31 elements
Calls: mergePairs -> lapply -> FUN -> table
Execution halted
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

After searching for similar problems on the qiime2 forum and other websites, I found only the following post:
https://github.com/benjjneb/dada2/issues/641

However, this post does not provide any feasible solution.

I looked for clues in the dada2 code. The error seems to come from the paired-end merging step, but I have no idea how to overcome it. Could you please help me solve this problem?

Thank you for your attention and time.

potatoo

https://github.com/benjjneb/dada2/blob/master/R/paired.R

################################################################################
#' Merge denoised forward and reverse reads.
#' 
#' This function attempts to merge each denoised pair of forward and reverse reads, 
#' rejecting any pairs which do not sufficiently overlap or which contain too many 
#' (>0 by default) mismatches in the overlap region. Note: This function assumes that 
#' the fastq files for the forward and reverse reads were in the same order.
#' 
#' @param dadaF (Required). A \code{\link{dada-class}} object, or a list of such objects.
#'  The \code{\link{dada-class}} object(s) generated by denoising the forward reads.
#' 
#' @param derepF (Required). \code{character} or \code{\link{derep-class}}.
#'  The file path(s) to the fastq file(s), or a directory containing fastq file(s) corresponding to the
#'  the forward reads of the samples to be merged. Compressed file formats such as .fastq.gz and .fastq.bz2 are supported.
#'  A \code{\link{derep-class}} object (or list thereof) returned by \code{link{derepFastq}} can also be provided.
#'  These \code{\link{derep-class}} object(s) or fastq files should correspond to those used 
#'  as input to the the \code{\link{dada}} function when denoising the forward reads.
#'  
#' @param dadaR (Required). A \code{\link{dada-class}} object, or a list of such objects.
#'  The \code{\link{dada-class}} object(s) generated by denoising the reverse reads.


computational environment

Ubuntu 18.04.3 LTS (lab server, RAM: 62GB, CPUs: 4, Cores per CPU: 4, Threads per core: 1)
Qiime2 2019.7
conda 4.7.12

dataset overview

qiime_trim_16s_HETrim_191002.qzv (338.0 KB)

command

export TMPDIR=/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp

indir=/home/qfwfq/DIMP/data/proc/20190927/qiime_trim_16s_dnmgb_190927/qiime_trim_16s_HETrim_190927.qza
outdir=/home/qfwfq/DIMP/data/proc/20191002/denoise_trim_16s_dnmgb_191002

nohup qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ${indir} \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-n-threads 4 \
  --o-table ${outdir}/table_denoise_16s_dnmgb_191002.qza \
  --o-representative-sequences ${outdir}/rep_seqs_denoise_16s_dnmgb_191002.qza \
  --o-denoising-stats ${outdir}/stats_denoise_16s_dnmgb_191002.qza > ${outdir}/denoise_err_20191002.txt &

output message

Plugin error from dada2: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.
Debug info has been saved to /home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/qiime2-q2cli-err-hpcrjy3c.log

qiime2 debug info in temporary directory (qiime2-q2cli-err-hpcrjy3c.log)

R version 3.5.1 (2018-07-02)
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.2 / RcppParallel: 4.4.3

Filtering ......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Learning Error Rates
182577384 total bases in 1055371 reads from 15 samples will be used for learning the error rates.
180475410 total bases in 1055371 reads from 15 samples will be used for learning the error rates.

Denoise remaining samples .................................................................................Duplicate sequences in merged output.
.....................................................Duplicate sequences in merged output.
...................................Duplicate sequences in merged output.
..................................................Duplicate sequences in merged output.
.........Duplicate sequences in merged output.
.Duplicate sequences in merged output.
.....Duplicate sequences in merged output.
...Duplicate sequences in merged output.
.............Duplicate sequences in merged output.
...........Duplicate sequences in merged output.
........................................................Duplicate sequences in merged output.
.........................Duplicate sequences in merged output.
.Duplicate sequences in merged output.
..................Duplicate sequences in merged output.
.Duplicate sequences in merged output.
.Duplicate sequences in merged output.
.........Duplicate sequences in merged output.
................................................Duplicate sequences in merged output.
...............................................................................Duplicate sequences in merged output.
..................................................................................................................................................................................Duplicate sequences in merged output.
.................................................................................................................................................................................Duplicate sequences in merged output.
................Duplicate sequences in merged output.
...................................................Duplicate sequences in merged output.
.......................................Duplicate sequences in merged output.
...............Duplicate sequences in merged output.
..........................Duplicate sequences in merged output.
............................................................Duplicate sequences in merged output.
..........Duplicate sequences in merged output.
...................Duplicate sequences in merged output.
...............................Duplicate sequences in merged output.
...................Duplicate sequences in merged output.
.........Duplicate sequences in merged output.
....Duplicate sequences in merged output.
.Duplicate sequences in merged output.
........Duplicate sequences in merged output.
......Duplicate sequences in merged output.
.Duplicate sequences in merged output.
.....Duplicate sequences in merged output.
.Duplicate sequences in merged output.
.Duplicate sequences in merged output.
..Duplicate sequences in merged output.
.....Duplicate sequences in merged output.
..Duplicate sequences in merged output.
.........Duplicate sequences in merged output.
...................................................................Duplicate sequences in merged output.
..Duplicate sequences in merged output.
....Duplicate sequences in merged output.
.....Duplicate sequences in merged output.
.Duplicate sequences in merged output.
.....Duplicate sequences in merged output.
.................Duplicate sequences in merged output.
.Duplicate sequences in merged output.
..Duplicate sequences in merged output.
....Duplicate sequences in merged output.
.....Duplicate sequences in merged output.
..Duplicate sequences in merged output.
....Duplicate sequences in merged output.
.........Duplicate sequences in merged output.
..Duplicate sequences in merged output.
.............................................Duplicate sequences in merged output.
....Duplicate sequences in merged output.
....Duplicate sequences in merged output.
....Duplicate sequences in merged output.
.....................................................................................................................................Error in table(pairdf$forward, pairdf$reverse) :
attempt to make a table with >= 2^31 elements
Calls: mergePairs -> lapply -> FUN -> table
Execution halted
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/tmp3jlyy1il/forward /home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/tmp3jlyy1il/reverse /home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/tmp3jlyy1il/output.tsv.biom /home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/tmp3jlyy1il/track.tsv /home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/tmp3jlyy1il/filt_f /home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/tmp3jlyy1il/filt_r 0 0 0 0 2.0 2.0 2 consensus 1.0 4 1000000

Traceback (most recent call last):
File "/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 234, in denoise_paired
run_commands([cmd])
File "/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
subprocess.run(cmd, check=True)
File "/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/tmp3jlyy1il/forward', '/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/tmp3jlyy1il/reverse', '/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/tmp3jlyy1il/output.tsv.biom', '/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/tmp3jlyy1il/track.tsv', '/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/tmp3jlyy1il/filt_f', '/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/tmp/tmp3jlyy1il/filt_r', '0', '0', '0', '0', '2.0', '2.0', '2', 'consensus', '1.0', '4', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/commands.py", line 327, in call
results = action(**arguments)
File "</home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/decorator.py:decorator-gen-459>", line 2, in denoise_paired
File "/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
output_types, provenance)
File "/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 383, in callable_executor
output_views = self._callable(**view_args)
File "/home/qfwfq/pgm/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 249, in denoise_paired
" and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Nicholas_Bokulich · October 8, 2019, 9:11pm

Hi @potatoo,
It sounds like you have the same type of issue as in the link you provided; ultimately, it boils down to having an extraordinary number of unique sequences.

Any chance barcodes are in your reads? Or that adapters or primers are still attached to some reads? I recommend making sure the answer to both of these is "no".
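One rough way to check for leftover primers is to count how many reads still begin with the primer sequence. The sketch below is only an illustration: it assumes the 515F primer sequence commonly cited from Caporaso et al. (the thread never states which primers were used), and the example reads are hypothetical. Degenerate IUPAC bases are expanded to regular-expression character classes.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regex anchored at the read start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))


def fraction_with_primer(reads, primer):
    """Return the fraction of reads that begin with the given primer."""
    pattern = primer_regex(primer)
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if pattern.match(read.upper()))
    return hits / len(reads)


# ASSUMPTION: 515F primer; replace with the primers actually used.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"

# Hypothetical reads, for illustration only.
example_reads = [
    "GTGCCAGCAGCCGCGGTAATACGTAGG",  # primer present (M matched by A)
    "GTGCCAGCCGCCGCGGTAATACGGAGG",  # primer present (M matched by C)
    "TACGTAGGGTGCGAGCGTTAATCGG",    # no primer at the start
]

fraction = fraction_with_primer(example_reads, PRIMER_515F)
print(f"Reads starting with primer: {fraction:.1%}")
```

On real data the reads would come from the demultiplexed FASTQ files. A fraction well above zero after trimming would suggest primers are still attached.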

You could also try running through q2-deblur — see the online tutorials for examples. q2-deblur uses a positive filter before attempting to denoise reads, and could help identify if you have unusual amounts of non-target sequences in your data.

Please let us know what you find!

potatoo · October 9, 2019, 4:04pm

Thanks for your reply,

Before running DADA2, I tried to remove adapters and primers using Trimmomatic with a custom adapter list compiled from FastQC, the Illumina adapter sequences, and Caporaso et al. (2012).

After trimming, these adapters and other technical sequences were present at less than 0.1% in all samples.

However, as you said, there may still be some unknown adapters in the samples. I will therefore try deblur instead and rerun the QC pipeline to make sure the dataset is adapter-free.

I will update this post if I find anything important.

Thank you for your time and attention.

potatoo

Reference
Caporaso, J. G. et al. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J. 6, 1621–1624 (2012)

potatoo · October 15, 2019, 1:33pm

Hi, @Nicholas_Bokulich,

I have successfully run deblur on the same dataset. I ended up with 9,416 features in 3,985 samples, which were collected from infant feces. The total frequency was 77,854,042, so only 40% of the joined reads passed the denoising process (77,854,042 / 194,752,087).

I am not sure whether these numbers are normal. However, the number of features (unique sequences) seems much lower than the number reported in the dada2 error message (>= 2^31 elements).

I am currently checking whether the representative sequences contain technical sequences and looking for more clues in the deblur results. Could you please help me solve this problem?

Thank you for your attention and time.

potatoo

Related files

deblur execution log: deblur.txt (6.1 MB)
deblur summary: tab_deno_16s_dnmgb_191015.qzv (898.4 KB)
representative sequences: rep_deno_16s_dnmgb_191015.qzv (1.3 MB)

Command (joining)

outdir=/home/qfwfq/DIMP/data/proc/20191007/join_trim_16s_dnmgb_191007
indir=/home/qfwfq/DIMP/data/proc/20190927/qiime_trim_16s_dnmgb_190927/qiime_trim_16s_HETrim_190927.qza

nohup qiime vsearch join-pairs \
  --i-demultiplexed-seqs ${indir} \
  --o-joined-sequences ${outdir}/join_trim_16s_dnmgb_191007.qza > ${outdir}/joined_log_20191007.txt 2>&1 &

Command (deblur)

indir=/home/qfwfq/DIMP/data/proc/20191007/join_trim_16s_dnmgb_191007/join_trim_16s_dnmgb_191007.qza
outdir=/home/qfwfq/DIMP/data/proc/20191008/deno_join_16s_dnmgb_191008

cd ${outdir}

nohup qiime deblur denoise-16S \
  --i-demultiplexed-seqs ${indir} \
  --p-trim-length 252 \
  --p-sample-stats \
  --o-representative-sequences ${outdir}/rep_deno_16s_dnmgb_191008.qza \
  --o-table ${outdir}/tab_deno_16s_dnmgb_191008.qza \
  --o-stats ${outdir}/statstab_deno_16s_dnmgb_191008.qza > ${outdir}/deblur_log_20191008.txt 2>&1 &

Demultiplexed sequence counts summary

Statistic	After trimming adapters	After joining paired-end reads
Minimum:	1378	162
Median:	45755.0	37178.0
Mean:	59572.64226623214	48822.2830283279
Maximum:	2557303	2173932
Total:	237635270	194752087
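The totals and means in this table can be cross-checked with a little arithmetic. The sketch below assumes each mean is simply the total divided by the number of samples, so total / mean recovers the sample count at each stage. It also computes the share of reads that survived joining. The variable names are illustrative only.

```python
# Totals and means copied from the counts summary table.
trimmed_total = 237_635_270
joined_total = 194_752_087
trimmed_mean = 59572.64226623214
joined_mean = 48822.2830283279

# ASSUMPTION: mean = total / number of samples, so total / mean gives
# the number of samples behind each column.
implied_samples_trimmed = round(trimmed_total / trimmed_mean)
implied_samples_joined = round(joined_total / joined_mean)

# Share of adapter-trimmed read pairs that were successfully joined.
join_retention = joined_total / trimmed_total

print(f"Implied samples (trimmed): {implied_samples_trimmed}")
print(f"Implied samples (joined):  {implied_samples_joined}")
print(f"Retained after joining:    {join_retention:.1%}")
```

If the implied sample count differs from the number of samples in a later feature table, some samples were dropped somewhere in between.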

Table summary (deblur)

Metric	Sample
Number of samples	3,985
Number of features	9,416
Total frequency	77,854,042

Frequency per sample (deblur)

Statistic	Frequency
Minimum frequency	2.0
1st quartile	6,386.0
Median frequency	15,014.0
3rd quartile	27,612.0
Maximum frequency	273,337.0
Mean frequency	19,536.773400250942

Frequency per feature (deblur)

Statistic	Frequency
Minimum frequency	10.0
1st quartile	16.0
Median frequency	40.0
3rd quartile	220.0
Maximum frequency	5,107,325.0
Mean frequency	8,268.271240441802
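The deblur summary figures can be checked against each other in the same spirit. This sketch recomputes the two mean frequencies from the total frequency, and the fraction of joined reads that deblur retained. It uses only numbers reported in this thread.

```python
# Figures copied from the deblur table summary and the joined-read total.
joined_total = 194_752_087
deblur_total = 77_854_042
n_samples = 3_985
n_features = 9_416

# Mean frequency per sample and per feature, recomputed from the total.
mean_per_sample = deblur_total / n_samples
mean_per_feature = deblur_total / n_features

# Fraction of joined reads that passed deblur.
deblur_retention = deblur_total / joined_total

print(f"Mean frequency per sample:  {mean_per_sample:.2f}")
print(f"Mean frequency per feature: {mean_per_feature:.2f}")
print(f"Joined reads retained:      {deblur_retention:.1%}")
```

The recomputed means should agree with the "Mean frequency" rows in the two tables, and the retained fraction corresponds to the roughly 40% figure quoted in the post.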

Nicholas_Bokulich · October 15, 2019, 4:22pm

Only 40% of joined reads passing is no big surprise, since deblur is somewhat more conservative than dada2 with regard to filtering (deblur denoise-16S applies a "positive filter" to throw out anything that does not look like 16S, then throws out anything that looks like it has an error, whereas dada2 attempts to correct those errors).

That loss does sound a bit high, though, so I wonder if you still have some kind of artifact in the samples.

Use q2-cutadapt and only trim the primer sequences, not the adapter sequences (those will be trimmed out automatically if any primers are discovered).

I am not sure if "elements" refers to unique features or # samples * # features or what; just that it means dada2 exploded because it was attempting to create an unwieldy, enormous table (usually because barcodes or some other nonsense is included in the reads).
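One possible reading of "elements", offered as an assumption rather than something confirmed in this thread: the traceback shows the failing call is `table(pairdf$forward, pairdf$reverse)` inside `mergePairs`, which is a two-way cross-tabulation. If that table holds one cell per combination of distinct forward and distinct reverse values, the element count is the product of those two counts, presumably within a single sample. The sketch below shows how large those counts would need to be to reach 2^31.

```python
import math

# Threshold quoted in the error message.
R_TABLE_LIMIT = 2 ** 31


def table_cells(n_forward, n_reverse):
    """Cells in a dense two-way table: one per (forward, reverse) pair."""
    return n_forward * n_reverse


def would_overflow(n_forward, n_reverse):
    """True if a dense two-way table would reach the 2^31 element limit."""
    return table_cells(n_forward, n_reverse) >= R_TABLE_LIMIT


# Smallest n for which an n-by-n table reaches the limit.
smallest_square_side = math.isqrt(R_TABLE_LIMIT - 1) + 1

print(f"Limit: {R_TABLE_LIMIT:,} cells")
print(f"Smallest square table that overflows: {smallest_square_side:,} per side")
```

Under this reading, a single sample would need tens of thousands of distinct denoised variants in both the forward and the reverse direction. That is far more than the 9,416 features deblur found across all samples, and it would be consistent with artifacts inflating the number of unique sequences.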

Have you tried running dada2 with the truncation parameters (--p-trunc-len-f and --p-trunc-len-r, both 0 in your command) set to some arbitrary value? I don't really have a rational explanation here, just a hunch that you may be getting noisy, imprecise joins because of low-quality bases on the tails of your reads, and that this is leading to an unusually high number of features with dada2.
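When choosing non-zero values for --p-trunc-len-f and --p-trunc-len-r (both set to 0 in the original command), the truncated reads still need to overlap enough to merge. The sketch below is a back-of-the-envelope check under two labeled assumptions: an amplicon length of 253 bp, which is a guess for a V4 amplicon and not stated in the thread, and a minimum overlap of 12 bases, the default `minOverlap` of DADA2's `mergePairs`. The truncation values in the example are hypothetical.

```python
# ASSUMPTIONS (not stated in the thread): adjust to your own protocol.
ASSUMED_AMPLICON_LEN = 253   # guessed V4 amplicon length in bases
ASSUMED_MIN_OVERLAP = 12     # DADA2 mergePairs default minOverlap


def overlap_after_truncation(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the forward and reverse reads after truncation."""
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f, trunc_len_r,
              amplicon_len=ASSUMED_AMPLICON_LEN,
              min_overlap=ASSUMED_MIN_OVERLAP):
    """True if the truncated reads still overlap by at least min_overlap."""
    overlap = overlap_after_truncation(trunc_len_f, trunc_len_r, amplicon_len)
    return overlap >= min_overlap


# Hypothetical truncation lengths, for illustration only.
for trunc_f, trunc_r in [(150, 140), (130, 130)]:
    overlap = overlap_after_truncation(trunc_f, trunc_r, ASSUMED_AMPLICON_LEN)
    status = "ok" if can_merge(trunc_f, trunc_r) else "too short to merge"
    print(f"trunc-len-f={trunc_f}, trunc-len-r={trunc_r}: overlap={overlap} ({status})")
```

Truncating too aggressively trades one problem for another, since pairs that cannot overlap are discarded at the merging step, so it is worth running a check like this before a multi-day run.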

Of course the other option is just to use the deblur results... in all honesty dada2 might not yield many more reads and it looks like you have reasonable sample coverage with the deblur results.

potatoo · October 16, 2019, 7:27am

Thanks for your reply,

In summary, DADA2 created an unusually large table, probably because of unremoved artifacts and low-quality bases, which caused the run to halt.

I will try the following to resolve the problem:

Use q2-cutadapt and a reliable list to remove potential artifacts
Perform quality filtering and trimming before executing DADA2

I will update this post if I find anything important.

Thank you for your time and attention.

potatoo