Help with troubleshooting dada2 denoise-paired error. [NAs produced by integer overflow...is there a fix?]

I encountered the following error during the dada2 denoise-paired step after removing primers with cutadapt (see below for the command and error). The amplicon has an average length of about 183 bp. I'm using qiime2-amplicon-2024.5 on a Mac with an M3 Max chip. I've run this exact code for many other libraries, all prepped the same way with the same assay, and have never encountered this error. I did notice that the R1 & R2 fastq.gz files for one sample contain no readable data. Could this explain the error? I'm rerunning now after removing that sample to see if it works, but I was hoping someone here might have some insights. Thanks!

(qiime2-amplicon-2024.5) rdceradk@CERADK-MN-BB888 Trinity_Orchid_inverts_09Dec2024 % qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end-trimmed.qza \
  --p-trunc-len-f 125 \
  --p-trunc-len-r 125 \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-n-threads 15 \
  --p-n-reads-learn 1000000 \
  --o-representative-sequences rep-seqs-dada.qza \
  --o-table table-dada.qza \
  --o-denoising-stats stats-dada.qza \
  --verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/forward --input_directory_reverse /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/reverse --output_path /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/output.tsv.biom --output_track /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/track.tsv --filtered_directory /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/filt_f --filtered_directory_reverse /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/filt_r --truncation_length 125 --truncation_length_reverse 125 --trim_left 0 --trim_left_reverse 0 --max_expected_errors 2.0 --max_expected_errors_reverse 2.0 --truncation_quality_score 2 --min_overlap 12 --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 15 --learn_min_reads 1000000

R version 4.3.3 (2024-02-29)
Loading required package: Rcpp
DADA2: 1.30.0 / Rcpp: 1.0.12 / RcppParallel: 5.1.6
2) Filtering The filter removed all reads: /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/filt_f/4March2024-76_L001_R1_001.fastq.gz and /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/filt_r/4March2024-76_L001_R2_001.fastq.gz not written.
Some input samples had no reads pass the filter.
........................................................................................................................................................................................................................x............................................................
3) Learning Error Rates
312203250 total bases in 2497626 reads from 3 samples will be used for learning the error rates.
312203250 total bases in 2497626 reads from 3 samples will be used for learning the error rates.
3) Denoise samples ................................................................Error in dada(drpF, err = err, multithread = multithread, verbose = FALSE) :
NAs in derep$quals matrix. Check that all input sequences had valid associated qualities assigned.
In addition: Warning message:
In derepQuals[sqnms, ] + out$cum_quals[sqnms, ] :
NAs produced by integer overflow
2: stop("NAs in derep$quals matrix. Check that all input sequences had valid associated qualities assigned.")
1: dada(drpF, err = err, multithread = multithread, verbose = FALSE)
Traceback (most recent call last):
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 350, in denoise_paired
run_commands([cmd])
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 37, in run_commands
subprocess.run(cmd, check=True)
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada.R', '--input_directory', '/var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/forward', '--input_directory_reverse', '/var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/reverse', '--output_path', '/var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/output.tsv.biom', '--output_track', '/var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/track.tsv', '--filtered_directory', '/var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/filt_f', '--filtered_directory_reverse', '/var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/filt_r', '--truncation_length', '125', '--truncation_length_reverse', '125', '--trim_left', '0', '--trim_left_reverse', '0', '--max_expected_errors', '2.0', '--max_expected_errors_reverse', '2.0', '--truncation_quality_score', '2', '--min_overlap', '12', '--pooling_method', 'independent', '--chimera_method', 'consensus', '--min_parental_fold', '1.0', '--allow_one_off', 'False', '--num_threads', '15', '--learn_min_reads', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 520, in call
results = self._execute_action(
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 581, in _execute_action
results = action(**arguments)
File "", line 2, in denoise_paired
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
outputs = self.callable_executor(
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576, in callable_executor
output_views = self._callable(**view_args)
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 363, in denoise_paired
raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

See above for debug info.

It could, so it's worth a try!

It actually looks like the reads are present, but the quality scores are missing or messed up in some way:

NAs in derep$quals matrix. Check that all input sequences had valid associated qualities assigned.


Removing the sample did not work. I'm confused as to why quality is missing/messed up. I have not changed our protocol which has worked perfectly across multiple libraries. Any thoughts on how to check for missing/messed up quality scores? At this point I'm stumped so any help would be much appreciated!
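
One way to check for missing or malformed quality scores is to scan each FASTQ file and confirm that every record has a quality string of the same length as its sequence. The sketch below is only an illustration, not part of QIIME 2 or DADA2: the function names `check_fastq` and `check_fastq_gz` are made up here, and it assumes plain four-line FASTQ records (no wrapped sequences) with Phred+33 qualities.

```python
import gzip
import io


def check_fastq(handle):
    """Scan four-line FASTQ records from a text handle.

    Returns (n_reads, problems), where problems is a list of
    (record_number, description) tuples.
    """
    problems = []
    n_reads = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        n_reads += 1
        if not header.startswith("@") or not plus.startswith("+"):
            problems.append((n_reads, "malformed record markers"))
        elif len(seq) != len(qual):
            problems.append((n_reads, "sequence/quality length mismatch"))
        elif any(not 33 <= ord(c) <= 126 for c in qual):
            problems.append((n_reads, "quality character outside printable ASCII"))
    return n_reads, problems


def check_fastq_gz(path):
    """Run the same check on a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return check_fastq(handle)


# Tiny in-memory demonstration: the second record has a truncated quality line.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII\n")
n_reads, problems = check_fastq(demo)
print(n_reads, problems)
```

Running `check_fastq_gz` on each R1 and R2 file would also give a per-file read count. An empty problem list only rules out malformed records; it does not rule out other causes of the error.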


I ran the data in R via the dada2 pipeline. I got the same error, but this pipeline shows me which sample was being processed when the error occurred, and it's the largest sample, by far (1.83GB for R2 alone). Is there some kind of file size limitation?

Not that I know of. DADA2 is designed to work with big data.
I suppose if something is being loaded into memory all at once, there would be a soft limit based on your RAM...

Perhaps you could use vsearch to filter out low-quality reads from your fastq files, reducing file size a bit, or a lot! vsearch should also tell you if these files are messed up in some way.

I've not had this exact issue before, so these are all open-ended suggestions.
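
The "NAs produced by integer overflow" warning in the log points at a limit other than RAM. It is raised while two quality matrices are added (`derepQuals[sqnms, ] + out$cum_quals[sqnms, ]`), and R's integer type is a 32-bit signed value that becomes NA once a result exceeds 2^31 - 1. Assuming those matrices hold per-position quality scores summed over all reads of each unique sequence (an interpretation of the log, not something confirmed here from the DADA2 source), the rough arithmetic below shows how many identical reads it would take to overflow. The function name `reads_to_overflow` is hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest value R's integer type can hold


def reads_to_overflow(mean_quality):
    """Smallest number of identical reads whose summed quality score at one
    position exceeds INT32_MAX, assuming every read scores `mean_quality`."""
    return INT32_MAX // mean_quality + 1


for q in (30, 37, 40):
    print(f"Q{q}: {reads_to_overflow(q):,} reads")
```

Under these assumptions the threshold is in the tens of millions of identical reads, so it would only be reached in an unusually deep sample dominated by one sequence.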

Thanks. Turns out it is a known file size bug in dada2: see issue #2076 in the benjjneb/dada2 repository on GitHub ("Help with troubleshooting qiime dada2 denoise-paired error. [NAs produced by integer overflow...is there a fix?]"). I was able to get around the bug by splitting the fastq.gz files that caused the error.
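
The post does not say how the files were split, so the following is just one possible sketch. It cuts a fastq.gz file into consecutive chunks of a fixed number of reads. The function name `split_fastq_gz` and the `_partNNN` output naming are hypothetical, and four-line FASTQ records are assumed. For paired-end data, R1 and R2 would need to be split with the same `reads_per_chunk` so mates stay in matching chunks. The chunks would also need names that the importer accepts and would be treated as separate samples unless recombined afterwards.

```python
import gzip
import itertools
import os
import tempfile


def split_fastq_gz(path, out_prefix, reads_per_chunk):
    """Write consecutive chunks of `reads_per_chunk` reads to
    <out_prefix>_partNNN.fastq.gz and return the list of chunk paths."""
    chunk_paths = []
    with gzip.open(path, "rt") as src:
        for index in itertools.count(1):
            # Each record is four lines, so one chunk is 4 * reads_per_chunk lines.
            chunk = itertools.islice(src, 4 * reads_per_chunk)
            first = next(chunk, None)
            if first is None:
                break  # no reads left in the source file
            chunk_path = f"{out_prefix}_part{index:03d}.fastq.gz"
            with gzip.open(chunk_path, "wt") as dst:
                dst.write(first)
                dst.writelines(chunk)  # streams the rest of this chunk
            chunk_paths.append(chunk_path)
    return chunk_paths


# Demonstration on a small temporary file with five reads, two reads per chunk.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "demo_R1.fastq.gz")
    with gzip.open(src_path, "wt") as fh:
        for i in range(5):
            fh.write(f"@read{i}\nACGT\n+\nIIII\n")
    parts = split_fastq_gz(src_path, os.path.join(tmp, "demo_R1"), 2)
    counts = []
    for part in parts:
        with gzip.open(part, "rt") as fh:
            counts.append(sum(1 for _ in fh) // 4)
    print(counts)
```

The chunks are streamed line by line, so memory use stays small even for multi-gigabyte inputs.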
