Error rates could not be estimated -- unsure why

alexkrohn · January 12, 2022, 7:33pm

I'm running dada2 on paired end data and getting the same error over and over again. My data are from an 150 PE run on an Illumina iSeq machine, sequencing a 16s fragment for vertebrates. The samples were demultiplexed by Casava based on the i5 and i7 adapters. Some samples returned very few reads, so I repeated these steps with only those samples that returned more than 19k reads. Still, I got the same result. Here's the code that I ran for the full dataset:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]'  --input-path pe-demux-results/manifest-file.txt --input-format PairedEndFastqManifestPhred33V2 --output-path pe-demux-results/demux16s.qza

qiime dada2 denoise-paired --i-demultiplexed-seqs demux-16s.qza --p-trim-left-f 24 --p-trim-left-r 21 --p-trunc-len-f 148 --p-trunc-len-r 142 --o-table table-pe.qza --o-representative-sequences rep-seqs.qza --o-denoising-stats denoising-stats.qza --verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /var/folders/kk/scwljzqs5gb9zsd8m4rxwsbr0000gn/T/tmpb7ed8hkc/forward /var/folders/kk/scwljzqs5gb9zsd8m4rxwsbr0000gn/T/tmpb7ed8hkc/reverse /var/folders/kk/scwljzqs5gb9zsd8m4rxwsbr0000gn/T/tmpb7ed8hkc/output.tsv.biom /var/folders/kk/scwljzqs5gb9zsd8m4rxwsbr0000gn/T/tmpb7ed8hkc/track.tsv /var/folders/kk/scwljzqs5gb9zsd8m4rxwsbr0000gn/T/tmpb7ed8hkc/filt_f /var/folders/kk/scwljzqs5gb9zsd8m4rxwsbr0000gn/T/tmpb7ed8hkc/filt_r 148 142 24 21 2.0 2.0 2 12 independent consensus 1.0 1 1000000

R version 4.0.5 (2021-03-31) 
Loading required package: Rcpp
DADA2: 1.18.0 / Rcpp: 1.0.7 / RcppParallel: 5.1.4 
1) Filtering .............
2) Learning Error Rates
133563748 total bases in 1077127 reads from 4 samples will be used for learning the error rates.
130332367 total bases in 1077127 reads from 4 samples will be used for learning the error rates.
Error rates could not be estimated (this is usually because of very few reads).
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.
Execution halted
Traceback (most recent call last):
  File "/Users/AirAlex/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 266, in denoise_paired
    run_commands([cmd])
  File "/Users/AirAlex/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
    subprocess.run(cmd, check=True)
  File "/Users/AirAlex/miniconda3/envs/qiime2-2021.8/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/var/folders/kk/scwljzqs5gb9zsd8m4rxwsbr0000gn/T/tmpb7ed8hkc/forward', '/var/folders/kk/scwljzqs5gb9zsd8m4rxwsbr0000gn/T/tmpb7ed8hkc/reverse', '/var/folders/kk/scwljzqs5gb9zsd8m4rxwsbr0000gn/T/tmpb7ed8hkc/output.tsv.biom', '/var/folders/kk/scwljzqs5gb9zsd8m4rxwsbr0000gn/T/tmpb7ed8hkc/track.tsv', '/var/folders/kk/scwljzqs5gb9zsd8m4rxwsbr0000gn/T/tmpb7ed8hkc/filt_f', '/var/folders/kk/scwljzqs5gb9zsd8m4rxwsbr0000gn/T/tmpb7ed8hkc/filt_r', '148', '142', '24', '21', '2.0', '2.0', '2', '12', 'independent', 'consensus', '1.0', '1', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/AirAlex/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-572>", line 2, in denoise_paired
  File "/Users/AirAlex/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/Users/AirAlex/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/qiime2/sdk/action.py", line 391, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/Users/AirAlex/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 279, in denoise_paired
    raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Plugin error from dada2:

  An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

See above for debug info.

As you can see from the demux-16s-viz.qzv, the median quality score remains high, even though it is variable towards the end.

The target amplicon is only 250 bp long, so there should be just enough overlap in the reads to merge R1 and R2.

FWIW, the dada step works fine with if analyzed as SE reads (R1 and R2 analyzed separately), even though there are just as many samples with very few reads.

Any ideas?

colinbrislawn · January 13, 2022, 3:25pm

Hello Alex,

Bad news, it's an issue with dada2 and the iSeq quality scores.

The iSeq uses Illumina's new binned quality scores (official Illumina docs PDF), and I can clearly see that binning in your quality score plots.

Check out this forum thread and this GitHub issue for possible workarounds.

Good luck! Please let us know if you find a solution that works for you!

alexkrohn · January 13, 2022, 3:55pm

Thanks @colinbrislawn.

That's frustrating. One thing I appreciate about qiime is it ability to build in a bunch of steps into one (e.g. denoise-paired denoises, remove chimeras, and dereplicates all in one). As it seems NovaSeq and iSeq are the future, it would be great to have a built-in workaround.

If the under-the-hood dada2 script doesn't support the new quality scores, how will running dada2 separately solve that issue? To make sure I have this right, I would have to run Qiime (or make something up) to remove primers and order R1 and R2 reads, then use dada2 separately with a modified getErrors(errF) function. Is that right? Can you point me to the scripts used for denoise paired so that I can cobble together Qiime code and modify dada accordingly?

colinbrislawn · January 13, 2022, 7:54pm

Let's take a trip to the reframing department.

Oh, it totally is. The worst part is that Illumina's new binning method is still technically compliant with the fastq specification, however it breaks expectations of how fastq files work and violates dada2's error model. Basically, this has exposed an edge case of the fastq spec, and now that Illumina is shipping these fastq files, it's not an edge case anymore and we have to find a way to to support it.

Me too, because all these steps are hard and Qiime2 makes it look easy.

I'm glad you reached out on GitHub and are working through the problem. Let us know if you find a solution, as that would really help future users.

alexkrohn · January 13, 2022, 10:07pm

@colinbrislawn Will do

I assume I'd run into these same problems even if I used deblur instead of dada2, right? Because deblur also uses the distribution of sequencing errors based on the quality scores?

colinbrislawn · January 14, 2022, 2:20pm

Not necessarily! Deblur uses a different error model, so it might work just fine with the binned quality scores.

Definitely worth a try!

EDIT: Deblur uses a pretrained error model, which won't mind these binned scores, but also was not built for them. So, idk. More discussion about binned quality scores and deblur in this thread.

alexkrohn · January 17, 2022, 8:17pm

Hey @colinbrislawn. I couldn't get dada2 to work on my data, even with the workarounds, so I ended up using Deblur. It worked, and the results are comparable to previous datasets that we've sequenced, so I tend to believe it.

A workaround in dada2 for the new Illumina quality scores would be great, but until then, I guess I'll go with Deblur.

I appreciate your help and your updates to this thread/problem.

system · February 18, 2022, 2:18am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.