Plugin error from dada2, 16S paired-end fastq files can't generate feature table through dada2

zcw15774723795 · June 22, 2024, 3:58pm

Hi everyone, sorry to bother you all. I am a beginner with QIIME2, and it has truly confused me a lot, so I want to ask for help on the forum.

This is a problem I encountered when I used dada2 to generate a feature table. Here is my procedure:

1. I downloaded paired-end 16S data from NCBI as .fastq files (two files: one forward, one reverse).
2. I used fastp to trim adapters and filter out low-quality sequences from the paired-end 16S rRNA gene sequencing files.
3. I imported them into QIIME2 with the following code:

time qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path /mypath/SRR19603331/manifeat.tsv \
--output-path paired-end-demux_1.qza \
--input-format PairedEndFastqManifestPhred33V2
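With many samples, the manifest can be generated instead of typed by hand. The sketch below assumes the PairedEndFastqManifestPhred33V2 layout (a tab-separated file with `sample-id`, `forward-absolute-filepath` and `reverse-absolute-filepath` columns) and assumes, hypothetically, that each run's trimmed reads are named `<run>_1.fastq` (forward) and `<run>_2.fastq` (reverse) in one directory. The function name `write_paired_manifest` is illustrative, not part of QIIME 2.

```python
import csv
from pathlib import Path


def write_paired_manifest(run_ids, fastq_dir, out_path):
    """Write a paired-end manifest TSV for `qiime tools import`.

    Assumes (hypothetical naming) <run>_1.fastq = forward reads and
    <run>_2.fastq = reverse reads, all inside fastq_dir.
    """
    # The V2 manifest columns hold absolute paths, so resolve the directory.
    fastq_dir = Path(fastq_dir).resolve()
    rows = []
    for run in run_ids:
        forward = fastq_dir / f"{run}_1.fastq"
        reverse = fastq_dir / f"{run}_2.fastq"
        rows.append((run, str(forward), str(reverse)))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(
            ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
        )
        writer.writerows(rows)
    return rows
```

For example, `write_paired_manifest(["SRR19603331"], "trimmed", "manifest.tsv")` would write a one-sample manifest; passing all run accessions produces a multi-sample manifest in the same way. The function does not check that the files exist, so it is worth listing the directory first.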

I wanted to use dada2 to generate a feature table with the following code:

qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end-demux_1.qza \
--p-trunc-len-f 0 \
--p-trunc-len-r 0 \
--p-n-threads 20 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza

However, it encountered an error and returned the following:

Plugin error from dada2:

  An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Debug info has been saved to /tmp/qiime2-q2cli-err-1xrwtbtm.log

Thank you all in advance for your help and support. I truly appreciate any guidance you can provide.

The debug info file is as follows:

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /tmp/tmp22ui1r15/forward --input_directory_reverse /tmp/tmp22ui1r15/reverse --output_path /tmp/tmp22ui1r15/output.tsv.biom --output_track /tmp/tmp22ui1r15/track.tsv --filtered_directory /tmp/tmp22ui1r15/filt_f --filtered_directory_reverse /tmp/tmp22ui1r15/filt_r --truncation_length 0 --truncation_length_reverse 0 --trim_left 0 --trim_left_reverse 0 --max_expected_errors 2.0 --max_expected_errors_reverse 2.0 --truncation_quality_score 2 --min_overlap 12 --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 20 --learn_min_reads 1000000

R version 4.3.3 (2024-02-29)
Loading required package: Rcpp
DADA2: 1.30.0 / Rcpp: 1.0.12 / RcppParallel: 5.1.6
2) Filtering .
3) Learning Error Rates
44549066 total bases in 177490 reads from 1 samples will be used for learning the error rates.
Error rates could not be estimated (this is usually because of very few reads).
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.
6: stop("Error matrix is NULL.")
5: getErrors(err, enforce = TRUE)
4: dada(drps, err = NULL, errorEstimationFunction = errorEstimationFunction,
       selfConsist = TRUE, multithread = multithread, verbose = verbose,
       MAX_CONSIST = MAX_CONSIST, OMEGA_C = OMEGA_C, ...)
3: learnErrors(filts, nreads = nreads.learn, multithread = multithread)
2: withCallingHandlers(expr, warning = function(w) if (inherits(w,
       classes)) tryInvokeRestart("muffleWarning"))
1: suppressWarnings(learnErrors(filts, nreads = nreads.learn, multithread = multithread))
Traceback (most recent call last):
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 350, in denoise_paired
    run_commands([cmd])
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 37, in run_commands
    subprocess.run(cmd, check=True)
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada.R', '--input_directory', '/tmp/tmp22ui1r15/forward', '--input_directory_reverse', '/tmp/tmp22ui1r15/reverse', '--output_path', '/tmp/tmp22ui1r15/output.tsv.biom', '--output_track', '/tmp/tmp22ui1r15/track.tsv', '--filtered_directory', '/tmp/tmp22ui1r15/filt_f', '--filtered_directory_reverse', '/tmp/tmp22ui1r15/filt_r', '--truncation_length', '0', '--truncation_length_reverse', '0', '--trim_left', '0', '--trim_left_reverse', '0', '--max_expected_errors', '2.0', '--max_expected_errors_reverse', '2.0', '--truncation_quality_score', '2', '--min_overlap', '12', '--pooling_method', 'independent', '--chimera_method', 'consensus', '--min_parental_fold', '1.0', '--allow_one_off', 'False', '--num_threads', '20', '--learn_min_reads', '1000000']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 520, in __call__
    results = self._execute_action(
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 581, in _execute_action
    results = action(**arguments)
  File "", line 2, in denoise_paired
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
    outputs = self._callable_executor_(
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 363, in denoise_paired
    raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

colinbrislawn · June 22, 2024, 4:07pm

Hello and welcome to the forums!

Thank you for posting your full command and error message.

Hiding in the middle of the error are these insightful lines:

3) Learning Error Rates
44549066 total bases in 177490 reads from 1 samples will be used for learning the error rates.
Error rates could not be estimated (this is usually because of very few reads).
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.
6: stop("Error matrix is NULL.")

While you have many reads, I wonder if the single sample is causing issues.

How many samples are in this cohort? Are there more samples you can include?

zcw15774723795 · June 22, 2024, 4:46pm

Hi, I want to express my deepest gratitude for your response. I am a student in China, where it is currently 12:40 AM, so I may not be able to reply promptly after I go to sleep. Please forgive any delay in my response. Thank you for your understanding.

Yes, there are 56 samples in total in my cohort. To save disk space (although there is no real need, because our lab has a powerful computer with a 25 TB disk), I tried to analyze them one by one. Tomorrow I will try running dada2 with more samples (would about 10 be OK?).

I would like to thank you once again for answering my questions. If you ever have the chance to visit China, I would be more than happy to recommend many wonderful places for you to explore!

colinbrislawn · June 22, 2024, 7:40pm

Please take your time. I try not to rush myself or others.

I'm glad you have enough computer power and disk space to work with data of this size.

DADA2 scales well and should be able to handle all samples on a sequencing run. I would include all 56 samples.
This should be faster than running them one at a time as well.
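More samples also give the error-learning step more reads to draw from. The traceback shows q2-dada2 calling `learnErrors(filts, nreads = nreads.learn, ...)`, and the logged command passes `--learn_min_reads 1000000`. DADA2's documentation describes `nreads` as a minimum: samples are read in, in order, until at least that many reads are collected or the samples run out. The sketch below only illustrates that selection rule; it is not q2-dada2's code, and `samples_for_error_learning` is a hypothetical name.

```python
def samples_for_error_learning(read_counts, min_reads=1_000_000):
    """Pick whole samples, in input order, until min_reads is reached.

    read_counts: list of (sample_id, n_reads) pairs.
    Returns (chosen_sample_ids, total_reads). If all samples together
    stay below min_reads, every sample is returned.
    """
    chosen = []
    total = 0
    for sample_id, n_reads in read_counts:
        chosen.append(sample_id)
        total += n_reads
        if total >= min_reads:
            break  # enough reads collected; later samples are not used
    return chosen, total
```

With a single sample of 177,490 reads (the first log), the threshold is never reached, so only that one sample can inform the error model. The later 56-sample log reports 1,116,452 reads from 7 samples, which is consistent with this rule.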

zcw15774723795 · June 23, 2024, 5:55pm

Hi! Sorry to bother you again, but it still doesn't work: I have tried many times with all 56 samples, and it is very confusing for me to find the bug. The detailed steps are as follows.

1. DOWNLOAD ALL SAMPLES
I downloaded all samples from the NCBI SRA database with prefetch, an NCBI download tool.
2. USE FASTP TO TRIM ADAPTERS
3. IMPORT INTO QIIME2
I imported the fastp-trimmed samples into QIIME2 with the following code:

time qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path /data_1/zcw_data/QIIME2_analysis/QIIME2_ALL/manifeat_paired.tsv \
--output-path paired-end-demux_samples_all.qza \
--input-format PairedEndFastqManifestPhred33V2

The manifeat_paired.tsv file is attached here.
4. DADA2 GENERATE FEATURE TABLE
I used the following code:

qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end-demux_samples_all.qza \
--p-trunc-len-f 0 \
--p-trunc-len-r 0 \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-n-threads 0 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza

Although I used all 56 samples, the process still hit the same error as before.

Plugin error from dada2:

  An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Debug info has been saved to /tmp/qiime2-q2cli-err-dccqk_53.log

Thank you for your patience and attention!
The /tmp/qiime2-q2cli-err-dccqk_53.log file is below:

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /tmp/tmpffc07491/forward --input_directory_reverse /tmp/tmpffc07491/reverse --output_path /tmp/tmpffc07491/output.tsv.biom --output_track /tmp/tmpffc07491/track.tsv --filtered_directory /tmp/tmpffc07491/filt_f --filtered_directory_reverse /tmp/tmpffc07491/filt_r --truncation_length 0 --truncation_length_reverse 0 --trim_left 0 --trim_left_reverse 0 --max_expected_errors 2.0 --max_expected_errors_reverse 2.0 --truncation_quality_score 2 --min_overlap 12 --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 0 --learn_min_reads 1000000

R version 4.3.3 (2024-02-29)
Loading required package: Rcpp
DADA2: 1.30.0 / Rcpp: 1.0.12 / RcppParallel: 5.1.6
2) Filtering ........................................................
3) Learning Error Rates
280222806 total bases in 1116452 reads from 7 samples will be used for learning the error rates.
Error rates could not be estimated (this is usually because of very few reads).
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.
6: stop("Error matrix is NULL.")
5: getErrors(err, enforce = TRUE)
4: dada(drps, err = NULL, errorEstimationFunction = errorEstimationFunction,
       selfConsist = TRUE, multithread = multithread, verbose = verbose,
       MAX_CONSIST = MAX_CONSIST, OMEGA_C = OMEGA_C, ...)
3: learnErrors(filts, nreads = nreads.learn, multithread = multithread)
2: withCallingHandlers(expr, warning = function(w) if (inherits(w,
       classes)) tryInvokeRestart("muffleWarning"))
1: suppressWarnings(learnErrors(filts, nreads = nreads.learn, multithread = multithread))
Traceback (most recent call last):
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 350, in denoise_paired
    run_commands([cmd])
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 37, in run_commands
    subprocess.run(cmd, check=True)
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada.R', '--input_directory', '/tmp/tmpffc07491/forward', '--input_directory_reverse', '/tmp/tmpffc07491/reverse', '--output_path', '/tmp/tmpffc07491/output.tsv.biom', '--output_track', '/tmp/tmpffc07491/track.tsv', '--filtered_directory', '/tmp/tmpffc07491/filt_f', '--filtered_directory_reverse', '/tmp/tmpffc07491/filt_r', '--truncation_length', '0', '--truncation_length_reverse', '0', '--trim_left', '0', '--trim_left_reverse', '0', '--max_expected_errors', '2.0', '--max_expected_errors_reverse', '2.0', '--truncation_quality_score', '2', '--min_overlap', '12', '--pooling_method', 'independent', '--chimera_method', 'consensus', '--min_parental_fold', '1.0', '--allow_one_off', 'False', '--num_threads', '0', '--learn_min_reads', '1000000']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 520, in __call__
    results = self._execute_action(
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 581, in _execute_action
    results = action(**arguments)
  File "<decorator-gen-49>", line 2, in denoise_paired
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
    outputs = self._callable_executor_(
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/zcw/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 363, in denoise_paired
    raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

colinbrislawn · June 23, 2024, 10:19pm

Yes, this looks like the same error.

Let's inspect the data before running DADA2 and look for issues.

Can you run qiime demux summarize and post the output files here?

Ah. I've used fastp for metagenomic data before! It's a great program.

Because it does not have a QIIME 2 plugin, QIIME 2's provenance can't record how it was used during these crucial first steps. Until fastp has a QIIME 2 plugin, using the trimming settings within DADA2 may be better.

zcw15774723795 · June 24, 2024, 5:02am

Hi! Thank you for your attention!
I have run qiime demux summarize to generate a report of samples. This is my code:

time qiime demux summarize \
--i-data paired-end-demux_samples_all.qza \
--o-visualization ./demux_seqs_all.qzv

The output file is attached here.
The summary plot was generated with the website linked here.
This is what the plot looks like:

Thank you again for your help!

colinbrislawn · June 24, 2024, 1:13pm

Thank you for posting that!

I have discovered the problem on the 'Interactive Quality Plot' tab.

It looks like the quality scores are all exactly 30. I wonder if this is from fastp settings or perhaps it was already like this in NCBI?

This is a problem because DADA2 uses the quality scores to construct an error model.
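Quality scores also feed DADA2's read filter, which may explain why filtering ran fine while error learning failed. The logged command used `--max_expected_errors 2.0`, and expected errors are the sum of 10^(-Q/10) over a read's bases, with Q read from each character as ASCII code - 33. A minimal sketch (function names are illustrative) shows that a 251-nt read scored Q30 at every base has about 0.25 expected errors, so such reads pass the filter easily and the problem only surfaces when error rates are estimated.

```python
def expected_errors(quality_line, offset=33):
    """Expected number of errors in one read: sum of 10^(-Q/10) over its bases."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality_line.strip())


def passes_max_ee(quality_line, max_ee=2.0):
    """True if the read would be kept by a maxEE-style filter (EE <= max_ee)."""
    return expected_errors(quality_line) <= max_ee
```

For instance, `expected_errors("?" * 251)` is about 0.251, while a read of 251 `#` characters (Q2) would be far above 2 and be discarded.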

Does the raw data before running fastp have a range of quality scores? Perhaps you could import that and take a look at the quality plot.
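To check a FASTQ file by eye, it helps to decode the quality line. With the Phred+33 encoding that this import format expects, each character stores one score as Q = ASCII code - 33, so `?` is exactly Q30. A small sketch (names are illustrative) that decodes a quality line and reports its range:

```python
def phred33_scores(quality_line):
    """Decode a Phred+33 FASTQ quality string into integer Q scores."""
    scores = [ord(ch) - 33 for ch in quality_line.strip()]
    if any(q < 0 for q in scores):
        raise ValueError("character below '!' is not valid Phred+33")
    return scores


def quality_range(quality_line):
    """Return (min Q, max Q) for one read's quality string."""
    scores = phred33_scores(quality_line)
    return min(scores), max(scores)
```

A read whose range is `(30, 30)` has no variation at all; characters such as `F`, `:` and `,` decode to 37, 25 and 11.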

zcw15774723795 · June 25, 2024, 5:18am

Hi! Thank you for pointing out the issue.
I found that this may be caused by the download and extraction steps.
These are the steps I used before:

1. DOWNLOAD NCBI SAMPLE FILES THROUGH prefetch
Generally, the download command looks like this:

prefetch SRA_ID  --location NCBI

2. After downloading the .sralite file (a compressed SRA archive), I used fastq-dump to convert it to FASTQ:

fastq-dump --split-3 SRA.sralite

3. Finally, either two paired-end files or one single-end file is generated. The file content looks like this:

@SRR19603331.1 1 length=251
ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTGGTGAAGAAGGATAGAGGTAGTAACTGGCCTTTATTTGACGGTAATCAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTCACGTGAGAGCAGGCGG
+SRR19603331.1 1 length=251
???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
@SRR19603331.2 2 length=251
ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTGGTGAAGAAGGATAGAGGTAGTAACTGGCCTTTATTTGACGGTAATCAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGTGAGCGCAGGCGG
+SRR19603331.2 2 length=251
???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
@SRR19603331.3 3 length=251
ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAATAAGGTTTTCGGATCGTAAAGCTCTGTTGTTGGTGAAGAAGGATAGAGGTAGTAAATGGCCTTTATTTGAAGGTAATCAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATGTATTGGGCGTAAAGCGAGCGCAGGCGG
+SRR19603331.3 3 length=251
???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

We can conclude that this file is not right, because its quality lines contain only '?' characters. In Phred+33 encoding, '?' is Q30, which matches the flat quality plot.
However, when I used wget to download the SRA file, the result was very different.
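To screen all 56 downloads for this problem without opening each file, a short script can tally the distinct quality characters in the first records of every FASTQ. A minimal sketch, assuming standard four-line records and optional gzip compression; `quality_profile`, `has_flat_quality` and the 10,000-record limit are arbitrary illustrative choices, not part of any tool used in this thread.

```python
import gzip
from collections import Counter
from itertools import islice


def quality_profile(fastq_path, max_records=10_000):
    """Count quality characters in the first max_records reads of a FASTQ.

    Assumes four-line records; paths ending in .gz are read with gzip.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        # Lines 3, 7, 11, ... (0-based) are the quality lines.
        for line in islice(handle, 3, max_records * 4, 4):
            counts.update(line.rstrip("\r\n"))
    return counts


def has_flat_quality(fastq_path, max_records=10_000):
    """True if all sampled bases share one quality character (or the file is empty)."""
    return len(quality_profile(fastq_path, max_records)) <= 1
```

Looping `has_flat_quality` over every `_1`/`_2` file before import would flag the prefetch-derived files, whose profile contains only `?`.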

I opened the NCBI SRA page for one of the runs, which provides an AWS download URL.


I copied the AWS URL and used wget to download the file:

wget -b -c https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR19603331/SRR19603331
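For all 56 runs, the URLs can be generated from the run accessions. This sketch assumes that every run follows the same `https://sra-pub-run-odp.s3.amazonaws.com/sra/<run>/<run>` pattern as the single URL above (worth confirming on a second run's page first). The output file, one URL per line, can be passed to `wget -c -i urls.txt`. Function names are illustrative.

```python
from pathlib import Path

# Pattern copied from the one example URL above; assumed to hold for every run.
ODP_URL = "https://sra-pub-run-odp.s3.amazonaws.com/sra/{run}/{run}"


def sra_odp_urls(run_ids):
    """Build one AWS Open Data URL per SRA run accession, skipping blanks."""
    return [ODP_URL.format(run=run.strip()) for run in run_ids if run.strip()]


def write_url_list(run_ids, out_path):
    """Write the URLs one per line, the format read by `wget -i`."""
    urls = sra_odp_urls(run_ids)
    Path(out_path).write_text("\n".join(urls) + "\n")
    return urls
```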

Then parallel-fastq-dump was used to generate FASTQ files from the downloaded file:

parallel-fastq-dump -t 20 -O ./ --split-3 -s SRR19603331
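With `--split-3`, mate pairs are written to `<run>_1.fastq` and `<run>_2.fastq` (reads without a mate go to `<run>.fastq`). Before building the manifest, it can be worth confirming that each pair holds the same number of reads; the per-file counts are also a quick check against DADA2's "very few reads" hint. A minimal sketch assuming four-line records; `count_reads` and `check_pair` are illustrative names.

```python
import gzip


def count_reads(fastq_path):
    """Number of reads in a FASTQ file, assuming four lines per record."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{fastq_path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_pair(forward_path, reverse_path):
    """Return the shared read count, or raise if forward and reverse disagree."""
    n_fwd = count_reads(forward_path)
    n_rev = count_reads(reverse_path)
    if n_fwd != n_rev:
        raise ValueError(f"read counts differ: {n_fwd} forward vs {n_rev} reverse")
    return n_fwd
```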

Checking the FASTQ files gives the result shown below:

@SRR19603331.1 1 length=251
ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTGGTGAAGAAGGATAGAGGTAGTAACTGGCCTTTATTTGACGGTAATCAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTCACGTGAGAGCAGGCGG
+SRR19603331.1 1 length=251
FFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FF,FFFFFFFFFF:FFFFFFFFFF:FFFFFFF:FF:FFFFFFFFFFFFF:FFFFFFF,F::FF:F:FFF,FF:FFFFFFF:F:,FFFFFFF::F:FF:F:FFFFF:FFF,FFFFFFFFFFFFFFFFF:FFF::FFF::FFFFFF:FFFF:FFF:FFF::FFFFF:,F,FFFFFFF::F,F:,FFFF,FF:F,FFFFF,FF,FF:F:
@SRR19603331.2 2 length=251
ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTGGTGAAGAAGGATAGAGGTAGTAACTGGCCTTTATTTGACGGTAATCAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGTGAGCGCAGGCGG
+SRR19603331.2 2 length=251
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFF:F:FFF:FFFFF:FFFF
@SRR19603331.3 3 length=251
ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAATAAGGTTTTCGGATCGTAAAGCTCTGTTGTTGGTGAAGAAGGATAGAGGTAGTAAATGGCCTTTATTTGAAGGTAATCAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATGTATTGGGCGTAAAGCGAGCGCAGGCGG
+SRR19603331.3 3 length=251
FFFFF,FF::F:::FFFFFFFFFF:FF:FFFF,:FFFFFFFFFFFF,,FFFFFFFFFF,FFFFF,FF,:FF:FFF,,::FFFFFF,,FFFFFF:FFF,:,FF,FF,FFFFFFF:F,FFFFFFFFFF:F:F,,F:F::F,F,F:,F,FF:,FF:FFF,FFF,F,FFF:,,F,FFFF,F,FF:FFF:FF:FFFF:FF,F,,F,F,FF:::F,FFFF,FFFF:,,F,F,F:F:,F::F:,,:,,F:,FFF,FF,

This file is right! In a nutshell, it seems the problem was caused by the .sralite file downloaded with prefetch.

Thank you for your patience and kindness! I will use this approach to generate a feature table and do species annotation.

colinbrislawn · June 25, 2024, 2:00pm

Yes, I agree. Good detective work finding the source of this problem!

Ok great!

If you have more questions, feel free to open a new thread.

We try to keep each thread on one topic.

system · July 26, 2024, 8:01pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.