error from dada2 plugin

Hi QIIME2,
I am facing a technical issue and would really appreciate some advice.
I trimmed the primers from my raw data using cutadapt (paired-end) to produce a qza file.
The qza file is available at this link (https://rutgersconnect-my.sharepoint.com/:u:/r/personal/yadavsk_rwjms_rutgers_edu/Documents/trimmed-seqs.qza?csf=1&web=1&e=FJPJIp).
Please see the attached qzv file: trimmed-seqs.qzv (291.3 KB).

Then I ran the following command for the DADA2 analysis.

qiime dada2 denoise-paired --i-demultiplexed-seqs trimmed-seqs.qza --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 0 --p-trunc-len-r 248 --o-table table.qza --o-representative-sequences rep-seqs.qza --o-denoising-stats denoising-stats.qza

After running the command I get the error “Plugin error from dada2”, and an error log was saved.
I am pasting the log below.

(qiime2-2020.2) [email protected]:/media/sf_Shared_Folder/Final$ cat /tmp/qiime2-q2cli-err-bhy2omnu.log
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmp2x49fpuj/forward /tmp/tmp2x49fpuj/reverse /tmp/tmp2x49fpuj/output.tsv.biom /tmp/tmp2x49fpuj/track.tsv /tmp/tmp2x49fpuj/filt_f /tmp/tmp2x49fpuj/filt_r 0 248 0 0 2.0 2.0 2 consensus 1.0 1 1000000

R version 3.5.1 (2018-07-02) 
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.3 / RcppParallel: 4.4.4 
1) Filtering ................
2) Learning Error Rates
167947 total bases in 724 reads from 16 samples will be used for learning the error rates.
179552 total bases in 724 reads from 16 samples will be used for learning the error rates.
Error in err[c(1, 6, 11, 16), ] <- 1 : 
  incorrect number of subscripts on matrix
Execution halted
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 257, in denoise_paired
    run_commands([cmd])
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
    subprocess.run(cmd, check=True)
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/tmp/tmp2x49fpuj/forward', '/tmp/tmp2x49fpuj/reverse', '/tmp/tmp2x49fpuj/output.tsv.biom', '/tmp/tmp2x49fpuj/track.tsv', '/tmp/tmp2x49fpuj/filt_f', '/tmp/tmp2x49fpuj/filt_r', '0', '248', '0', '0', '2.0', '2.0', '2', 'consensus', '1.0', '1', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__
    results = action(**arguments)
  File "</home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/decorator.py:decorator-gen-455>", line 2, in denoise_paired
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 272, in denoise_paired
    " and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

I searched the forum and found a post describing a similar problem (Doing taxonomy analysis and getting abundancies with manifests).
However, I do not understand how it applies to my case, since my sequencing facility provided FASTQ files with quality scores. I can see a lot of F, : and , symbols below the nucleotide sequence in my fastq files. Are they quality scores? If I cannot use DADA2 for this data, what should I do next? Any suggestions will be appreciated.
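For readers following along: those symbols are quality scores. As a minimal sketch, assuming the standard Phred+33 encoding used in current Illumina FASTQ files, subtracting 33 from each character's ASCII code gives its score. The function name `phred33_scores` and the example string are made up for illustration.

```python
def phred33_scores(quality_line):
    """Convert a FASTQ quality string to Phred scores, assuming Phred+33."""
    return [ord(ch) - 33 for ch in quality_line]


# Hypothetical quality line built from the three symbols mentioned above.
example = "FF:,F"
scores = phred33_scores(example)
print(scores)  # [37, 37, 25, 11, 37]
```

Seeing only a handful of distinct values like these is consistent with binned quality scores, which some newer Illumina instruments report instead of the full score range.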

Thank you for the help.

Regards
Sudhir

Hi @Sky23! It looks like you are running this in a VirtualBox VM, is that right? Have you customized the available RAM for the VM? If not, that should be your next stop: by default the VM is configured with only as much RAM as is needed to run tutorial datasets and the like. This part of the error message

usually happens when too many threads are specified, resulting in too much RAM being used. In your case, it doesn’t look like you ran the command with multiple threads, which is why I am wondering if it’s just running out of memory pretty quickly. Let us know!


Hi Mathew,
Thank you for your suggestion.
Yes, I am using a VM to run QIIME 2 on my desktop.
I assigned 16 GB of RAM (out of 64 GB total) and 5 CPUs (out of 16) to the VM.
Is that not enough to run my analysis?
As you suggested, I tried using multiple threads; please see my command below.
(qiime2-2020.2) [email protected]:/media/sf_Shared_Folder/Final$ qiime dada2 denoise-paired --i-demultiplexed-seqs trimmed-seqs.qza --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 0 --p-trunc-len-r 248 --p-n-threads 5 --o-table table.qza --o-representative-sequences rep-seqs.qza --o-denoising-stats denoising-stats.qza

However, I still get the same error, which I have pasted below.
(qiime2-2020.2) [email protected]:/media/sf_Shared_Folder/Final$ cat /tmp/qiime2-q2cli-err-pnjwghs7.log
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmp5uqy8_y9/forward /tmp/tmp5uqy8_y9/reverse /tmp/tmp5uqy8_y9/output.tsv.biom /tmp/tmp5uqy8_y9/track.tsv /tmp/tmp5uqy8_y9/filt_f /tmp/tmp5uqy8_y9/filt_r 0 248 0 0 2.0 2.0 2 consensus 1.0 5 1000000

R version 3.5.1 (2018-07-02)
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.3 / RcppParallel: 4.4.4

1) Filtering …
2) Learning Error Rates
167947 total bases in 724 reads from 16 samples will be used for learning the error rates.
179552 total bases in 724 reads from 16 samples will be used for learning the error rates.
Error in err[c(1, 6, 11, 16), ] <- 1 :
  incorrect number of subscripts on matrix
Execution halted
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 257, in denoise_paired
    run_commands([cmd])
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
    subprocess.run(cmd, check=True)
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/tmp/tmp5uqy8_y9/forward', '/tmp/tmp5uqy8_y9/reverse', '/tmp/tmp5uqy8_y9/output.tsv.biom', '/tmp/tmp5uqy8_y9/track.tsv', '/tmp/tmp5uqy8_y9/filt_f', '/tmp/tmp5uqy8_y9/filt_r', '0', '248', '0', '0', '2.0', '2.0', '2', 'consensus', '1.0', '5', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__
    results = action(**arguments)
  File "</home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/decorator.py:decorator-gen-455>", line 2, in denoise_paired
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 272, in denoise_paired
    " and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.
How can I solve this problem? Please advise.
Thank you
Regards
Sudhir

Hi @Sky23!

Hmm, maybe not enough, it depends on the characteristics of the data.

I think there might be a misunderstanding: the RAM requirements increase as you add more threads, which makes the problem worse. Since you can’t specify fewer than 1 thread, the only other option is to add more RAM. Make sense?

I have now configured my VM with 32 GB of RAM and ran DADA2 again.

I used the following command:
(qiime2-2020.2) [email protected]:/media/sf_Shared_Folder/Final$ qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed-seqs.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 248 \
  --p-n-threads 1 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

I encountered the same problem. Please see the error log below.

(qiime2-2020.2) [email protected]:/media/sf_Shared_Folder/Final$ cat /tmp/qiime2-q2cli-err-coliwnhy.log
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmpfjou1cz4/forward /tmp/tmpfjou1cz4/reverse /tmp/tmpfjou1cz4/output.tsv.biom /tmp/tmpfjou1cz4/track.tsv /tmp/tmpfjou1cz4/filt_f /tmp/tmpfjou1cz4/filt_r 0 248 0 0 2.0 2.0 2 consensus 1.0 1 1000000

R version 3.5.1 (2018-07-02)
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.3 / RcppParallel: 4.4.4

1) Filtering …
2) Learning Error Rates
167947 total bases in 724 reads from 16 samples will be used for learning the error rates.
179552 total bases in 724 reads from 16 samples will be used for learning the error rates.
Error in err[c(1, 6, 11, 16), ] <- 1 :
  incorrect number of subscripts on matrix
Execution halted
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 257, in denoise_paired
    run_commands([cmd])
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
    subprocess.run(cmd, check=True)
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/tmp/tmpfjou1cz4/forward', '/tmp/tmpfjou1cz4/reverse', '/tmp/tmpfjou1cz4/output.tsv.biom', '/tmp/tmpfjou1cz4/track.tsv', '/tmp/tmpfjou1cz4/filt_f', '/tmp/tmpfjou1cz4/filt_r', '0', '248', '0', '0', '2.0', '2.0', '2', 'consensus', '1.0', '1', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__
    results = action(**arguments)
  File "</home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/decorator.py:decorator-gen-455>", line 2, in denoise_paired
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 272, in denoise_paired
    " and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Please advise.
Thank you

Hi @Sky23 - can you tell us a bit more about these reads? I just noticed this part:

 179552 total bases in 724 reads from 16 samples will be used for learning the error rates

That sounds like longer-than-usual reads to me - is this from a pyrosequencing platform?
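A quick way to sanity-check read lengths from those log lines is to divide the total bases by the number of reads. This is only a rough sketch using the figures printed in the log; the variable names are made up for illustration.

```python
# Figures copied from the two "Learning Error Rates" lines in the log.
reads = 724
forward_bases = 167947
reverse_bases = 179552

mean_forward = forward_bases / reads
mean_reverse = reverse_bases / reads

print(f"forward: {mean_forward:.1f} nt per read")  # forward: 232.0 nt per read
print(f"reverse: {mean_reverse:.1f} nt per read")  # reverse: 248.0 nt per read
```

The reverse figure matches the --p-trunc-len-r 248 setting, which is expected because DADA2 truncates reads to that length and discards any read shorter than it.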

Usually this error means that you have run out of RAM, but it might also be possible that it is cropping up because the sequences have either been modified in some way, or aren’t from a supported platform.

Check out https://benjjneb.github.io/dada2/ for more details. Keep us posted.

Hi Mathew,
Thank you for your suggestions and time.
This is Illumina MiSeq data with 2x250 paired-end reads for the V3-V4 target region (465 bp).
The company used 341F/806R primers. I have an overlap of 35 bp. Do you think that is enough?
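The 35 bp figure follows from simple arithmetic: the two read lengths added together, minus the amplicon length. A minimal sketch, assuming every amplicon is exactly 465 bp (real V3-V4 amplicons vary in length); `expected_overlap` is a made-up helper name.

```python
def expected_overlap(forward_len, reverse_len, amplicon_len):
    """Bases shared by the two reads of a pair spanning one amplicon."""
    return forward_len + reverse_len - amplicon_len


# Untrimmed 2x250 reads over the 465 bp amplicon quoted above.
print(expected_overlap(250, 250, 465))  # 35

# Truncating the reverse read to 248 removes two bases of overlap.
print(expected_overlap(250, 248, 465))  # 33
```

For comparison, the mergePairs function in the DADA2 R package documents a default minimum overlap of 12 nt, so the margin here is modest and shrinks further for longer amplicons or harder truncation.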
Over the weekend I did several runs with different --p-trunc-len-r values.
With --p-trunc-len-r 248, I get the error I pasted in my previous posts.
I chose 248 because the median quality score of my reverse reads drops below 20 at that position. Please see the attached qzv file: trimmed-seqs.qzv (291.3 KB)
However, with --p-trunc-len-r 0 the run succeeded and I got 1500 ASVs.
Why does removing the last 2 bases of the reverse reads make DADA2 fail?
Do you think it is still a RAM problem?
Any explanation will be appreciated.
Thank you
Regards

A small correction to my last post:
the company used the Illumina NovaSeq platform, not MiSeq.

Good morning Mathew. Can you please give any advice on my previous posts?
Thank you

Hi @Sky23:

The version of DADA2 that is in q2-dada2 does not officially support NovaSeq, as best I can recall. There are a few posts floating around here on the forum about it - I don’t have any concrete suggestions for moving forward with DADA2 (or deblur). You could go with OTU clustering, though.

My apologies that it took me a little longer than hoped to get back to you, but please do remember, this support forum is provided free of charge, and the forum moderation team is trying our best to handle all incoming requests. Please take a few minutes to refresh your memory on the Forum Code of Conduct:

https://forum.qiime2.org/faq

Particularly this section: https://forum.qiime2.org/faq#patience

Thanks!

Thank you for your suggestions. Regards.

Hi Mathew,
I am running OTU clustering on my data, since NovaSeq 16S rRNA sequencing data has some issues with DADA2. Please see this link: https://github.com/benjjneb/dada2/issues/791.
I trimmed the primers from my paired-end sequences using the cutadapt plugin and then imported my data into QIIME 2 using a manifest file.

Then I joined my pairs using vsearch join-pairs. After joining, the quality plot looks odd: there is an elevated section with a median quality score of 41. Please see the attached picture. Is this abnormal, or might it be due to the different quality-binning system on the NovaSeq platform?

Then I quality-filtered and dereplicated my data. Since I am not sure whether I can use deblur with NovaSeq data, I am doing the following steps:
1) chimera filtering
2) closed-reference OTU picking with SILVA. Should I use a database other than SILVA? Should I train a classifier or use the whole sequences?

Is my workflow appropriate for OTU picking?

Thank you for your time.
Regards

Hi @Sky23,

I would say the elevated region is where the two reads overlap, so the quality values there are strengthened by the scores from both reads.

On deblur: as far as I know it does not consider quality scores, so it should avoid this quality issue!

Chimera filtering is a valid step here!
On the database, SILVA is a good start as long as you know it includes all/most of the species in your samples. Do you have any mock communities included in the experiment to validate the pipeline?

If you opt for closed-reference OTU picking, go with the whole sequences, then look at the results and see whether many sequences fall outside any cluster. If so, you may try de novo clustering with a trained classifier!
Hope it makes sense

Luca

Hi Luca,
Thank you for your time.
I do not have any mock communities included in the experiment.
My samples are V3-V4 amplicons generated with 341F/806R primers from human microbiota.
How can I get a good-quality mock community for my study? Are there any guidelines?
I looked for one but was not sure how to get one for my study.
Should I remove the mock community data from my QIIME 2 analysis once I am done?
Any advice will be appreciated.

Thank you

Hi,
yes, from your region it makes sense that the elevated region you are seeing is the overlapping region.

For the mock, you may look at:
msystems.asm.org/content/1/5/e00062-16

There are also a few commercially available ones now; a quick search will find them.

When I am dealing with controls (mock communities as well as negative samples), I usually run the diversity analysis and taxonomy assignment with them included, to see how they behave; then I exclude them and repeat the diversity analysis for the final data.
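The two-pass idea can be sketched in a few lines. This is only an illustration with a plain dictionary standing in for a feature table; the sample IDs and read counts are hypothetical.

```python
# Hypothetical sample IDs and per-sample read counts, for illustration only.
table = {
    "patient-01": 15230,
    "patient-02": 14875,
    "mock-01": 16012,
    "negative-01": 87,
}
controls = {"mock-01", "negative-01"}

# First pass: inspect how the controls behave alongside the real samples.
control_view = {s: n for s, n in table.items() if s in controls}

# Second pass: drop the controls before the final diversity analysis.
final_table = {s: n for s, n in table.items() if s not in controls}

print(sorted(control_view))  # ['mock-01', 'negative-01']
print(sorted(final_table))   # ['patient-01', 'patient-02']
```

On a real QIIME 2 table, the filter-samples action of the feature-table plugin performs the equivalent exclusion based on sample metadata.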

Luca


Thank you for the suggestions.
I checked the mock communities on GitHub; all of them are for the V4 region with primers 515f-806r.
My sequencing data is for the V3-V4 region with primers 341F/805R.
Can I still use these mocks to validate my analysis?

Hi,
one option is to contact the submitter of the community and ask for the DNA, so you can amplify your region of interest. What you really need is the composition and the strains used in it, so you can double-check how your primers behave!

If they won't be able to support you on this, I suppose you will have to go with commercial mock communities, which may provide DNA or even cell cultures, so you could test your extraction kit as well as the rest of the protocol.

Cheers

Thank you.
I am contacting ZymoBIOMICS for standards.
regards