Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) : Input must be a valid sequence table.

kreetelyll · August 12, 2019, 12:50pm

Hello!

I am having a problem with the dada2 denoise-paired step (Denoise and dereplicate paired-end sequences).
The command I am running:
qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end.qza --p-trunc-len-f 110 --p-trunc-len-r 100 --o-representative-sequences rep-seqs-dada2.qza --o-table table-dada2.qza --o-denoising-stats stats-dada2.qza

It gives me the following error:

R version 3.4.1 (2017-06-30)
Loading required package: Rcpp
DADA2 R package version: 1.6.0

Filtering ......
Learning Error Rates
2a) Forward Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 214 reads in 154 unique sequences.
Sample 2 - 368244 reads in 66786 unique sequences.
Sample 3 - 1048669 reads in 194970 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5
selfConsist step 6
Convergence after 6 rounds.
2b) Reverse Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 214 reads in 145 unique sequences.
Sample 2 - 368244 reads in 80528 unique sequences.
Sample 3 - 1048669 reads in 163768 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5
selfConsist step 6
Convergence after 6 rounds.
Denoise remaining samples ...
Remove chimeras (method = consensus)
Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) :
Input must be a valid sequence table.
Calls: removeBimeraDenovo -> isBimeraDenovoTable
In addition: Warning message:
In is.na(colnames(unqs[[i]])) :
is.na() applied to non-(list or vector) of type 'NULL'
Execution halted
Traceback (most recent call last):
File "/storage/software/python/3.6-QIIME2-2019.1/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 231, in denoise_paired
run_commands([cmd])
File "/storage/software/python/3.6-QIIME2-2019.1/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
subprocess.run(cmd, check=True)
File "/storage/software/python/3.6-QIIME2-2019.1/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/tmp/tmpixswe83q/forward', '/tmp/tmpixswe83q/reverse', '/tmp/tmpixswe83q/output.tsv.biom', '/tmp/tmpixswe83q/track.tsv', '/tmp/tmpixswe83q/filt_f', '/tmp/tmpixswe83q/filt_r', '110', '100', '0', '0', '2.0', '2', 'consensus', '1.0', '1', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/storage/software/python/3.6-QIIME2-2019.1/lib/python3.6/site-packages/q2cli/commands.py", line 274, in call
results = action(**arguments)
File "</storage/software/python/3.6-QIIME2-2019.1/lib/python3.6/site-packages/decorator.py:decorator-gen-442>", line 2, in denoise_paired
File "/storage/software/python/3.6-QIIME2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/storage/software/python/3.6-QIIME2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 365, in callable_executor
output_views = self._callable(**view_args)
File "/storage/software/python/3.6-QIIME2-2019.1/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 246, in denoise_paired
" and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Is the problem that there are no more overlapping sites after truncating the sequences at 100 and 110? And should I maybe then just use only forward sequences?

Mehrbod_Estaki · August 12, 2019, 6:09pm

Hi @kreetelyll,

That could very well be it! What is the expected amplicon size and overlapping region?

That would certainly help to troubleshoot if insufficient overlap is an issue. I'd say try it!

Additionally, has any pre-processing been done to these reads prior to dada2? I ask because your quality plots look a bit too clean-cut: most of the medians sit at the same value across the length of the reads, which usually happens when some filtering has already been done.

kreetelyll · August 21, 2019, 1:41pm

Thanks for the answer. I actually do not know about the overlapping region. How can I determine that, or is it something I should ask the company that did the sequencing? Sorry, I am very new to this.
As for pre-processing, nothing was done to the data prior to dada2; I just imported the data and then continued on with dada2.

Mehrbod_Estaki · August 21, 2019, 11:10pm

Hi @kreetelyll,

Which region/primers were used to create your amplicons? Looking up your primer pair in the literature should tell you the expected amplicon size. Your facility will likely know this too if they were the ones who did the sample prep.
For example, the Moving Pictures tutorial uses the 515F/806R primer pair for 16S rRNA gene sequences. This gives an amplicon size of ~806-515=291 bp. In this scenario, if you were running a 2x150 bp Illumina run, you would have 300-291=9 bp of overlap, assuming you did not truncate at all. Any trimming from the 3' end would have to be subtracted from the overlap region. We also know dada2 requires a minimum 20 bp overlap region to properly merge reads. If merging is not possible based on those requirements, your best bet is to discard your reverse reads and use your forward reads only.
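
The overlap arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `expected_overlap` is made up, the amplicon length is the rough 806-515 estimate rather than a measured value, and the 20 bp threshold is simply the figure quoted above, so check it against your dada2 version. The second case uses the truncation lengths from the command at the top of the thread, assuming the same ~291 bp amplicon applies to that data.

```python
def expected_overlap(amplicon_len, trunc_len_f, trunc_len_r):
    """Bases shared by the forward and reverse reads; a negative value means a gap."""
    return trunc_len_f + trunc_len_r - amplicon_len


AMPLICON_LEN = 806 - 515  # rough 515F/806R estimate from the post (291 bp)
MIN_OVERLAP = 20          # threshold quoted in the post; an assumption, not verified here

# (forward truncation, reverse truncation): untruncated 2x150, then 110/100
for trunc_f, trunc_r in [(150, 150), (110, 100)]:
    overlap = expected_overlap(AMPLICON_LEN, trunc_f, trunc_r)
    verdict = "can merge" if overlap >= MIN_OVERLAP else "too short to merge"
    print(f"{trunc_f}+{trunc_r}: overlap {overlap} bp, {verdict}")
```

Under these assumptions the untruncated reads overlap by 9 bp and the 110/100 truncation leaves an 81 bp gap, so neither setting would reach the stated minimum.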

kreetelyll · August 22, 2019, 7:07am

I am using the same primer pair, 515F/806R. I think I will then just continue with the forward reads. I did a test run with a few samples:
qiime dada2 denoise-single --i-demultiplexed-seqs import-single-end.qza --p-trunc-len 122 --o-representative-sequences dada2-rep-seqs.qza --o-table dada2-table.qza --o-denoising-stats dada2-stats.qza

This time it worked and came back without any errors. However, I am not entirely sure how to determine exactly where to truncate. Here I truncated at 122; is that okay, or what is the best way to decide? I also read about a tool called FIGARO that can help with that. Do you have any experience with FIGARO, and would you recommend using it? https://www.biorxiv.org/content/10.1101/610394v1

Mehrbod_Estaki · August 22, 2019, 6:44pm

Hi @kreetelyll,
Thanks for clarifying!
I think you are making the right choice by using the forward reads only, given that you only have a 150 bp run.
As for the truncation parameter, 122 sounds very reasonable. Given that the median score in your reads stays above 35 almost all the way through, you could probably increase that value as well. The difference between 122 and, say, 140 would be minimal, with the former probably retaining slightly more reads at the cost of slightly less resolution. I use "probably" and "slightly" carefully here because both depend on the data itself. You can always run a few different parameter values and compare them to see what works best for you. This gives you a good, in-depth understanding of how parameter settings affect a run. That being said, like I mentioned, you could totally just carry on with your analysis. There are much more important challenges coming your way, and you don't want to obsess over one step.
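
If you do want to compare a few truncation lengths, one way to keep the runs organized is to generate the commands programmatically. The sketch below only builds the argument lists and prints them; it does not run QIIME 2. The flags are the ones from the denoise-single command earlier in this thread, while the function name `denoise_single_commands` and the `trunc<N>-...` output file names are hypothetical choices for illustration.

```python
def denoise_single_commands(demux_qza, trunc_lens):
    """Build one `qiime dada2 denoise-single` argument list per truncation length."""
    commands = []
    for trunc_len in trunc_lens:
        prefix = f"trunc{trunc_len}"  # hypothetical naming scheme for the outputs
        commands.append([
            "qiime", "dada2", "denoise-single",
            "--i-demultiplexed-seqs", demux_qza,
            "--p-trunc-len", str(trunc_len),
            "--o-representative-sequences", f"{prefix}-rep-seqs.qza",
            "--o-table", f"{prefix}-table.qza",
            "--o-denoising-stats", f"{prefix}-stats.qza",
        ])
    return commands


# Print the commands so they can be reviewed before anything is executed.
for cmd in denoise_single_commands("import-single-end.qza", [122, 130, 140]):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once you are happy with it, and the resulting denoising stats can then be compared side by side.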

As for the tool you linked to, I had not heard of FIGARO, but it does look very promising and would make a great QIIME 2 addition, since we get a tremendous number of questions on the forum about how to pick these parameters. Having read their brief pre-print, I would have liked to see them benchmark and validate their output recommendations, but that may just come with use. Unfortunately, it will not help in your case, since you are using forward reads only and FIGARO is built for decision-making on paired-end data. Thanks for sharing though!
