Plugin error with Deblur

Hi there,

I have imported joined reads into QIIME 2 that are about 350 bp long, and I am trying to use deblur denoise-other.

qiime deblur denoise-other \
  --i-demultiplexed-seqs $Outputs/joined_demux.qza \
  --i-reference-seqs $Outputs/rdp_seq_1.qza \
  --p-trim-length 300 \
  --p-min-reads 1 \
  --p-min-size 1 \
  --o-table $Outputs/deblur_table.qza \
  --o-representative-sequences $Outputs/rep_seq_deblur.qza \
  --o-stats $Outputs/deblur_stats.txt \
  --verbose

The process seems to run well but then, once all samples have been denoised, it returns the following error:

#############################################
Traceback (most recent call last):
  File "/share/easybuild/RHEL6.3/sandybridge/software/QIIME2/2018.2/bin/deblur", line 4, in <module>
    __import__('pkg_resources').run_script('deblur==1.0.3', 'deblur')
  File "/share/easybuild/RHEL6.3/sandybridge/software/QIIME2/2018.2/lib/python3.5/site-packages/pkg_resources/__init__.py", line 750, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/share/easybuild/RHEL6.3/sandybridge/software/QIIME2/2018.2/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1527, in run_script
    exec(code, namespace, namespace)
  File "/gpfs1m/apps/easybuild/RHEL6.3/sandybridge/software/QIIME2/2018.2/lib/python3.5/site-packages/deblur-1.0.3-py3.5.egg-info/scripts/deblur", line 684, in <module>
    deblur_cmds()
  File "/share/easybuild/RHEL6.3/sandybridge/software/QIIME2/2018.2/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/share/easybuild/RHEL6.3/sandybridge/software/QIIME2/2018.2/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/share/easybuild/RHEL6.3/sandybridge/software/QIIME2/2018.2/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/share/easybuild/RHEL6.3/sandybridge/software/QIIME2/2018.2/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/share/easybuild/RHEL6.3/sandybridge/software/QIIME2/2018.2/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/gpfs1m/apps/easybuild/RHEL6.3/sandybridge/software/QIIME2/2018.2/lib/python3.5/site-packages/deblur-1.0.3-py3.5.egg-info/scripts/deblur", line 664, in workflow
    threads=threads_per_sample)
  File "/share/easybuild/RHEL6.3/sandybridge/software/QIIME2/2018.2/lib/python3.5/site-packages/deblur/workflow.py", line 320, in remove_artifacts_from_biom_table
    coverage_thresh=coverage_thresh)
  File "/share/easybuild/RHEL6.3/sandybridge/software/QIIME2/2018.2/lib/python3.5/site-packages/deblur/workflow.py", line 467, in remove_artifacts_seqs
    if (float(line[2]) >= sim_thresh) and
IndexError: list index out of range
Plugin error from deblur:

Command '['deblur', 'workflow', '--seqs-fp', '/tmp/jobs/uvon315/67062463/qiime2-archive-5isxyeh2/4557afd4-42d0-4f91-958f-c2c7b7b517ca/data', '--output-dir', '/tmp/jobs/uvon315/67062463/tmp7mpeku86', '--mean-error', '0.005', '--indel-prob', '0.01', '--indel-max', '3', '--trim-length', '300', '--min-reads', '1', '--min-size', '1', '--jobs-to-start', '1', '-w', '--pos-ref-fp', '/tmp/jobs/uvon315/67062463/qiime2-archive-uy_zobfd/3d260876-4838-4c28-80d3-cc4f876df761/data/dna-sequences.fasta']' returned non-zero exit status 1
#############################################

I ran the QIIME validation tool on my joined-reads artifact and it returned: "Artifact joined_demux.qza appears to be valid at level=max." I am running QIIME 2 version 2018.2.

I don’t know what can be wrong here… the error really happens after all samples have been processed, according to deblur.log. Here is the tail of the file:

INFO(47286889311616)2018-04-17 18:57:34,414:finished processing per sample fasta files
INFO(47286889311616)2018-04-17 18:57:34,417:create_otu_table for 100 samples, into output table /tmp/jobs/uvon315/67062463/tmp7mpeku86/all.biom
INFO(47286889311616)2018-04-17 18:57:41,108:for output biom table loaded 100 samples, 95167 unique sequences
INFO(47286889311616)2018-04-17 18:57:41,221:keeping 95167 (out of 95167 sequences) with >=1 reads
INFO(47286889311616)2018-04-17 18:57:52,782:saved to biom file /tmp/jobs/uvon315/67062463/tmp7mpeku86/all.biom
INFO(47286889311616)2018-04-17 18:57:53,107:saved sequence fasta file to /tmp/jobs/uvon315/67062463/tmp7mpeku86/all.seqs.fa
INFO(47286889311616)2018-04-17 18:57:53,217:getting 16s sequences from the biom table
INFO(47286889311616)2018-04-17 18:57:53,218:remove_artifacts_seqs file /tmp/jobs/uvon315/67062463/tmp7mpeku86/all.seqs.fa

Thank you for your help

Hi @olar785,

Would you be able to send the output from the following two commands please?

$ sortmerna --version
$ indexdb_rna -h | head

Best,
Daniel

Hi Daniel,

Here is the output:

SortMeRNA version 2.0, 29/11/2014

Program: SortMeRNA version 2.0, 29/11/2014
Copyright: 2012-2015 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
OTU-picking extensions and continuing support developed in the Knight Lab,
BioFrontiers Institute, University of Colorado at Boulder
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, [email protected]

Thank you. Would it be possible to send or post the $Outputs/rdp_seq_1.qza file so I can examine it?

To explain a little more: the error suggests that something went wrong when filtering with SortMeRNA, and more specifically that its output does not appear to be well-formed blast-like output, which is surprising. The most likely causes are either the versions of the binaries (but we’ve checked that off) or the input reference. It’s possible the issue is with your input data, but those reads were successfully processed through the other stages of Deblur, so they are less likely to be the problem.
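To illustrate what "not well-formed" means here (a minimal sketch under my own naming, not the actual Deblur source): the workflow splits each blast-like tab-separated line and reads the percent-identity column, so any truncated or non-tabular line has too few fields and triggers the `IndexError` above.

```python
# Hypothetical sketch of parsing a blast-like (tab-separated) hit line.
# In BLAST tabular output, column index 2 is the percent identity.
def keep_hit(line, sim_thresh=65.0):
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:  # malformed or truncated line
        raise ValueError("not well-formed blast-like output: %r" % line)
    return float(fields[2]) >= sim_thresh

good = "query1\tref1\t97.5\t250\t3\t0\t1\t250\t10\t260\t1e-50\t400"
bad = "WARNING: something went wrong"  # e.g. a stray log line in the output

print(keep_hit(good))  # True
try:
    keep_hit(bad)
except ValueError as err:
    print("rejected:", err)
```

Indexing `fields[2]` directly on the `bad` line, as the real code effectively does, is what raises `IndexError: list index out of range`.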

Would it be possible to share the input reference so I can see if anything stands out? If the file is too large, sending, say, the first thousand lines would be fine too. The example below extracts the .qza, takes the first thousand lines, and compresses the result to reduce its size. Note that the paths produced by extract will depend on the artifact and your current working directory.

$ qiime tools extract input.qza
Extracted to /some/path/on/your/system
$ head -n 1000 /some/path/on/your/system/data/dna-sequences.fasta > to_daniel.fasta
$ gzip to_daniel.fasta

Best,
Daniel

One other comment: you may want to retain the default for --p-min-reads. A value of 1 means that singletons will be retained. I don’t think that is the problem, but just a heads up.

Thanks Daniel,

Here's the file

to_daniel.fasta.gz (40.0 KB)

Thanks Daniel,

I know rare sequences are likely to be artefacts, but I realized that Deblur removes those at the end of the process, and since Deblur takes so long (about two days in this case), I would rather keep all the information I can and decide on a threshold afterwards :slight_smile:

Hi everyone,
It seems I have hit a very similar error. I used:

qiime deblur denoise-16S \
  --i-demultiplexed-seqs demux-joined-filtered.qza \
  --p-trim-length 400 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --p-sample-stats \
  --o-stats deblur-stats.qza \
  --p-min-reads 1 \
  --p-min-size 1

After more than two days of running, it gave me an error message ending with "returned non-zero exit status -8".
That is all I can remember; I cannot find the .log file because the terminal was closed. Looking forward to your suggestions!

Oliver

Sorry for the delay! I was out all weekend.

I made a typo in my earlier comment: --p-min-size controls the minimum cluster size used to determine putative sequences. Are you sure this is what you want? Setting --p-min-reads to 1 is what we do in Qiita, for instance, so that we can recover low-abundance sequences, but I don’t think setting --p-min-size to 1 is what you want here, and it could also be why the run time is so high.

I just grabbed your data, and the only thing that stands out to me is that some of the sequences contain Ns but that should not present a problem for Deblur. Can you share the $Outputs/rdp_seq_1.qza file by chance?

At this point, the best guess I have is that --p-min-size of 1 is somehow introducing this issue (@wym199633 thank you for the note on this thread!). Would it be possible to run using the default for --p-min-size?

Best,
Daniel

You mean that setting
--p-min-size 2 --p-min-reads 1
will solve the problem? I will give it a try.
Just to clarify, the reason I want to retain all low-count sequences is that I want to put the OTU table into DESeq2, and DESeq2 recommends not removing any low-count sequences, to make the best use of its statistical models.

To make sure I understand: --p-min-size 2 will remove sequences with a count of 0 or 1 in each sample, right? If so, setting the parameter to 1 or 0 would be the same, I guess?

Oliver

Hi @wym199633,

I’m advising to use the default for --p-min-size. --p-min-reads may make sense to modify given your application. If your goal is to retain singletons after the execution of Deblur, then --p-min-reads is the parameter that should be modified.
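Conceptually, the difference between the two parameters is something like this (a simplified Python sketch under my own naming, based on the descriptions in this thread, not Deblur's actual implementation): --p-min-size filters a sequence by its count within each individual sample during denoising, while --p-min-reads filters by the total count summed across all samples at the end.

```python
# Simplified illustration of the two filters (NOT Deblur's actual code).

def apply_min_size(sample_counts, min_size=2):
    """Per-sample, during denoising: drop sequences whose count
    within that one sample is below min_size."""
    return {seq: n for seq, n in sample_counts.items() if n >= min_size}

def apply_min_reads(table, min_reads=10):
    """On the final table: drop sequences whose total count summed
    across ALL samples is below min_reads."""
    totals = {}
    for counts in table.values():
        for seq, n in counts.items():
            totals[seq] = totals.get(seq, 0) + n
    return {sample: {seq: n for seq, n in counts.items()
                     if totals[seq] >= min_reads}
            for sample, counts in table.items()}

# "A" is a singleton within each sample, so min-size 2 removes it
# per sample, even though its total across samples is 2.
table = {"s1": {"A": 1, "B": 12}, "s2": {"A": 1, "B": 3}}
per_sample = {s: apply_min_size(c, min_size=2) for s, c in table.items()}
print(per_sample)                            # {'s1': {'B': 12}, 's2': {'B': 3}}
print(apply_min_reads(table, min_reads=2))   # both A and B kept (totals 2 and 15)
```

This is why retaining low-abundance sequences for a downstream tool like DESeq2 points at --p-min-reads, not --p-min-size: the table-level filter is the one that discards rare-but-real features.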

Best,
Daniel


Thanks Daniel, and sorry for all the trouble.
I have good news: I could finally make it work. The only thing is that I’m not 100% sure what did it. This time I used the default values, but I had tried Deblur before (with the same data) with all the default values (including --p-min-reads 10) and it still crashed right at the end. Of course, I had far fewer remaining features (probably around 5,000 instead of 95,167), but it was the same problem.

I suspect that the problem may have originated from duplicate IDs and/or duplicate sequences in the reference database I used. It’s a combination of MIDORI and BOLD COI sequences, and I found out that I had not prepared it properly. I would have thought that a warning would have been raised when importing my database into QIIME 2, though.
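For anyone landing on this thread later, a quick sanity check for duplicate record IDs in a FASTA reference looks something like this (a minimal sketch; dedicated tools such as seqkit's dedup commands would also do the job):

```python
from collections import Counter

def duplicate_fasta_ids(fasta_text):
    """Return the record IDs that appear more than once in a FASTA string.
    The ID is the first whitespace-separated token after '>'."""
    ids = [line[1:].split()[0]
           for line in fasta_text.splitlines()
           if line.startswith(">")]
    return [seq_id for seq_id, n in Counter(ids).items() if n > 1]

# Tiny example with one duplicated ID ("seq1").
example = ">seq1 desc\nACGT\n>seq2\nGGTA\n>seq1 another copy\nTTTT\n"
print(duplicate_fasta_ids(example))  # ['seq1']
```

Running the same idea over the full reference file (reading it with `open(...).read()`) would flag the duplicates before importing into QIIME 2.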

Anyway, this is solved for me. Sorry again for not having a clear idea of what the issue was.
Cheers