Vsearch error when using dereplicate-sequences

Hi!

I’ve been following this tutorial for clustering sequences into OTUs. My sequences had already been demultiplexed, adapter-trimmed, and quality-filtered by the sequencing center in QIIME 1, and were provided to me as a single seqs.fna file. I was able to import seqs.fna using the instructions in the tutorial, but when I tried to dereplicate the sequences with the vsearch dereplicate-sequences command as described there, I got the following error:

Plugin error from vsearch:

Command '['vsearch', '--derep_fulllength', '/tmp/qiime2-archive-bwnv22zk/8a65d83b-5463-4325-a958-eab7db1b8f43/data/seqs.fna', '--output', '/tmp/q2-DNAFASTAFormat-k4w8vbpi', '--relabel_sha1', '--relabel_keep', '--uc', '/tmp/tmpep3rovh_', '--qmask', 'none', '--xsize']' returned non-zero exit status -9

Debug info has been saved to /tmp/qiime2-q2cli-err-sxpo5wee.log

Any ideas on how to fix this?
Thanks!

Hi @sbrown,
Could you please provide:

  1. The full command that you are using
  2. The full error traceback. You can get this either by re-running the command with the --verbose flag added, or by opening the error log: /tmp/qiime2-q2cli-err-sxpo5wee.log

Thanks!

Hi @Nicholas_Bokulich,

This is the command that I used:

qiime vsearch dereplicate-sequences --i-sequences seqs.qza --o-dereplicated-table table.qza --o-dereplicated-sequences rep-seqs.qza

And this is the error traceback:

Reading file /tmp/qiime2-archive-xbiizhc0/8a65d83b-5463-4325-a958-eab7db1b8f43/data/seqs.fna 84%
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2cli/commands.py", line 224, in __call__
    results = action(**arguments)
  File "", line 2, in dereplicate_sequences
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
    output_types, provenance)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/action.py", line 363, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_vsearch/_cluster_sequences.py", line 129, in dereplicate_sequences
    run_command(cmd)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_vsearch/_cluster_features.py", line 33, in run_command
    subprocess.run(cmd, check=True)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['vsearch', '--derep_fulllength', '/tmp/qiime2-archive-xbiizhc0/8a65d83b-5463-4325-a958-eab7db1b8f43/data/seqs.fna', '--output', '/tmp/q2-DNAFASTAFormat-9z9ornh9', '--relabel_sha1', '--relabel_keep', '--uc', '/tmp/tmpkgtjqv3d', '--qmask', 'none', '--xsize']' returned non-zero exit status -9

Plugin error from vsearch:

Command '['vsearch', '--derep_fulllength', '/tmp/qiime2-archive-xbiizhc0/8a65d83b-5463-4325-a958-eab7db1b8f43/data/seqs.fna', '--output', '/tmp/q2-DNAFASTAFormat-9z9ornh9', '--relabel_sha1', '--relabel_keep', '--uc', '/tmp/tmpkgtjqv3d', '--qmask', 'none', '--xsize']' returned non-zero exit status -9

See above for debug info.

Thanks!

Hi @sbrown,
This error seems to be caused by vsearch running out of memory. A few questions:

  1. How much memory do you have available on the system that you are running your analysis on?
  2. What version of QIIME2 are you running? (just to be sure)
  3. Is QIIME2 installed natively or are you using a virtual box/docker image?
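
For question 1, here is a minimal sketch of one way to check total physical memory from Python on the machine where QIIME 2 runs. It assumes a Linux system where os.sysconf exposes SC_PAGE_SIZE and SC_PHYS_PAGES; other platforms may differ. total_memory_gib is a hypothetical helper name, not part of QIIME 2.

```python
import os


def total_memory_gib() -> float:
    """Hypothetical helper: total physical memory in GiB (assumes Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / 1024 ** 3


print(f"Total physical memory: {total_memory_gib():.1f} GiB")
```

If QIIME 2 runs inside a virtual machine or container, run this inside that environment, since it may be allocated far less memory than the host has.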

I am not sure whether any vsearch parameters affect its memory usage, but this is not usually a very memory-intensive step. The only solution may be to run on a system with more available memory.
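
Some background on why the traceback points to memory: Python's subprocess module reports a negative exit status when the child process was terminated by a signal, so -9 means signal 9. On Linux that is SIGKILL, which is what the kernel sends when it kills a process for exhausting memory (although SIGKILL can come from other sources too). A minimal sketch of that mapping follows; describe_exit_status is a hypothetical helper name, not part of QIIME 2 or vsearch.

```python
import signal


def describe_exit_status(returncode: int) -> str:
    """Hypothetical helper: explain a subprocess return code in words."""
    if returncode == 0:
        return "exited successfully"
    if returncode > 0:
        return f"exited with error code {returncode}"
    # A negative value -N means the child was terminated by signal N (POSIX).
    signum = -returncode
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return f"terminated by signal {signum} ({name})"


print(describe_exit_status(-9))
```

On Linux this reports SIGKILL for the -9 status seen in the error above.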

Thanks!

I’m running QIIME2 2017.12 in VirtualBox. I had allocated only 2 GB of memory to the virtual machine; once I increased it to 7 GB, dereplication completed. Should this be enough memory for subsequent steps in QIIME2 (such as OTU picking)? Thank you so much for your help, @Nicholas_Bokulich!

Great! Glad to hear that did the trick!

Probably, but it all depends on the specific steps and the characteristics of your data. Most Illumina datasets consisting of a single MiSeq run should run just fine for most steps.

The main place you might run into issues is taxonomy classification, or any other step that uses a reference sequence database (e.g., closed- and open-reference OTU picking). Smaller databases like Greengenes should pose no problems, but something larger like SILVA often causes memory issues. It's always worth giving it a try, though. If you run into problems (e.g., memory issues during taxonomy classification), there may already be a solution discussed on this forum (e.g., for some steps, parameter choices can mitigate memory issues).

Good luck!
