vsearch open reference otu plugin error

Hello,

I was running vsearch open-reference OTU clustering against a database that we curated, using the following script:

declare -a arr=("v2" "v3" "v4" "v67" "v8" "v9")
REGION=${arr[${SLURM_ARRAY_TASK_ID}-1]}
echo "$REGION"

intableqza=…/analysis/A04-workflow-v2/merged-$REGION-table.qza
inrepsqza=…/analysis/A04-workflow-v2/merged-$REGION-rep-seqs.qza

echo "$intableqza"
echo "$inrepsqza"

qiime vsearch cluster-features-open-reference \
  --i-table "$intableqza" \
  --i-sequences "$inrepsqza" \
  --i-reference-sequences /path/db_v4.0.fasta.qza \
  --p-perc-identity 0.99 \
  --p-strand plus \
  --p-threads 4 \
  --o-clustered-table "$outDir/$REGION/otu-table.qza" \
  --o-clustered-sequences "$outDir/$REGION/otu-rep-seqs.qza" \
  --o-new-reference-sequences "$outDir/$REGION/otu-new-ref-seqs.qza"

My array job works for 4 of my 6 regions; however, I get a similar error for regions v2 and v9:
“Plugin error from vsearch:
‘db16c939e7d01eaf1a3554820a848d44’ (this ID differs between v2 and v9)
Debug info has been saved to /tmp/qiime2-q2cli-err-55fzxzyt.log”

This is the information that is contained in the log:
"vsearch v2.7.0_linux_x86_64, 125.5GB RAM, 24 cores

Reading file /tmp/qiime2-archive-hta8hld1/e2641e45-f8eb-4865-99fc-1e4f999cbc6f/data/dna-sequences.fasta 100%
105012606 nt in 72682 seqs, min 900, max 2200, avg 1445
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching query sequences: 1090 of 3099 (35.17%)
vsearch v2.7.0_linux_x86_64, 125.5GB RAM, 24 cores

Reading file /tmp/tmpo_m_cxfp 100%
440560 nt in 2009 seqs, min 21, max 310, avg 219
Getting sizes 100%
Sorting 100%
Median abundance: 27
Writing output 100%
vsearch v2.7.0_linux_x86_64, 125.5GB RAM, 24 cores

Reading file /tmp/tmpuq4x0ruu 100%
440539 nt in 2008 seqs, min 39, max 310, avg 219
minseqlength 32: 1 sequence discarded.
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 1250 Size min 1, max 10, avg 1.6
Singletons: 862, 42.9% of seqs, 69.0% of clusters
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --usearch_global /tmp/tmpeikkpvjc --id 0.99 --db /tmp/qiime2-archive-hta8hld1/e2641e45-f8eb-4865-99fc-1e4f999cbc6f/data/dna-sequences.fasta --uc /tmp/tmpqlwer7vq --strand plus --qmask none --notmatched /tmp/tmpo_m_cxfp --threads 4

Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --sortbysize /tmp/tmpo_m_cxfp --xsize --output /tmp/q2-DNAFASTAFormat-q8u2szxs

Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --cluster_size /tmp/tmpuq4x0ruu --id 0.99 --centroids /tmp/q2-DNAFASTAFormat-fi1gqdmv --uc /tmp/tmpbsyzyr6g --qmask none --xsize --threads 4

Traceback (most recent call last):
  File "/home-3/[email protected]/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__
    results = action(**arguments)
  File "</home-3/[email protected]/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/decorator.py:decorator-gen-126>", line 2, in cluster_features_open_reference
  File "/home-3/[email protected]/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/home-3/[email protected]/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 484, in callable_executor
    outputs = self._callable(scope.ctx, **view_args)
  File "/home-3/[email protected]/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 338, in cluster_features_open_reference
    perc_identity=perc_identity, threads=threads)
  File "</home-3/[email protected]/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/decorator.py:decorator-gen-479>", line 2, in cluster_features_de_novo
  File "/home-3/[email protected]/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/home-3/[email protected]/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
    output_views = self._callable(**view_args)
  File "/home-3/[email protected]/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 201, in cluster_features_de_novo
    include_collapsed_metadata=False)
  File "/home-3/[email protected]/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/biom/table.py", line 2604, in collapse
    for part, table in self.partition(f, axis=axis):
  File "/home-3/[email protected]/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/biom/table.py", line 2310, in partition
    part = f(id_, md)
  File "/home-3/[email protected]/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 100, in collapse_f
    return id_to_centroid[id_]
KeyError: 'db16c939e7d01eaf1a3554820a848d44' "

Do you have suggestions for fixing this error?

Thank you!
Lauren

Hello Lauren,

This KeyError during vsearch clustering seems to happen when reads are filtered out because they are too short.

And in your error message, I also see mention of a read being removed due to length:

Reading file /tmp/tmpuq4x0ruu 100%
440539 nt in 2008 seqs, min 39, max 310, avg 219
minseqlength 32: 1 sequence discarded.

The discarded sequence could be the ASV labeled db16c939e7d01eaf1a3554820a848d44. If it was dropped at this step, it would be missing from the clustering results later on, which would lead to that KeyError.
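To illustrate how a discarded sequence could end up as a KeyError, here is a toy sketch. It is not the plugin's actual code: the feature IDs and sequences are made up, and the clustering step is replaced by a stand-in where every surviving sequence is its own centroid. Only the names id_to_centroid and collapse_f are borrowed from the traceback above.

```python
# Toy illustration with hypothetical data; NOT the q2-vsearch implementation.
# A sequence shorter than the length cutoff never reaches clustering, so it has
# no entry in the ID -> centroid map, while the feature table still lists it.

MIN_SEQ_LENGTH = 32  # the "minseqlength 32" value reported in the log

# Hypothetical feature IDs and sequences
sequences = {
    "feat_long_a": "ACGT" * 20,  # 80 nt
    "feat_long_b": "ACGT" * 25,  # 100 nt
    "feat_short": "ACGT" * 5,    # 20 nt, below the cutoff
}

# Stand-in for clustering: each sequence that passes the length filter
# is treated as its own centroid.
id_to_centroid = {
    seq_id: seq_id
    for seq_id, seq in sequences.items()
    if len(seq) >= MIN_SEQ_LENGTH
}


def collapse_f(id_):
    # Looks up the centroid for a feature ID; fails for discarded features.
    return id_to_centroid[id_]


# The feature table still contains every original ID, so collapsing it
# hits the missing key.
missing = []
for feature_id in sequences:
    try:
        collapse_f(feature_id)
    except KeyError:
        missing.append(feature_id)

print(missing)  # ['feat_short']
```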

This great post by ebolyen has more details about this error. Can you double check on this ASV and see if it is < 32 bp long to verify that this is the source of the problem? Thanks!
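One way to do that check is sketched below, assuming the representative sequences are available as a plain FASTA file (for example, the dna-sequences.fasta inside the artifact). The function name short_sequences and the example records are hypothetical; the 32 nt default simply mirrors the minseqlength value in the log.

```python
import io


def short_sequences(handle, min_length=32):
    """Return (id, length) pairs for FASTA records shorter than min_length.

    `handle` is any iterable of text lines, e.g. an open file.
    Sequences wrapped over several lines are handled.
    """
    results = []
    seq_id, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close out the previous record before starting a new one.
            if seq_id is not None and length < min_length:
                results.append((seq_id, length))
            fields = line[1:].split()
            seq_id, length = (fields[0] if fields else ""), 0
        else:
            length += len(line)
    # Do not forget the last record in the file.
    if seq_id is not None and length < min_length:
        results.append((seq_id, length))
    return results


# Made-up example records; with a real file, pass open("dna-sequences.fasta").
example = io.StringIO(
    ">short_feature\n" + "ACGT" * 5 + "A\n"
    ">long_feature\n" + "ACGT" * 10 + "\n"
)
print(short_sequences(example))  # [('short_feature', 21)]
```

If the ID from the KeyError shows up in the returned list, that would support the short-read explanation.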

Colin

Hi Colin,

Thanks so much! I wondered if this was the problem! Yes, the sequence was only about 20 bp. I am using the Ion Torrent sequencing platform, so some of the sequences are quite short.

Once I suspected that short sequences were causing the error, I tried pre-filtering my sequences with a 100 bp length cutoff using seqtk before importing my data into qiime2. Unfortunately, it looks like seqtk missed a few sequences (not sure why…?), so I am still getting the error.
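For reference, a length pre-filter of this kind can be sketched in a few lines of Python. This is not what seqtk does internally and it does not explain why some sequences slipped through; it assumes plain four-line FASTQ records (sequence and quality not wrapped), and the function name and example reads are hypothetical.

```python
import io


def filter_fastq_by_length(in_handle, out_handle, min_length=100):
    """Copy FASTQ records whose sequence has at least min_length bases.

    Assumes strict 4-line records. Returns (kept, dropped) counts.
    """
    kept = dropped = 0
    while True:
        header = in_handle.readline()
        if header == "":
            break  # end of input
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = in_handle.readline()
        plus = in_handle.readline()
        qual = in_handle.readline()
        if len(seq.strip()) >= min_length:
            out_handle.write(header + seq + plus + qual)
            kept += 1
        else:
            dropped += 1
    return kept, dropped


# Made-up reads: one of 120 nt and one of 20 nt.
reads = io.StringIO(
    "@read_long\n" + "A" * 120 + "\n+\n" + "I" * 120 + "\n"
    "@read_short\n" + "A" * 20 + "\n+\n" + "I" * 20 + "\n"
)
filtered = io.StringIO()
kept, dropped = filter_fastq_by_length(reads, filtered, min_length=100)
print(kept, dropped)  # 1 1
```

With real data, the two StringIO objects would be replaced by opened input and output files.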

BUT… I saw in the latest qiime2-2020.6 update that, I think, there is no longer a 32 bp requirement in vsearch. Is this correct?! I have already updated and am running vsearch as we speak to see what happens.

Best,
Lauren

That's correct! Here's the PR to the q2-vsearch plugin that changes the default to --minseqlength 1.

... which means that if you still get this error, there might be a different underlying issue.

Keep us posted! :mailbox:
Colin

Hi Colin,

Just wanted to keep you posted that yes! With the qiime2-2020.6 update I no longer receive the vsearch error. :star_struck:

Thanks so much!
Lauren
