vsearch clustering error when using vsearch cluster-features-de-novo

I am trying to cluster my samples @97% using vsearch cluster-features-de-novo but it's giving me an error.

I'm using QIIME 2 version 2019.10.
My command is:
qiime vsearch cluster-features-de-novo \
--i-sequences $projID-rep-seqs.qza \
--i-table $projID-DADA2-table.qza \
--p-perc-identity 0.97 \
--p-threads 75 \
--o-clustered-table $projID-RepTbl-clustered.qza \
--o-clustered-sequences $projID-RepSeqs-clustered.qza \
--verbose

Here is the output:


Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --cluster_size /tmp/tmpoenbe0b2 --id 0.97 --centroids /tmp/q2-DNAFASTAFormat-oh_ukntc --uc /tmp/tmprsr5qvfs --qmask none --xsize --threads 75

vsearch v2.7.0_linux_x86_64, 377.4GB RAM, 88 cores

Reading file /tmp/tmpoenbe0b2 100%
756563 nt in 3696 seqs, min 32, max 440, avg 205
minseqlength 32: 2 sequences discarded.
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 2876 Size min 1, max 12, avg 1.3
Singletons: 2457, 66.5% of seqs, 85.4% of clusters
Traceback (most recent call last):
File "/home/AnalysisTools/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__
results = action(**arguments)
File "</home/AnalysisTools/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/decorator.py:decorator-gen-121>", line 2, in cluster_features_de_novo
File "/home/AnalysisTools/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
output_types, provenance)
File "/home/AnalysisTools/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 383, in callable_executor
output_views = self._callable(**view_args)
File "/home/AnalysisTools/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_vsearch/cluster_features.py", line 201, in cluster_features_de_novo
include_collapsed_metadata=False)
File "/home/AnalysisTools/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/biom/table.py", line 2589, in collapse
for part, table in self.partition(f, axis=axis):
File "/home/AnalysisTools/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/biom/table.py", line 2295, in partition
part = f(id_, md)
File "/home/AnalysisTools/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_vsearch/cluster_features.py", line 100, in collapse_f
return id_to_centroid[id_]
KeyError: 'bc581ce18c66116833d712e4076a8b63'


Hi @sraza! The error is saying that you have an ID mismatch between your table and your seqs. You'll need to double-check that the feature IDs are identical between the two halves of the dataset. Where did these files come from? How were they generated?
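
One way to check for such a mismatch is to compare the two sets of feature IDs directly. The snippet below is only a minimal sketch using the Python standard library: it assumes you have already obtained the sequence records as FASTA text and the table's feature IDs as a plain list (for example from files produced by `qiime tools export`). The helper names `fasta_ids` and `compare_ids` and the toy data are hypothetical and not part of QIIME 2 or vsearch.

```python
import io


def fasta_ids(handle):
    """Return the set of record IDs (first token after '>') in a FASTA stream."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def compare_ids(table_ids, seq_ids):
    """Report which feature IDs appear on only one side of the dataset."""
    table_ids, seq_ids = set(table_ids), set(seq_ids)
    return {
        "only_in_table": sorted(table_ids - seq_ids),
        "only_in_seqs": sorted(seq_ids - table_ids),
    }


# Toy data for illustration only (not taken from the real artifacts).
example_fasta = io.StringIO(">featA\nACGT\n>featB some description\nGGCC\n")
example_table_ids = ["featA", "featB", "bc581ce18c66116833d712e4076a8b63"]

report = compare_ids(example_table_ids, fasta_ids(example_fasta))
print(report)
```

If `only_in_table` is non-empty, those features are in the table but have no matching sequence, which is the kind of situation that would produce a `KeyError` like the one above.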
