Error when running cluster-features-de-novo

I have run into the same error described above, and was wondering if there was any solution to this.

I ran the following command:

qiime vsearch cluster-features-de-novo \
    --i-table /DCEG/Projects/Microbiome/Analysis/Project_NP0084_MB/20200410_2019.1/denoising/feature_tables/merged.qza \
    --i-sequences /DCEG/Projects/Microbiome/Analysis/Project_NP0084_MB/20200410_2019.1/denoising/sequence_tables/merged.qza \
    --p-perc-identity 0.99 \
    --o-clustered-table /DCEG/Projects/Microbiome/Analysis/Project_NP0084_MB/20200505_classifiers/proj_data/output/table-dn-99.qza \
    --o-clustered-sequences /DCEG/Projects/Microbiome/Analysis/Project_NP0084_MB/20200505_classifiers/proj_data/output/rep-seqs-dn-99.qza

I got the following error:

Plugin error from vsearch:


Debug info has been saved to /tmp/qiime2-q2cli-err-_mmvn6y9.log
(qiime2-2019.1) [[email protected] proj_data]$ ^C
(qiime2-2019.1) [[email protected] proj_data]$ cat /tmp/qiime2-q2cli-err-_mmvn6y9.log
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --cluster_size /tmp/tmpembv766e --id 0.99 --centroids /tmp/q2-DNAFASTAFormat-4j_oxuxr --uc /tmp/tmptt6v3ksf --qmask none --xsize --threads 1

vsearch v2.7.0_linux_x86_64, 126.1GB RAM, 32 cores

Reading file /tmp/tmpembv766e 100%
456240 nt in 1811 seqs, min 51, max 418, avg 252
minseqlength 32: 7 sequences discarded.
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 1446 Size min 1, max 6, avg 1.3
Singletons: 1206, 66.6% of seqs, 83.4% of clusters
Traceback (most recent call last):
  File "/DCEG/Resources/Tools/miniconda/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2cli/", line 274, in __call__
    results = action(**arguments)
  File "</DCEG/Resources/Tools/miniconda/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/>", line 2, in cluster_features_de_novo
  File "/DCEG/Resources/Tools/miniconda/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/", line 231, in bound_callable
    output_types, provenance)
  File "/DCEG/Resources/Tools/miniconda/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/", line 365, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/DCEG/Resources/Tools/miniconda/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_vsearch/", line 201, in cluster_features_de_novo
  File "/DCEG/Resources/Tools/miniconda/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/biom/", line 2589, in collapse
    for part, table in self.partition(f, axis=axis):
  File "/DCEG/Resources/Tools/miniconda/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/biom/", line 2295, in partition
    part = f(id_, md)
  File "/DCEG/Resources/Tools/miniconda/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_vsearch/", line 100, in collapse_f
    return id_to_centroid[id_]
KeyError: '667289cef41d2b4b2aa0e47041e05074'

I looked into the feature 667289cef41d2b4b2aa0e47041e05074 and found that it is <32 bp, as described above (it is 26 bp).
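For anyone wanting to confirm which features are too short, one lightweight approach (a sketch, not part of any official QIIME 2 workflow) is to scan the FASTA with awk. A `.qza` artifact is a zip archive, so the sequences can be extracted from it; the demo input below stands in for the file inside `merged.qza`:

```shell
# In practice, extract the sequences from the artifact first, e.g.:
#   unzip -j merged.qza '*/data/dna-sequences.fasta'
# Demo input standing in for the extracted file (QIIME writes one sequence per line):
printf '>667289cef41d2b4b2aa0e47041e05074\nACGTACGTACGTACGTACGTACGTAC\n>feat2\nACGTACGTACGTACGTACGTACGTACGTACGTACGT\n' > dna-sequences.fasta

# Print the ID and length of every sequence shorter than vsearch's
# default --minseqlength of 32:
awk '/^>/ {id=substr($0,2); next} length($0) < 32 {print id, length($0)}' dna-sequences.fasta
```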

So, I assume I need to filter out sequences below this threshold, but was wondering what the cleanest way to do this would be?

This data was generated using a MiSeq, and is a combination of human samples and artificially created communities.

merged_filtered.qzv (469.0 KB)

Hi @slsevilla - I think you found a bug in q2-vsearch (but, I still need to dig into this a bit more to be sure).

To get around this, you can pre-filter your FeatureData[Sequence] to remove anything shorter than 32 nts:

qiime feature-table filter-seqs \
    --i-data /DCEG/Projects/Microbiome/Analysis/Project_NP0084_MB/20200410_2019.1/denoising/sequence_tables/merged.qza \
    --m-metadata-file /DCEG/Projects/Microbiome/Analysis/Project_NP0084_MB/20200410_2019.1/denoising/sequence_tables/merged.qza \
    --p-where 'length(sequence) >= 32' \
    --o-filtered-data filtered-seqs.qza

Then, you can filter those removed features out of your feature table:

qiime feature-table filter-features \
    --i-table /DCEG/Projects/Microbiome/Analysis/Project_NP0084_MB/20200410_2019.1/denoising/feature_tables/merged.qza \
    --m-metadata-file filtered-seqs.qza \
    --o-filtered-table filtered-table.qza

Then, stick filtered-seqs.qza and filtered-table.qza into the cluster-features-de-novo step above, and in theory you should be good to go (I hope). Please keep us posted. :qiime2:
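For reference, the re-run is just the original command with the two inputs swapped for the filtered artifacts (the output filenames here are only placeholders):

```shell
qiime vsearch cluster-features-de-novo \
    --i-table filtered-table.qza \
    --i-sequences filtered-seqs.qza \
    --p-perc-identity 0.99 \
    --o-clustered-table table-dn-99.qza \
    --o-clustered-sequences rep-seqs-dn-99.qza
```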

I was able to run it once these were filtered!!

Thank you!

