I am having an issue with the command vsearch cluster-features-de-novo. Here's the workflow:
I have two runs for a project and want to merge them for diversity analysis. Each run was denoised independently with DADA2. A couple of samples are in both runs because they didn't work well the first time. After DADA2, I filtered the tables to remove the duplicate samples and then merged them. Now I want to cluster at 99% identity because I am getting multiple features for the same species when I assign taxonomy. The command works fine on my original table and rep-seqs file. However, if I filter the table first, the command returns an error on the resulting filtered table and sequence file. The error occurs even on the unmerged but filtered tables and sequence files.
In contrast, the command runs fine if I merge the two files without filtering, so it's definitely something to do with the filtering step. Below is the command and error output:
qiime vsearch cluster-features-de-novo \
  --i-table Run6/18S/qiime2/rem-neg-table.qza \
  --i-sequences Run6/18S/qiime2/rem-neg-seqs.qza \
  --p-perc-identity 0.99 \
  --o-clustered-table test/Run6-neg-table.qza \
  --o-clustered-sequences test/Run6-neg-table.qza \
  --verbose
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
Command: vsearch --cluster_size /var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpkcf1496q --id 0.99 --centroids /var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/q2-DNAFASTAFormat-07y20m6j --uc /var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpiyfmssxi --qmask none --xsize --threads 1
vsearch v2.7.0_macos_x86_64, 16.0GB RAM, 8 cores
Reading file /var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpkcf1496q 6%
Fatal error: Invalid (zero) abundance annotation in FASTA file header
Traceback (most recent call last):
  File "/Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/commands.py", line 311, in __call__
    results = action(**arguments)
  File "</Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/decorator.py:decorator-gen-121>", line 2, in cluster_features_de_novo
  File "/Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 365, in callable_executor
    output_views = self._callable(**view_args)
  File "/Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 193, in cluster_features_de_novo
    run_command(cmd)
  File "/Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 33, in run_command
    subprocess.run(cmd, check=True)
  File "/Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['vsearch', '--cluster_size', '/var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpkcf1496q', '--id', '0.99', '--centroids', '/var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/q2-DNAFASTAFormat-07y20m6j', '--uc', '/var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpiyfmssxi', '--qmask', 'none', '--xsize', '--threads', '1']' returned non-zero exit status 1.
Plugin error from vsearch:
Command '['vsearch', '--cluster_size', '/var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpkcf1496q', '--id', '0.99', '--centroids', '/var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/q2-DNAFASTAFormat-07y20m6j', '--uc', '/var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpiyfmssxi', '--qmask', 'none', '--xsize', '--threads', '1']' returned non-zero exit status 1.
See above for debug info.
Concerning the error (Invalid (zero) abundance annotation in FASTA file header), I exported both the filtered and non-filtered FASTA files and they look the same to me. I have attached them both.
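One possible reading of the error, sketched here under an assumption: the `--xsize` flag in the vsearch call above suggests that the temporary FASTA file carries `;size=N` abundance annotations, and the sketch assumes those values come from each feature's total frequency in the table rather than from the sequence file. If that holds, the exported FASTA files could look identical while the tables differ. The helpers `feature_totals`, `annotate_with_sizes`, and `zero_abundance_ids` are hypothetical illustrations in plain Python, not QIIME 2 functions, and the feature IDs and counts are toy data.

```python
def feature_totals(table):
    """Sum each feature's counts across samples.

    `table` maps feature ID -> {sample ID -> count} (toy layout, an assumption).
    """
    return {fid: sum(counts.values()) for fid, counts in table.items()}


def annotate_with_sizes(seqs, totals):
    """Build FASTA lines whose headers carry a ';size=N' abundance annotation."""
    lines = []
    for fid, seq in seqs.items():
        lines.append(f">{fid};size={totals.get(fid, 0)}")
        lines.append(seq)
    return lines


def zero_abundance_ids(seqs, totals):
    """Return IDs of sequences whose total abundance in the table is zero."""
    return [fid for fid in seqs if totals.get(fid, 0) == 0]


# Hypothetical toy data: featB has sequences but no remaining counts.
seqs = {"featA": "ACGT", "featB": "TTGA"}
table = {
    "featA": {"sample1": 12, "sample2": 3},
    "featB": {"sample1": 0, "sample2": 0},
}

totals = feature_totals(table)
for line in annotate_with_sizes(seqs, totals):
    print(line)
print("zero-abundance features:", zero_abundance_ids(seqs, totals))
```

In this toy case the sequences for `featA` and `featB` are both valid, but `featB` would receive a `size=0` header, which is the kind of annotation the error message complains about.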
Also, I generated the filtered sequences using the command feature-table filter-seqs.
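To illustrate how the filtering step could lead to such features, here is a toy sketch in plain Python (not the QIIME 2 API). It assumes that removing samples from a table does not automatically drop features that were only observed in the removed samples, and that filtering sequences by the table's feature IDs then keeps those zero-count features. Both points are assumptions about this QIIME 2 release, not something the output above confirms. All function names and data are hypothetical.

```python
def filter_samples(table, exclude_ids):
    """Remove the given samples; every feature row is kept, even if now empty.

    `table` maps feature ID -> {sample ID -> count} (toy layout, an assumption).
    """
    return {
        fid: {sid: n for sid, n in counts.items() if sid not in exclude_ids}
        for fid, counts in table.items()
    }


def filter_seqs(seqs, table):
    """Keep sequences whose ID is present in the table, whatever their count."""
    return {fid: seq for fid, seq in seqs.items() if fid in table}


def drop_empty_features(table):
    """Remove features whose total count across the remaining samples is zero."""
    return {fid: c for fid, c in table.items() if sum(c.values()) > 0}


# Hypothetical toy data: featB was only observed in the duplicate sample.
seqs = {"featA": "ACGT", "featB": "TTGA"}
table = {
    "featA": {"s1": 12, "dup": 4},
    "featB": {"s1": 0, "dup": 7},
}

filtered = filter_samples(table, exclude_ids={"dup"})
kept = filter_seqs(seqs, filtered)      # still contains featB
cleaned = drop_empty_features(filtered)  # featB removed
kept_clean = filter_seqs(seqs, cleaned)

print(sorted(kept))
print(sorted(kept_clean))
```

If the assumption holds, removing features with a total frequency of zero from the filtered table before clustering would be the thing to try, for example with `qiime feature-table filter-features` and a minimum frequency of 1, followed by filtering the sequences against that table again.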
filtered-seqs.qza (147.1 KB) rep-seqs.qza (137.8 KB)