vsearch clustering error

Heather_E · July 3, 2019, 7:08pm

I am having an issue with the command vsearch cluster-features-de-novo. Here's the workflow:

I have multiple runs for a project and want to merge the two runs for diversity analysis. Each run was denoised independently with DADA2. A couple of samples were in both runs because they didn't work well the first time. After DADA2, I filtered the tables to remove the duplicate samples and merged them. Now I want to cluster at 99% identity because I am getting multiple features for the same species when I assign taxonomy. The command works fine on my original table and rep-seqs file. However, if I filter the table first, the command fails on the resulting filtered table and sequence file. The error occurs even on the unmerged but filtered tables and sequence files.
Conversely, the command runs fine if I merge the two files without filtering, so it's definitely something to do with the filtering step. Below is the command and error output:

qiime vsearch cluster-features-de-novo --i-table Run6/18S/qiime2/rem-neg-table.qza --i-sequences Run6/18S/qiime2/rem-neg-seqs.qza --p-perc-identity 0.99 --o-clustered-table test/Run6-neg-table.qza --o-clustered-sequences test/Run6-neg-table.qza --verbose
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --cluster_size /var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpkcf1496q --id 0.99 --centroids /var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/q2-DNAFASTAFormat-07y20m6j --uc /var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpiyfmssxi --qmask none --xsize --threads 1

vsearch v2.7.0_macos_x86_64, 16.0GB RAM, 8 cores

Reading file /var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpkcf1496q 6%

Fatal error: Invalid (zero) abundance annotation in FASTA file header
Traceback (most recent call last):
File "/Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/commands.py", line 311, in __call__
results = action(**arguments)
File "</Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/decorator.py:decorator-gen-121>", line 2, in cluster_features_de_novo
File "/Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 365, in callable_executor
output_views = self._callable(**view_args)
File "/Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 193, in cluster_features_de_novo
run_command(cmd)
File "/Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 33, in run_command
subprocess.run(cmd, check=True)
File "/Users/genlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['vsearch', '--cluster_size', '/var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpkcf1496q', '--id', '0.99', '--centroids', '/var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/q2-DNAFASTAFormat-07y20m6j', '--uc', '/var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpiyfmssxi', '--qmask', 'none', '--xsize', '--threads', '1']' returned non-zero exit status 1.

Plugin error from vsearch:

Command '['vsearch', '--cluster_size', '/var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpkcf1496q', '--id', '0.99', '--centroids', '/var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/q2-DNAFASTAFormat-07y20m6j', '--uc', '/var/folders/m3/q8nmfgm910dg75w0k0my0t6m0000gr/T/tmpiyfmssxi', '--qmask', 'none', '--xsize', '--threads', '1']' returned non-zero exit status 1.

See above for debug info.

Concerning the error (Invalid (zero) abundance annotation in FASTA file header), I exported both the filtered and non-filtered FASTA files and they look the same to me. I have attached them both.

Also, I generated the filtered sequences using the command feature-table filter-seqs.
filtered-seqs.qza (147.1 KB) rep-seqs.qza (137.8 KB)

thermokarst · July 9, 2019, 12:49pm

Hi there @Heather_E!

I am not sure if I follow the rationale here.

How did you filter the feature table? According to the provenance you shared, it looks like the filtering was done outside of QIIME 2 and the result was imported.

Heather_E · July 11, 2019, 2:48pm

I exported the original table as a BIOM file, then converted it to TSV. I removed duplicate samples and negative controls using an R script, then converted the table back to BIOM and imported it as a qza.

thermokarst · July 11, 2019, 2:56pm

Well, I am not a vsearch developer, but that error message sounds to me like you have features in your FeatureData[Sequence] that are either not present in your FeatureTable[Frequency] or, if they are present, have no counts associated with them (zero sum across all samples). Perhaps you should take your FeatureTable[Frequency] and, using QIIME 2, filter out any zero-abundance features. Once those are removed, filter your FeatureData[Sequence] with the final feature table.
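
For reference, the two filtering steps suggested above might look roughly like this. It is a sketch rather than a tested recipe for this dataset: `drop_zero_features` is a hypothetical helper name, and the output file names are placeholders. It uses `qiime feature-table filter-features` with `--p-min-frequency 1` to drop features whose total count is zero, then `qiime feature-table filter-seqs` with `--i-table` to keep only the sequences still present in that table.

```shell
# Sketch only: the function name and output file names are illustrative
# placeholders, not part of QIIME 2 or of the original post.
drop_zero_features() {
  local table="$1" seqs="$2" outdir="$3"

  # Keep only features whose total frequency across all samples is >= 1.
  qiime feature-table filter-features \
    --i-table "$table" \
    --p-min-frequency 1 \
    --o-filtered-table "$outdir/nonzero-table.qza" || return 1

  # Keep only the sequences whose IDs remain in the filtered table.
  qiime feature-table filter-seqs \
    --i-data "$seqs" \
    --i-table "$outdir/nonzero-table.qza" \
    --o-filtered-data "$outdir/nonzero-seqs.qza"
}

# Example call (uncomment once the paths exist):
# drop_zero_features rem-neg-table.qza rem-neg-seqs.qza test
```

The resulting table and sequences would then be passed to `qiime vsearch cluster-features-de-novo` in place of the externally filtered inputs.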

Heather_E · July 16, 2019, 6:33pm

That might be it, especially because features found in the negative samples were effectively brought down to 0 after running the R script. I will try filtering out the zero-abundance features and see what happens. Thank you!
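
One way to check this directly on the TSV produced by the R script is to sum each feature's row and list the ones that come out to zero. The snippet below is a rough sketch under a few assumptions: the table is tab-separated as written by `biom convert --to-tsv`, header and comment lines start with `#`, the first column holds the feature ID, and every other column is a sample count. `zero_sum_features` is a made-up helper name.

```shell
# List feature IDs whose counts sum to zero across all samples.
# Assumes a tab-separated table where lines starting with '#' are
# headers/comments and column 1 is the feature ID (hypothetical layout
# matching a typical `biom convert --to-tsv` export).
zero_sum_features() {
  awk -F'\t' '
    /^#/   { next }               # skip header and comment lines
    NF < 2 { next }               # skip blank or malformed lines
    {
      total = 0
      for (i = 2; i <= NF; i++) total += $i
      if (total == 0) print $1    # feature has no counts left
    }
  ' "$1"
}

# Example call (file name is a placeholder):
# zero_sum_features filtered-table.tsv
```

Any ID it prints is a feature with no remaining counts in the filtered table, which is the situation described above.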

Heather_E · July 22, 2019, 6:13pm

FYI, it worked! Thanks for your help.

Nicholas_Bokulich · September 12, 2019, 2:22pm

A post was split to a new topic: is there a method for filtering out duplicate samples?