I am trying to re-annotate an OTU table and representative sequences with Greengenes annotations for use with PICRUSt, using qiime vsearch cluster-features-closed-reference. I keep getting this error:
Fatal error: Invalid (zero) abundance annotation in FASTA file header
What might cause this error?
Hi @mbnalbright,
Could you tell us a little more about how you got to this stage?
Did you use qiime vsearch dereplicate-sequences before OTU picking? If not, run that command first to dereplicate, then do OTU picking.
Could you possibly share your input files?
Could you please share the full error traceback? (either run your command with --verbose added to the command, or open the log filepath given along with the error you reported)
With a little more info, we can help troubleshoot!
Reading file /tmp/qiime2-archive-4rbmg1by/8f0d60d6-00c8-4a2c-ab1a-788746d54125/data/dna-sequences.fasta 100%
142290491 nt in 99322 seqs, min 1254, max 2353, avg 1433
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching
Fatal error: Invalid (zero) abundance annotation in FASTA file header
Traceback (most recent call last):
File "/home/malbright/anaconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/q2cli/commands.py", line 246, in call
results = action(**arguments)
File "", line 2, in cluster_features_closed_reference
File "/home/malbright/anaconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
output_types, provenance)
File "/home/malbright/anaconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 363, in callable_executor
output_views = self._callable(**view_args)
File "/home/malbright/anaconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_vsearch/_cluster_features.py", line 256, in cluster_features_closed_reference
run_command(cmd)
File "/home/malbright/anaconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_vsearch/_cluster_features.py", line 33, in run_command
subprocess.run(cmd, check=True)
File "/home/malbright/anaconda3/envs/qiime2-2018.2/lib/python3.5/subprocess.py", line 398, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['vsearch', '--usearch_global', '/tmp/tmpdlo96307', '--id', '0.3', '--db', '/tmp/qiime2-archive-4rbmg1by/8f0d60d6-00c8-4a2c-ab1a-788746d54125/data/dna-sequences.fasta', '--uc', '/tmp/tmp6ld5fb6x', '--strand', 'plus', '--qmask', 'none', '--notmatched', '/tmp/tmpltx9moo9', '--threads', '0']' returned non-zero exit status 1
Could you try using qiime vsearch dereplicate-sequences before OTU picking?
It looks like this error is coming from vsearch, not QIIME 2. My understanding is that these abundance annotations are part of the format vsearch expects for OTU picking. From the manual:
input fasta file should present abundance annotations (i.e. a pattern [;]size=integer[;] in the fasta header).
This abundance annotation is added during dereplication, so it looks like that step is still needed here, even though you are re-clustering a pre-clustered table and sequences.
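If it helps to see which records are affected, here is a minimal Python sketch that scans FASTA headers for the size annotation and flags any that are missing or zero. The function name and the example records are made up for illustration, and the regex is only my reading of the pattern quoted from the manual, so treat it as a rough check rather than vsearch's exact parsing.

```python
import re

# Assumed reading of the manual's "[;]size=integer[;]" pattern:
# "size=<digits>" at the start of the header or after ";",
# followed by ";" or the end of the header.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def check_abundance_annotations(lines):
    """Return (header, problem) pairs for headers with a missing or zero size."""
    problems = []
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines, only headers carry annotations
        header = line[1:].strip()
        match = SIZE_RE.search(header)
        if match is None:
            problems.append((header, "missing"))
        elif int(match.group(1)) == 0:
            problems.append((header, "zero"))
    return problems


# Hypothetical records, not taken from the files in this thread.
example = [
    ">otu1;size=12",
    "ACGT",
    ">otu2;size=0",
    "ACGT",
    ">otu3",
    "ACGT",
]
print(check_abundance_annotations(example))
```

To run it on a real file, pass an open file handle instead of the example list, e.g. `check_abundance_annotations(open("dna-sequences.fasta"))`. A header flagged as "zero" would be consistent with the "Invalid (zero) abundance annotation" message above.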