I've been running this vsearch command for a lot of my analyses using different databases. Lately I've been running into an issue with my BOLD database. When I run the consensus vsearch command with it, it keeps giving me this error:
KeyError: 'Identifier 342 was reported in taxonomic search results, but was not present in the reference taxonomy.'
I looked for that identifier in both my FASTA and taxonomy files, but I can't find it. My files are both headerless and thus I imported them using:
I checked the number of identifiers in both files and they are similar. I also searched for 342 and couldn't find it anywhere in either file, taxonomy or FASTA.
Here is a copy of the error message I keep receiving. I would really appreciate it if you could offer me advice on how to proceed:
Reading file /store2/anan/tmp/qiime2-archive-j7aw57gj/4b0cf6df-445c-4121-aaef-0836a5e9a7df/data/dna-sequences.fasta 100%
937180679 nt in 1506031 seqs, min 55, max 2927, avg 622
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching query sequences: 229 of 10378 (2.21%)
Traceback (most recent call last):
File "/comp2/anan/Anaconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2566, in get_value
return libts.get_value_box(s, key)
File "pandas/_libs/tslib.pyx", line 1017, in pandas._libs.tslib.get_value_box
File "pandas/_libs/tslib.pyx", line 1025, in pandas._libs.tslib.get_value_box
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/comp2/anan/Anaconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_feature_classifier/_consensus_assignment.py", line 104, in import_blast_format_assignments
t = ref_taxa[id]
File "/comp2/anan/Anaconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/pandas/core/series.py", line 623, in __getitem__
result = self.index.get_value(self, key)
File "/comp2/anan/Anaconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2574, in get_value
raise e1
File "/comp2/anan/Anaconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2560, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas/_libs/index.pyx", line 83, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 91, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '342'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/comp2/anan/Anaconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
results = action(**arguments)
File "", line 2, in classify_consensus_vsearch
File "/comp2/anan/Anaconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/comp2/anan/Anaconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/sdk/action.py", line 366, in callable_executor
output_views = self._callable(**view_args)
File "/comp2/anan/Anaconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_feature_classifier/_vsearch.py", line 35, in classify_consensus_vsearch
unassignable_label=unassignable_label)
File "/comp2/anan/Anaconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_feature_classifier/_consensus_assignment.py", line 29, in _consensus_assignments
output.name, ref_taxa, unassignable_label=unassignable_label)
File "/comp2/anan/Anaconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_feature_classifier/_consensus_assignment.py", line 109, in import_blast_format_assignments
'taxonomy.').format(str(id)))
KeyError: 'Identifier 342 was reported in taxonomic search results, but was not present in the reference taxonomy.'
I ran those commands and, as far as I can see, when compared to the templates in the Moving Pictures tutorial they are similar and meet the requirements for reference database structure.
This one stumped me for a while because your files look fine to the naked eye, have the same number of entries, etc, and I don't have your query sequences to replicate the exact error you have.
But I believe I discovered the problem:
Your FASTA file (but not your taxonomy file) contains invisible special characters (^M) at the end of the accession numbers. These are carriage returns, which come from Windows-style line endings. vsearch seems to be interpreting the carriage return as part of the accession number, so the IDs reported in the search results no longer match the IDs in the taxonomy file, and that mismatch is what raises the error. You can use something like dos2unix to convert your FASTA file, and then everything should be okay.
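If dos2unix is not available, the same cleanup can be sketched in a few lines of Python. This is only an illustration under some assumptions: the helper names `strip_cr` and `clean_fasta` and the file names in the usage note are made up for this example, and the script simply drops a carriage return that sits at the end of each line. It reads the file in binary mode so the line endings are not silently translated, and it streams line by line because the reference file is large.

```python
def strip_cr(line: bytes) -> bytes:
    """Drop a carriage return found at the very end of a line.

    Handles both 'text\\r\\n' (Windows line ending) and a final
    'text\\r' with no newline after it.
    """
    if line.endswith(b"\r\n"):
        return line[:-2] + b"\n"
    if line.endswith(b"\r"):
        return line[:-1]
    return line


def clean_fasta(src: str, dst: str) -> int:
    """Copy src to dst without trailing carriage returns.

    Returns the number of header lines (starting with '>') that
    carried a carriage return, as a quick check of whether the
    file was affected at all.
    """
    affected_headers = 0
    # Binary mode keeps '\r' visible; iteration splits on b'\n'.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            cleaned = strip_cr(line)
            if cleaned != line and line.startswith(b">"):
                affected_headers += 1
            fout.write(cleaned)
    return affected_headers
```

For example, `clean_fasta("bold.fasta", "bold_unix.fasta")` (hypothetical file names) would write a cleaned copy and return how many headers had the hidden character. A result of 0 would suggest the line endings are not the cause, and the cleaned file would still need to be re-imported before rerunning the classifier.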
Please give that a try and let me know if it works!