Correct way to import the SILVA 138.1 database into QIIME 2

I am using qiime2-2022.2 and have denoised my samples using DADA2. I did not truncate my seqs and wanted to use the latest SILVA database (138.1) with full-length seqs (not NR99). The format of the database has changed completely since release 132, so I am at a loss as to which files to choose and how to import the database correctly. I tried to import it both ways (headerless and with a header), but when I run qiime feature-classifier classify-consensus-vsearch I get the following error:


Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --usearch_global /tmp/qiime2-archive-rptvji15/71a58092-ac9f-47b0-bbaa-69992417d645/data/dna-sequences.fasta --id 0.8 --query_cov 0.8 --strand both --maxaccepts 0 --maxrejects 0 --db /tmp/qiime2-archive-89ol21_6/7e3805db-e931-4885-aeb5-e9d8cde1bb4b/data/dna-sequences.fasta --threads 78 --output_no_hits --blast6out /tmp/tmph173fw6e

vsearch v2.7.0_linux_x86_64, 377.4GB RAM, 88 cores

Reading file /tmp/qiime2-archive-89ol21_6/7e3805db-e931-4885-aeb5-e9d8cde1bb4b/data/dna-sequences.fasta 100%
3183581141 nt in 2224740 seqs, min 900, max 4000, avg 1431
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching query sequences: 3303 of 3303 (100.00%)
Traceback (most recent call last):
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3081, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'HL282720.7.1469'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/_consensus_assignment.py", line 105, in _import_blast_format_assignments
t = ref_taxa[id_]
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/pandas/core/series.py", line 853, in __getitem__
return self._get_value(key)
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/pandas/core/series.py", line 961, in _get_value
loc = self.index.get_loc(label)
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3083, in get_loc
raise KeyError(key) from err
KeyError: 'HL282720.7.1469'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2cli/commands.py", line 339, in __call__
results = action(**arguments)
File "", line 2, in classify_consensus_vsearch
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 391, in callable_executor
output_views = self._callable(**view_args)
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/_vsearch.py", line 62, in classify_consensus_vsearch
consensus = _consensus_assignments(
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/_consensus_assignment.py", line 29, in _consensus_assignments
obs_taxa = _import_blast_format_assignments(
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/_consensus_assignment.py", line 107, in _import_blast_format_assignments
raise KeyError((
KeyError: 'Identifier HL282720.7.1469 was reported in taxonomic search results, but was not present in the reference taxonomy.'

Plugin error from feature-classifier:

'Identifier HL282720.7.1469 was reported in taxonomic search results, but was not present in the reference taxonomy.'

See above for debug info.


I saw a similar topic on the forum back in 2018 but that was with a custom database. This is not a custom database but clearly I'm doing something wrong when importing the database as the format has completely changed.

Please help, I'm totally lost. I've tried everything, including clustering at 97% and also filtering my representative seqs from the DADA2 denoising step and running the feature-classifier vsearch step with the filtered seqs, but I still get the same error.

Hi @sraza,

You can download pre-made SILVA classifiers from the Data resources page.

Alternatively, you can install and use the RESCRIPt plugin to make your own SILVA reference database. We even have a great tutorial for it here.

-Mike

silva-138-99-tax.qza (6.5 MB)
Hi @SoilRotifer,
Thanks so much for this information.
I downloaded the pre-made QZA files (silva-138-99-tax.qza and silva-138-99-seqs.qza) and also made my own reference database following the instructions for the RESCRIPt plugin. After importing the taxa file, I unzipped it to check whether that specific ID (HL282720.7.1469) is present. It is present in the taxmap file (taxmap_slv_ssu_ref_138.1.txt) but NOT in the pre-made taxa file (silva-138-99-tax.qza), nor in the one I made (silva-138-ref-tax.qza).
Please see the attached file.
TAXA_QZA..txt (4.6 KB)

Not sure what I'm doing wrong?
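
The manual check described above (unzipping the taxonomy artifact and searching for one ID) could be scripted roughly as follows. This is only a sketch: it relies on a .qza being a zip archive whose taxonomy table is stored as <uuid>/data/taxonomy.tsv, and the function name and the file path in the usage note are illustrative, not part of QIIME 2.

```python
import io
import zipfile


def taxonomy_ids_in_qza(qza_path):
    """Return the set of IDs in the first column of a taxonomy .qza file."""
    with zipfile.ZipFile(qza_path) as archive:
        # Assumption: exactly one member ends in 'data/taxonomy.tsv'.
        member = next(
            name for name in archive.namelist()
            if name.endswith("data/taxonomy.tsv")
        )
        ids = set()
        with archive.open(member) as handle:
            for line in io.TextIOWrapper(handle, encoding="utf-8"):
                line = line.rstrip("\n")
                # Skip blank lines and comment lines.
                if not line or line.startswith("#"):
                    continue
                ids.add(line.split("\t", 1)[0])
    # Drop the header label if the table has one.
    ids.discard("Feature ID")
    return ids
```

For example, `"HL282720.7.1469" in taxonomy_ids_in_qza("silva-138-99-tax.qza")` would report whether the ID from the error message is in that taxonomy file.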

Hi @sraza, you are not doing anything wrong.

To make the pre-made files, we basically followed the procedure outlined in the RESCRIPt tutorial. If you made the database yourself, using either the same or different parameters, it is quite possible that one of the filtering steps has removed that sequence.

To sanity check, you can simply look for the sequence ID in the file that you imported, prior to performing any of the filtering / quality control steps.
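
A rough way to run that check for every reference sequence at once, rather than a single ID, is to compare the FASTA header IDs against the first column of the taxonomy table. The sketch below assumes plain-text inputs (a FASTA file and a tab-separated taxonomy file); the function names and the taxon strings in the example are made up for illustration.

```python
def fasta_ids(fasta_lines):
    """Yield the ID (first whitespace-delimited token) of each FASTA header."""
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def ids_missing_from_taxonomy(fasta_lines, taxonomy_lines):
    """Return sorted sequence IDs that have no row in the taxonomy table."""
    taxonomy_ids = {
        line.split("\t", 1)[0].strip()
        for line in taxonomy_lines
        if line.strip()
    }
    return sorted(set(fasta_ids(fasta_lines)) - taxonomy_ids)


# Hypothetical miniature inputs; real files would be read with open(...).
example_fasta = [
    ">HL282720.7.1469 example description",
    "ACGTACGT",
    ">AB000001.1.1400",
    "ACGTACGT",
]
example_taxonomy = [
    "Feature ID\tTaxon",
    "AB000001.1.1400\td__Bacteria",
]
missing = ids_missing_from_taxonomy(example_fasta, example_taxonomy)
print(missing)  # ['HL282720.7.1469']
```

Any ID this returns is one that vsearch could report as a hit but that the classifier cannot look up in the taxonomy, which is the situation the error message describes.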

Also, the taxmap file contains all of the taxa, whereas the NR99 files are clustered and manually curated as outlined here. So there may not be complete parity between the taxmap and the NR99 files, as the NR99 set is essentially a curated subset of the full SILVA database.

Also note, the pre-made files were made from SILVA version 138 (we need to update that), whereas the current download links for RESCRIPt point to 138.1.
