importing reference database for taxonomic assignment

crw · January 5, 2024, 6:22am

Hello Qiime forum team!
I'm trying to import a locally stored, curated 16S reference database that is currently a fasta file (sears.refseq16S-v2.fa) to create a "FeatureData[Taxonomy]" qza file to use for taxonomic assignment with the "qiime feature-classifier classify-consensus-blast" command. However, I'm confused about how to properly accomplish this. Any suggestions are much appreciated!

Qiime version I'm running: qiime2-amplicon-2023.9

Code:
qiime tools import \
  --input-path data/db/custom.16S.db/sears.refseq16S-v2.fa \
  --input-format DNAFASTAFormat \
  --type 'FeatureData[Taxonomy]' \
  --output-path qiime/subsampled-100k/sears.refseq16S-v2.qza

Error:
"Traceback (most recent call last):
File "/usr/local/Caskroom/miniforge/base/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/q2cli/builtin/tools.py", line 267, in import_data
artifact = qiime2.sdk.Artifact.import_data(type, input_path,
File "/usr/local/Caskroom/miniforge/base/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/qiime2/sdk/result.py", line 329, in import_data
return cls.from_view(type, view, view_type, provenance_capture,
File "/usr/local/Caskroom/miniforge/base/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/qiime2/sdk/result.py", line 355, in _from_view
transformation = from_type.make_transformation(to_type,
File "/usr/local/Caskroom/miniforge/base/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/qiime2/core/transform.py", line 58, in make_transformation
raise Exception("No transformation from %r to %r" %
Exception: No transformation from <class 'q2_types.feature_data._format.DNAFASTAFormat'> to <class 'q2_types.feature_data._format.TSVTaxonomyDirectoryFormat'>

An unexpected error has occurred:
No transformation from <class 'q2_types.feature_data._format.DNAFASTAFormat'> to <class 'q2_types.feature_data._format.TSVTaxonomyDirectoryFormat'>
See above for debug info."

When I run the command with the verbose flag I get the following:
"Usage: qiime tools import [OPTIONS]
Import data to create a new QIIME 2 Artifact. See https://docs.qiime2.org/
for usage examples and details on the file types and associated semantic
types that can be imported.

Options:
--type TEXT The semantic type of the artifact that will be
created upon importing. Use --show-importable-types
to see what importable semantic types are available
in the current deployment. [required]
--input-path PATH Path to file or directory that should be imported.
[required]
--output-path ARTIFACT Path where output artifact should be written.
[required]
--input-format TEXT The format of the data to be imported. If not
provided, data must be in the format expected by the
semantic type provided via --type.
--show-importable-types Show the semantic types that can be supplied to
--type to import data into an artifact.
--show-importable-formats
Show formats that can be supplied to --input-format
to import data into an artifact.
--help Show this message and exit.

                There was a problem with the command:

(1/1?) No such option: --verbose"

Many thanks in advance!

Nicholas_Bokulich · January 5, 2024, 6:39am

Hi @crw ,

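The issue is that you are importing the wrong type of file. FASTA format data can be imported as FeatureData[Sequence] but not as FeatureData[Taxonomy]. That is basically what this error is saying: there is no transformation from DNAFASTAFormat (your FASTA file) to TSVTaxonomyDirectoryFormat (the format behind FeatureData[Taxonomy]).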

Your taxonomy should be in a TSV/tab-separated text file with two columns (separated by tabs): the first column is unique IDs and the second column is the taxonomic lineages (with ranks separated by semicolons) for each ID.
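
As a rough illustration of that layout, the sketch below writes a two-line example table, checks that every line has exactly two tab-separated columns and that no ID repeats, and then imports it if QIIME 2 is on the PATH. The IDs, lineages, and file names are made up for the example. The import options follow the feature-classifier tutorial linked below, which imports a headerless taxonomy file with HeaderlessTSVTaxonomyFormat.

```shell
# Hypothetical two-record taxonomy table; the IDs and lineages are made up.
printf 'seq001\tk__Bacteria; p__Firmicutes; c__Bacilli\n' > example-taxonomy.tsv
printf 'seq002\tk__Bacteria; p__Proteobacteria; c__Gammaproteobacteria\n' >> example-taxonomy.tsv

# Count lines that do not have exactly two tab-separated fields (should be 0).
bad_lines=$(awk -F'\t' 'NF != 2 { n++ } END { print n + 0 }' example-taxonomy.tsv)

# Count IDs that occur more than once (should be 0).
dup_ids=$(cut -f1 example-taxonomy.tsv | sort | uniq -d | wc -l | tr -d ' ')

echo "malformed lines: $bad_lines, duplicated IDs: $dup_ids"

# Run the import only if QIIME 2 is available in the current environment.
if command -v qiime >/dev/null 2>&1; then
  qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path example-taxonomy.tsv \
    --output-path example-taxonomy.qza
fi
```

Running the same two checks on a real taxonomy file before importing is a cheap way to catch stray spaces or repeated IDs.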

You can see examples of the file types and commands for importing here:
https://docs.qiime2.org/2023.9/tutorials/feature-classifier/

My guess is that you have a FASTA with taxonomy IDs in the header line. So to convert to the expected file type you should:

1. remove the sequence lines
2. remove the > character from the start of each line
3. make sure that there is a tab between the ID and taxonomy information (not a space)

A command like this should do the trick (but no guarantees!):

grep '^>' data/db/custom.16S.db/sears.refseq16S-v2.fa | sed -e 's/^>//' -e 's/ /'$'\t''/' > data/db/custom.16S.db/sears.refseq16S-v2-taxonomy.tsv
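
The same three steps can be tried on a tiny made-up file before touching the real database. This is only a sketch: it assumes each header looks like ">ID taxonomy" with a single space after the ID, it uses awk in place of the pipeline above, and the file names and records are hypothetical.

```shell
# Hypothetical FASTA whose headers are "<ID> <taxonomy>", separated by one space.
cat > example-headers.fa <<'EOF'
>refA k__Bacteria; p__Firmicutes
ACGTACGT
>refB k__Bacteria; p__Proteobacteria
TTGACCGA
EOF

# Keep only header lines, drop the leading ">", and turn the FIRST space into a tab.
# Later spaces (inside the lineage) are left untouched.
awk '/^>/ { sub(/^>/, ""); sub(/ /, "\t"); print }' example-headers.fa > example-headers-taxonomy.tsv

cat example-headers-taxonomy.tsv
```

Only the first space is replaced, so lineages that contain spaces after the semicolons stay in the second column.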

Good luck!

crw · January 5, 2024, 4:57pm

Hi Nicholas_Bokulich,

Thanks so much for your quick and detailed reply! This makes sense, but currently our database does not have unique IDs, only taxonomy and sequences. I've included a screenshot below. Is there a way to only use the taxonomy and sequences, or are unique IDs required?
Many thanks for your continued help!

Nicholas_Bokulich · January 5, 2024, 6:40pm

Hi @crw ,
Unique IDs are required, and the same IDs must appear in both the sequence file and the taxonomy file so that the two can be matched. If you are pulling these sequences from NCBI RefSeq (as I assume from the filename), you could just keep the original accession IDs. Otherwise, you could make up a new unique alphanumeric code for each sequence.
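
If the original accessions are not available, one way to make up IDs is sketched below. It assumes each header holds nothing but the taxonomy string after the ">", and it uses sequential IDs (ref000001, ref000002, ...) rather than random codes, since sequential numbers are unique by construction. The file names, the "ref" prefix, and the example records are all hypothetical.

```shell
# Hypothetical FASTA whose headers hold only a taxonomy string (no ID).
cat > example-noid.fa <<'EOF'
>k__Bacteria; p__Firmicutes; g__Bacillus
ACGTACGT
>k__Bacteria; p__Proteobacteria; g__Escherichia
TTGACCGA
EOF

# For each header line: mint the next sequential ID, write "ID<TAB>taxonomy"
# to the taxonomy table, and emit ">ID" as the new header.
# Sequence lines are copied through unchanged.
awk -v tsv=example-noid-taxonomy.tsv '
  /^>/ {
    id = sprintf("ref%06d", ++n)
    print id "\t" substr($0, 2) > tsv
    print ">" id
    next
  }
  { print }
' example-noid.fa > example-noid-ids.fa

cat example-noid-ids.fa
cat example-noid-taxonomy.tsv
```

The re-headered FASTA can then be imported as FeatureData[Sequence] and the table as FeatureData[Taxonomy], with the IDs matching between the two.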

Good luck!

crw · January 5, 2024, 7:11pm

Hi Nicholas_Bokulich,

Thanks so much for the additional clarification! It's much appreciated!
