RDP database in qiime2

Hello, my problem is similar to the topic “RDP reference database in QIIME2 format”. I have managed to create files that mimic the SILVA format: I downloaded the current unaligned database and made a taxonomy file from it. Importing the sequences into QIIME 2 worked, but importing the taxonomy file returns this error message:

An unexpected error has occurred:
sequence item 0: expected str instance, float found

Here is a tail of my taxonomy file:
>S000011228 Bacteria;“Actinobacteria”;Actinobacteria;Acidimicrobidae;Acidimicrobiales;“Acidimicrobineae”;Iamiaceae;unclassified_Iamiaceae;
>S004599941 Bacteria;“Actinobacteria”;Actinobacteria;Acidimicrobidae;Acidimicrobiales;“Acidimicrobineae”;Iamiaceae;Aquihabitans;
>S004582735 Bacteria;“Actinobacteria”;Actinobacteria;Acidimicrobidae;Acidimicrobiales;“Acidimicrobineae”;Iamiaceae;Aquihabitans;
>S004585082 Bacteria;“Actinobacteria”;Actinobacteria;Acidimicrobidae;Acidimicrobiales;“Acidimicrobineae”;Iamiaceae;Aquihabitans;
>S004582741 Bacteria;“Actinobacteria”;Actinobacteria;Acidimicrobidae;Acidimicrobiales;“Acidimicrobineae”;Iamiaceae;Aquihabitans;
>S000021841 Bacteria;“Actinobacteria”;Actinobacteria;Acidimicrobidae;Acidimicrobiales;“Acidimicrobineae”;Iamiaceae;unclassified_Iamiaceae;
>S004582823 Bacteria;“Actinobacteria”;Actinobacteria;Acidimicrobidae;Acidimicrobiales;“Acidimicrobineae”;Iamiaceae;Aquihabitans;

Do you have any idea?

Hi @Lara_Farron,

It looks like your taxonomy file is in a FASTA-style format. Ultimately, you’d want this file to be in a tab-delimited format, e.g.:

S000011228 <TAB> Bacteria;Actinobacteria;Acidimicrobidae;...

as outlined here. :slight_smile:
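
For anyone following along, here is a minimal sketch of that conversion in Python, assuming each line of the existing file looks like the tail shown above (`>ID`, whitespace, then a semicolon-separated lineage). The function names and file paths are hypothetical, and dropping the trailing `;` is a stylistic choice rather than something the thread says the importer requires.

```python
def fasta_header_to_tsv_line(line):
    """Turn '>ID tax;on;omy;' into 'ID<TAB>tax;on;omy'."""
    line = line.strip()
    if not line.startswith(">"):
        raise ValueError(f"not a FASTA-style header: {line!r}")
    # Split once on whitespace: the ID is on the left, the lineage on the right.
    parts = line[1:].split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"missing ID or lineage: {line!r}")
    seq_id, lineage = parts
    # Assumption: the trailing ';' is not wanted in the output, so remove it.
    lineage = lineage.strip().rstrip(";")
    return f"{seq_id}\t{lineage}"


def convert(in_path, out_path):
    """Rewrite a FASTA-style taxonomy file as a headerless two-column TSV."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(fasta_header_to_tsv_line(line) + "\n")
```

Calling `convert("rdp_taxonomy_fasta_style.txt", "taxonomy.txt")` (both file names are placeholders) would write one `ID<TAB>lineage` row per input line.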

-Mike

Hi @SoilRotifer,
Thanks for the answer :slight_smile: I reformatted the file as you suggested, but the error persists:

File "/home/s***/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_types/feature_data/_transformer.py", line 85, in _taxonomy_formats_to_dataframe
    ', '.join(df.index[df.index.duplicated()].unique()))
TypeError: sequence item 0: expected str instance, float found

An unexpected error has occurred:
  sequence item 0: expected str instance, float found

Any idea?
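
A note on the traceback: the failing line joins the duplicated index values into one string, so the `TypeError` suggests that at least one duplicated ID was not a string. One possible cause, which is an assumption and not confirmed in this thread, is that several rows have an empty ID column (for example a line starting with a tab) and end up as missing values. The sketch below uses only the standard library to look for empty IDs, duplicate IDs, and rows without a second column; the function names are hypothetical.

```python
from collections import Counter


def check_taxonomy_lines(lines):
    """Return (empty_id_rows, duplicate_ids, bad_column_rows) for TSV lines.

    Row numbers are 1-based. A row is 'bad' if it lacks a second column.
    """
    empty_id_rows = []
    bad_column_rows = []
    counts = Counter()
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n").split("\t")
        if len(row) < 2:
            bad_column_rows.append(number)
        seq_id = row[0].strip()
        if not seq_id:
            empty_id_rows.append(number)
        else:
            counts[seq_id] += 1
    duplicate_ids = sorted(i for i, n in counts.items() if n > 1)
    return empty_id_rows, duplicate_ids, bad_column_rows


def check_taxonomy_file(path):
    """Run the same checks on a file on disk."""
    with open(path, encoding="utf-8") as handle:
        return check_taxonomy_lines(handle)
```

If `check_taxonomy_file("taxonomy.txt")` reports any empty or duplicated IDs, fixing those rows before importing would be a reasonable first step.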

What command are you using to import the taxonomy?

It should be something like this:

qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path taxonomy.txt \
    --output-path taxonomy.qza

Hi @SoilRotifer, first, sorry for the delay. Here is the situation: I’ve managed to import the taxonomy file and the RDP sequence file, both of which I have deposited on Figshare. The problem comes when I run the classification with vsearch: it tells me that the identifier S003546124 was reported in the taxonomic search results but was not present in the reference taxonomy. Yet when I grep “S003546124” in the taxonomy file and in the RDP file, it is present in both! So I’m a bit confused…

time qiime feature-classifier classify-consensus-vsearch \
  --i-query /home/LaraFarron/Bureau/Anal_16s/asvs/rep-seqs.qza \
  --i-reference-reads ref_seqs_16S_V3-V4.qza \
  --i-reference-taxonomy otu_99_taxonomy.qza \
  --p-perc-identity 0.97 \
  --o-classification VSEARCH-TAXONOMY.qza \
  --verbose
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --usearch_global /tmp/qiime2-archive-5fyexts9/7000bfbe-2ea3-4ccf-9533-191cd9584ec9/data/dna-sequences.fasta --id 0.97 --query_cov 0.8 --strand both --maxaccepts 10 --maxrejects 0 --db /tmp/qiime2-archive-b3ch8s8y/a45a2f1b-cf7f-41dc-982d-f9fa4137b5c2/data/dna-sequences.fasta --threads 1 --output_no_hits --blast6out /tmp/tmpij4c0dfo

vsearch v2.7.0_linux_x86_64, 5.7GB RAM, 8 cores
https://github.com/torognes/vsearch

Reading file /tmp/qiime2-archive-b3ch8s8y/a45a2f1b-cf7f-41dc-982d-f9fa4137b5c2/data/dna-sequences.fasta 100%  
825590003 nt in 2267901 seqs, min 50, max 369, avg 364
Masking 100%  
Counting k-mers 100%  
Creating k-mer index 100%  
Searching 100% 
Searching 100%
Matching query sequences: 1096 of 1460 (75.07%)
Traceback (most recent call last):
  File "/home/LaraFarron/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'S003546124'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/LaraFarron/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_feature_classifier/_consensus_assignment.py", line 105, in _import_blast_format_assignments
    t = ref_taxa[id_]
  File "/home/LaraFarron/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/pandas/core/series.py", line 882, in __getitem__
    return self._get_value(key)
  File "/home/LaraFarron/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/pandas/core/series.py", line 990, in _get_value
    loc = self.index.get_loc(label)
  File "/home/LaraFarron/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2900, in get_loc
    raise KeyError(key) from err
KeyError: 'S003546124'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/LaraFarron/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-367>", line 2, in classify_consensus_vsearch
  File "/home/LaraFarron/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/home/LaraFarron/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/LaraFarron/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_feature_classifier/_vsearch.py", line 64, in classify_consensus_vsearch
    unassignable_label=unassignable_label)
  File "/home/LaraFarron/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_feature_classifier/_consensus_assignment.py", line 30, in _consensus_assignments
    output.name, ref_taxa, unassignable_label=unassignable_label)
  File "/home/LaraFarron/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_feature_classifier/_consensus_assignment.py", line 110, in _import_blast_format_assignments
    'taxonomy.').format(str(id_)))
KeyError: 'Identifier S003546124 was reported in taxonomic search results, but was not present in the reference taxonomy.'

Plugin error from feature-classifier:

  'Identifier S003546124 was reported in taxonomic search results, but was not present in the reference taxonomy.'

See above for debug info.

It appears that although the files in Figshare are present, they are private and I am unable to download the files in order to view them directly. Can these be made available?

Anyway, I am guessing that there is a formatting issue. Until I can access the files, I’d suggest exporting the SILVA taxonomy and sequence files we provide on the Data resources page and checking whether your formatting matches theirs.
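
In the meantime, a rough way to test for a formatting mismatch is to compare the IDs exactly rather than with grep, since grep also matches substrings and will not reveal stray whitespace around an ID. The sketch below is only an illustration: the function names are hypothetical, and it assumes the sequence ID is the text after `>` up to the first whitespace and that the taxonomy is tab-delimited.

```python
def fasta_ids(lines):
    """Collect sequence IDs: the text after '>' up to the first whitespace."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def taxonomy_ids(lines):
    """Collect raw first-column values, deliberately without stripping."""
    return {
        line.rstrip("\r\n").split("\t", 1)[0]
        for line in lines
        if line.strip()
    }


def compare_ids(seq_ids, tax_ids):
    """Return (missing, near_misses).

    missing: sequence IDs with no exact match in the taxonomy.
    near_misses: missing IDs that do match once whitespace is stripped,
    mapped to the raw taxonomy value that caused the mismatch.
    """
    missing = seq_ids - tax_ids
    stripped = {t.strip(): t for t in tax_ids if t != t.strip()}
    near_misses = {s: stripped[s] for s in missing if s in stripped}
    return sorted(missing), near_misses
```

Opening the exported FASTA and taxonomy files and passing the file handles to `fasta_ids` and `taxonomy_ids` gives the two sets. A non-empty `near_misses` would point to hidden whitespace, while IDs that are simply missing would point to rows lost during conversion.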

Okay, it’s public now! That’s odd: the format seems to be similar… :confused:

Sorry for my late reply. It looks like I can download the FASTA file, but the taxonomy file appears to be no longer present. I’d also recommend that you upload compressed (e.g. zip / gzip) files to FigShare so that they download faster. :wink:

EDIT: the download appears to fail after several attempts. :frowning: