(error) qiime feature-classifier fit-classifier-naive-bayes

I wanted a classifier that covers the V3 and V4 regions as well as all other regions in the Classify taxonomy step, so I downloaded the latest release (138.1) from the Download tab on the SILVA homepage and followed the steps below.
However, the 'qiime feature-classifier fit-classifier-naive-bayes' step fails with an error.
The log says 'Taxonomy format feature IDs must be unique', but I'm not sure where to start or how to fix it.
I also don't understand why the error appears even though I removed duplicates beforehand.
Please help me, doctors. :frowning:

SILVA homepage Download tab : (Archive)

Sequence file name: SILVA_138.1_SSUParc_tax_silva.fasta.gz
Taxonomy filename: taxmap_slv_ssu_parc_138.1.txt.gz

Steps Taken:
#1. Extract the .gz files
gunzip SILVA_138.1_SSUParc_tax_silva.fasta.gz taxmap_slv_ssu_parc_138.1.txt.gz

#2. Convert the sequence file to a .qza file

Change RNA sequence file to DNA sequence (U -> T)

sed 's/U/T/g' SILVA_138.1_SSUParc_tax_silva.fasta > SILVA_138.1_SSUParc_tax_silva_DNA.fasta
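
A note on this step: `sed 's/U/T/g'` rewrites every uppercase `U` in the file, including those in the header lines, so any sequence ID that contains a `U` would no longer match the taxonomy file. A minimal sketch that limits the substitution to sequence lines, assuming a standard FASTA layout where header lines start with `>` (the function name `rna_to_dna` is made up for illustration):

```shell
# Replace U with T on sequence lines only; header lines (starting with ">")
# are passed through unchanged.
# Usage (hypothetical): rna_to_dna input_rna.fasta > output_dna.fasta
rna_to_dna() {
  sed '/^>/!s/U/T/g' "$1"
}
```

This keeps the IDs in the FASTA headers identical to the ones in the downloaded file, which matters later when they are matched against the taxonomy table.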

Remove duplicate sequence IDs

awk '/^>/{f=!d[$1];d[$1]=1}f' SILVA_138.1_SSUParc_tax_silva_DNA.fasta > SILVA_138.1_SSURef_tax_silva_DNA_no_dups.fasta

Convert to Sequence format (.qza)

qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path SILVA_138.1_SSURef_tax_silva_DNA_no_dups.fasta \
--output-path silva-138.1-ref-seqs.qza

#3. Convert Taxonomy file to Taxonomy format (.qza)

Remove duplicates

sort taxmap_slv_ssu_parc_138.1.txt | uniq > taxmap_slv_ssu_parc_138.1_no_dups.txt

Keep the ID and taxonomy columns, remove spaces after semicolons, add a header, and import

awk 'BEGIN {FS="\t"; OFS="\t"} {print $1, $4}' taxmap_slv_ssu_parc_138.1_no_dups.txt > modified_taxonomy.txt
sed 's/; /;/g' modified_taxonomy.txt > no_spaces_taxonomy.txt
echo -e "Feature ID\tTaxon" | cat - no_spaces_taxonomy.txt > final_taxonomy.txt
qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-path final_taxonomy.txt \
--output-path taxmap_slv_ssu_ref_nr_138.1.qza
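
A likely cause of the duplicate-ID error is the `print $1, $4` step. Assuming the taxmap file uses the column layout `primaryAccession`, `start`, `stop`, `path`, `organism_name`, `taxid` (worth confirming against the first line of your copy), column 1 alone is not unique, because a single accession can contain several rRNA genes; `sort | uniq` only removes fully identical lines, so those rows survive. The FASTA headers appear to use `accession.start.stop` as the ID, so a sketch that builds the same ID for the taxonomy table might look like this (`build_taxonomy` is a hypothetical helper name):

```shell
# Build a two-column taxonomy table whose IDs have the form accession.start.stop.
# ASSUMPTION: tab-separated columns are
#   primaryAccession, start, stop, path, organism_name, taxid
# and the header row starts with "primaryAccession".
# Usage (hypothetical): build_taxonomy taxmap.txt > final_taxonomy.txt
build_taxonomy() {
  awk 'BEGIN {FS="\t"; OFS="\t"; print "Feature ID", "Taxon"}
       $1 != "primaryAccession" {
         path = $4
         gsub(/; /, ";", path)          # same cleanup as the sed step above
         print $1 "." $2 "." $3, path   # ID = accession.start.stop
       }' "$1"
}
```

The header row is skipped by name rather than by position, so the function also works on a file that has already been sorted.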

#4. Classifier
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads silva-138.1-ref-seqs.qza \
--i-reference-taxonomy taxmap_slv_ssu_ref_nr_138.1.qza \
--o-classifier classifier.qza

Error Message:
Traceback (most recent call last):
  File "/home/jww4557/.conda/envs/qiime2-2023.5/lib/python3.8/site-packages/q2cli/commands.py", line 468, in __call__
    results = action(**arguments)
  File "", line 2, in fit_classifier_naive_bayes
  File "/home/jww4557/.conda/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 271, in bound_callable
  File "/home/jww4557/.conda/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/core/type/signature.py", line 390, in transform_and_add_callable_args_to_prov
  File "/home/jww4557/.conda/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/core/type/signature.py", line 423, in _transform_and_add_input_to_prov
    transformed_input = _input._view(spec.view_type, recorder)
  File "/home/jww4557/.conda/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/result.py", line 401, in _view
    result = transformation(self._archiver.data_dir)
  File "/home/jww4557/.conda/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/home/jww4557/.conda/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/core/transform.py", line 214, in wrapped
    return transformer(view.file.view(self._wrapped_view_type))
  File "/home/jww4557/.conda/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_types/feature_data/_transformer.py", line 194, in _23
    df = _taxonomy_formats_to_dataframe(str(ff), has_header=True)
  File "/home/jww4557/.conda/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_types/feature_data/_transformer.py", line 85, in _taxonomy_formats_to_dataframe
    raise ValueError(
ValueError: Taxonomy format feature IDs must be unique. The following IDs are duplicated: AY835431, AC121240, AC136838, AC144549, FW588215, FW588217, HG530135, AY916449, AY928077, AC150267, FW306011, AC158194, AC158205, BA000029, BK000554, HG738867, HG738868, AC184066, FP236383, KJ078649, KJ078650, KJ081864, KJ123753 .........
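
To see which IDs are affected before importing, the first column of the taxonomy table can be checked directly. A small sketch, assuming a tab-separated file with a single header line, such as `final_taxonomy.txt` above (`duplicated_ids` is an illustrative name, not part of QIIME 2):

```shell
# Print every feature ID that occurs more than once in column 1,
# skipping the header line.
# Usage (hypothetical): duplicated_ids final_taxonomy.txt | head
duplicated_ids() {
  tail -n +2 "$1" | cut -f1 | sort | uniq -d
}
```

An empty result means the IDs are unique and the file should pass this particular check in the taxonomy format.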

Hi @jww4557 ,
As you found, SILVA can be a bit challenging to work with (when starting with the raw source files).

This is why we made a plugin to automatically download and format data from SILVA (as well as other sources). Please see the tutorial here for using RESCRIPt:

You can use that to create a custom database from SILVA.

Good luck!


