Those IDs are not repeated. As noted in my initial reply, the ID is shown below. By the standard FASTA header convention, everything before the first space is the full ID.
>RS_GCF_000213495.1~NZ_AFHD01000036.1
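If you want to check this on your own copy, here is a minimal sketch. It keeps only the header text before the first space, which is the part treated as the ID, and then lists any ID that occurs more than once. The file `ids_demo.fna` and its placeholder accessions are hypothetical. The part after `~` appears to be the contig accession, which is why two SSU copies from the same genome can still get distinct IDs. On real data, pass `ssu_all_r202.fna` instead.

```shell
# Print each record ID: the header text after '>' and before the first space.
fasta_ids() {
    grep '^>' "$1" | sed 's/^>//' | cut -d ' ' -f1
}

# Print every ID that occurs more than once; no output means all IDs are unique.
duplicate_ids() {
    fasta_ids "$1" | LC_ALL=C sort | uniq -d
}

# Hypothetical sample: two SSU copies from the same placeholder genome accession,
# distinguished by the contig part after '~'.
cat > ids_demo.fna <<'EOF'
>RS_GCF_000000000.1~CONTIG_A d__Bacteria;g__Example;s__Example species [ssu_len=1500]
ACGT
>RS_GCF_000000000.1~CONTIG_B d__Bacteria;g__Example;s__Example species [ssu_len=1500]
ACGT
EOF

fasta_ids ids_demo.fna      # two different IDs
duplicate_ids ids_demo.fna  # prints nothing
```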
Yes, this is intended for the rep seqs, as outlined here.
Here is another approach that imports the full SSU set so you can train a classifier on it:
Download and extract the full SSU file:
wget https://data.gtdb.ecogenomic.org/releases/release202/202.0/genomic_files_all/ssu_all_r202.tar.gz
tar -xvf ssu_all_r202.tar.gz
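Before parsing, it can help to confirm that the archive unpacked into a well-formed FASTA file and to look at one header. This is a minimal sketch. The file `summary_demo.fna` is a hypothetical stand-in for `ssu_all_r202.fna`.

```shell
# To list the archive contents without extracting: tar -tzf ssu_all_r202.tar.gz

# Report how many FASTA records a file holds and print its first header line.
fasta_summary() {
    printf 'records: %s\n' "$(grep -c '^>' "$1")"
    printf 'first header: %s\n' "$(grep -m 1 '^>' "$1")"
}

# Hypothetical stand-in for ssu_all_r202.fna.
cat > summary_demo.fna <<'EOF'
>SEQ_1 d__Bacteria;g__Example;s__Example species
ACGT
>SEQ_2 d__Archaea;g__Example;s__Example species
ACGT
EOF

fasta_summary summary_demo.fna
```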
Extract and parse the FASTA headers, and write them to a file:
# Pull the headers, remove '>', drop the trailing '[...]' annotations, and replace the first ' ' (space) with '\t' (tab)
# Splitting only on the first space keeps species names that contain a space (e.g. 's__Escherichia coli') intact
grep '^>' ssu_all_r202.fna | sed 's/^>//; s/ \[.*$//; s/ /\t/' > ssu_all_r202_tax.tsv
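GTDB species labels contain a space, for example `s__Escherichia coli`. Keeping only the first two space-delimited fields therefore cuts the species label short. Splitting on the first space only avoids this. The sketch below shows both on a single hypothetical header. It assumes headers follow the pattern `>ID taxonomy [key=value] ...` and that the taxonomy string contains no `[`. It uses awk to write the tab, so it does not rely on GNU sed's `\t`.

```shell
# Turn GTDB-style FASTA headers into "ID<TAB>taxonomy" lines.
# Assumes headers look like: >ID taxonomy [key=value] [key=value] ...
# and that the taxonomy string contains no '[' (everything from " [" on is dropped).
headers_to_tsv() {
    grep '^>' "$1" |
        sed -e 's/^>//' -e 's/ \[.*$//' |
        awk '{ id = $1; sub(/^[^ ]+ /, ""); print id "\t" $0 }'
}

# Hypothetical header; the accession and bracketed values are placeholders.
cat > tax_demo.fna <<'EOF'
>RS_GCF_000000000.1~CONTIG_A d__Bacteria;g__Escherichia;s__Escherichia coli [location=1..1500] [ssu_len=1500]
ACGT
EOF

headers_to_tsv tax_demo.fna
# RS_GCF_000000000.1~CONTIG_A<TAB>d__Bacteria;g__Escherichia;s__Escherichia coli

# For comparison, keeping the first two space-delimited fields truncates the species:
grep '^>' tax_demo.fna | cut -d ' ' -f1,2
# >RS_GCF_000000000.1~CONTIG_A d__Bacteria;g__Escherichia;s__Escherichia
```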
Then import as a taxonomy file:
qiime tools import \
--input-path ssu_all_r202_tax.tsv \
--type 'FeatureData[Taxonomy]' \
--input-format 'HeaderlessTSVTaxonomyFormat' \
--output-path ssu_all_r202_tax.qza
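If the import complains about the file, you can find the problem lines quickly by listing every line that does not split into exactly two tab-separated fields (ID and taxonomy). This is a minimal sketch that checks only the two-column shape the parsing step is meant to produce. It is not QIIME 2's own format validation. The file `tax_check.tsv` is hypothetical. On real data, pass `ssu_all_r202_tax.tsv`.

```shell
# Print the line number and content of every line that does not split into
# exactly two tab-separated fields (ID and taxonomy).
bad_tsv_lines() {
    awk -F '\t' 'NF != 2 { print NR ": " $0 }' "$1"
}

# Hypothetical file: line 1 is well formed, line 2 uses a space instead of a tab.
printf 'SEQ_1\td__Bacteria;s__Escherichia coli\nSEQ_2 d__Bacteria\n' > tax_check.tsv

bad_tsv_lines tax_check.tsv   # prints: 2: SEQ_2 d__Bacteria
```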
Then import the FASTA file as is:
qiime tools import \
--input-path ssu_all_r202.fna \
--type 'FeatureData[Sequence]' \
--output-path ssu_all_r202_seqs.qza
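Training matches each reference sequence to its taxonomy by ID, so it is worth ruling out mismatches before starting a long run. Here is a minimal sketch that compares the ID sets of the FASTA and the taxonomy TSV with `comm`. The file names `ids_check.fna` and `ids_check.tsv` are hypothetical stand-ins for `ssu_all_r202.fna` and `ssu_all_r202_tax.tsv`.

```shell
# Show IDs found in only one of the two inputs:
#   unindented lines = only in the FASTA, tab-indented lines = only in the TSV.
# No output means both files contain the same set of IDs.
compare_ids() {
    grep '^>' "$1" | sed 's/^>//' | cut -d ' ' -f1 | LC_ALL=C sort > ids_fasta.txt
    cut -f1 "$2" | LC_ALL=C sort > ids_tsv.txt
    LC_ALL=C comm -3 ids_fasta.txt ids_tsv.txt
}

# Hypothetical inputs: SEQ_B lacks a taxonomy line and SEQ_C lacks a sequence.
printf '>SEQ_A d__Bacteria\nACGT\n>SEQ_B d__Bacteria\nACGT\n' > ids_check.fna
printf 'SEQ_A\td__Bacteria\nSEQ_C\td__Archaea\n' > ids_check.tsv

compare_ids ids_check.fna ids_check.tsv
```

Both lists are sorted under the same `LC_ALL=C` locale because `comm` expects identically sorted input.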
Perform any QA/QC with RESCRIPt if needed, then build the classifier:
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads ssu_all_r202_seqs.qza \
--i-reference-taxonomy ssu_all_r202_tax.qza \
--o-classifier gtdb_classifier.qza
Test the classifier on your rep seqs and tabulate the results:
qiime feature-classifier classify-sklearn \
--i-classifier gtdb_classifier.qza \
--i-reads rep-seqs.qza \
--p-n-jobs 4 \
--o-classification taxonomy.qza
qiime metadata tabulate \
--m-input-file taxonomy.qza \
--o-visualization taxonomy.qzv
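Besides the visualization, one rough way to gauge how deep the assignments go is to export the classification and count how many features reach a named species. This is a minimal sketch. It assumes the usual exported `taxonomy.tsv` layout: a header row, then the `Feature ID`, `Taxon`, and `Confidence` columns. The table `taxonomy_demo.tsv` is hypothetical.

```shell
# Export first with: qiime tools export --input-path taxonomy.qza --output-path exported-taxonomy
# Count how many classified features carry a non-empty species rank (s__ followed by a name).
species_summary() {
    awk -F '\t' 'NR > 1 { total++; if ($2 ~ /s__[^;]/) species++ }
        END { printf "species-level: %d of %d\n", species, total }' "$1"
}

# Hypothetical exported table.
printf 'Feature ID\tTaxon\tConfidence\nf1\td__Bacteria;g__Escherichia;s__Escherichia coli\t0.99\nf2\td__Bacteria;g__Escherichia\t0.85\nf3\tUnassigned\t0.7\n' > taxonomy_demo.tsv

species_summary taxonomy_demo.tsv   # prints: species-level: 1 of 3
```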
I just tested this locally and it appears to work. Let us know if this works for you too.