Dear QIIME2 community,
I would like to use the GTDB database for a 16S microbiota analysis. I downloaded the ssu_all_r202 file, imported it into QIIME 2, and extracted the V3-V4 region from the sequences. I then downloaded the ar122_taxonomy_r202 and bac120_taxonomy_r202 files, merged them, and imported the result into QIIME 2. When I ran the qiime feature-classifier fit-classifier-naive-bayes command, I got the error "not enough values to unpack (expected 2, got 0)". This makes sense, since the sequence IDs are not the same as the IDs in the taxonomy file. Here is a sequence header from the ssu_all file followed by an entry from the taxonomy file:
RS_GCF_001571485.1~NZ_LPTV01000250.1 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia flexneri [location=183..1721] [ssu_len=1538] [contig_len=5037]
"RS_GCF_014075335.1" "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia; s__Escherichia flexneri"
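One possible workaround, sketched under assumptions rather than as a confirmed fix: the ssu_all header above already carries a full taxonomy string after the sequence ID, so a taxonomy table could be built straight from the FASTA headers, keyed on the exact sequence IDs. The sketch assumes every header follows the layout shown (ID, a space, the semicolon-separated taxonomy, then bracketed [key=value] fields). It writes a tab-separated file with "Feature ID" and "Taxon" columns, which I am assuming is the taxonomy layout QIIME 2 imports by default. The function names are my own, not part of QIIME 2.

```python
import re

# Assumed header layout, based on the ssu_all example above:
# ">SEQ_ID taxonomy string (may contain spaces) [key=value] [key=value] ..."
HEADER_RE = re.compile(r"^>?(\S+)\s+(.*?)\s*((?:\[[^\]]*\]\s*)*)$")
ATTR_RE = re.compile(r"\[(\w+)=([^\]]*)\]")


def parse_gtdb_header(header):
    """Split one FASTA header into (sequence ID, taxonomy, attribute dict)."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised header: {header!r}")
    seq_id, taxonomy, attr_block = match.groups()
    attributes = dict(ATTR_RE.findall(attr_block))
    return seq_id, taxonomy, attributes


def headers_to_taxonomy_tsv(fasta_path, tsv_path):
    """Write a 'Feature ID<TAB>Taxon' table keyed on the full FASTA IDs.

    Hypothetical helper: IDs are copied verbatim (including the part
    after '~'), so they should match the imported sequences exactly.
    """
    with open(fasta_path) as fasta, open(tsv_path, "w") as out:
        out.write("Feature ID\tTaxon\n")
        for line in fasta:
            if line.startswith(">"):
                seq_id, taxonomy, _ = parse_gtdb_header(line)
                out.write(f"{seq_id}\t{taxonomy}\n")
```

Because the table is built from the full-length file, it should cover every ID that survives the V3-V4 extraction, assuming the extraction step keeps the original IDs; the input and output paths you pass in are up to you.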
I could strip everything after the "~" in the sequence IDs, but then some IDs are no longer unique.
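A quick way to see which IDs collide is to count the truncated prefixes. This assumes, from the example above, that the part before "~" is the genome accession (the same form used as the key in the taxonomy file) and the part after it identifies the contig; presumably a genome with several 16S copies on different contigs produces the duplicates. The IDs in the usage lines are hypothetical.

```python
from collections import Counter


def truncated_id_collisions(seq_ids):
    """Map each pre-'~' prefix shared by more than one full ID to its count."""
    counts = Counter(seq_id.split("~", 1)[0] for seq_id in seq_ids)
    return {prefix: n for prefix, n in counts.items() if n > 1}


# Hypothetical IDs: two contigs from one genome, one contig from another.
example_ids = [
    "RS_GCF_001571485.1~contig_A",
    "RS_GCF_001571485.1~contig_B",
    "RS_GCF_014075335.1~contig_C",
]
print(truncated_id_collisions(example_ids))  # {'RS_GCF_001571485.1': 2}
```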
Nicholas_Bokulich said in an answer to another post that "GTDB works well with QIIME 2, the files are already in an appropriate format", so I assume I am missing something, but I don't know what.
The tutorial "Training feature classifiers with q2-feature-classifier" was very helpful; I wish there were an equivalent for GTDB, or even SILVA!
Any help would be appreciated!