We are trying to use a very specific COI database to classify our data: MZGdb Atlas Cnidaria (World Oceans).
So far we have downloaded the desired .fasta file from that website (MZGfasta-coi__T4000200__o00__A.fasta), converted the sequences to upper case, and then used:
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path MZGfasta-coi__T4000200__o00__A_upper.fasta \
  --output-path MZGfasta-coi__T4000200__o00__A.qza
This created a .qza. We then downloaded the associated .mothur taxonomy file from the same site, made it tab-separated, and went through the following steps:
qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path MZGmothur-coi__T4000200__o00__A.tsv \
  --output-path MZGmother-coi__t4000200__o00__A.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads MZGfasta-coi__T4000200__o00__A.qza \
  --i-reference-taxonomy MZGmother-coi__t4000200__o00__A.qza \
  --o-classifier COI_jellyfish_classifier.qza

qiime tools validate COI_jellyfish_classifier.qza

qiime feature-classifier classify-sklearn \
  --i-classifier COI_jellyfish_classifier.qza \
  --i-reads rep-seqs-dada2.qza \
  --o-classification jellyfish-taxonomy-rescript.qza

qiime metadata tabulate \
  --m-input-file jellyfish-taxonomy-rescript.qza \
  --o-visualization jellyfish-taxonomy-rescript.qzv
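
For anyone wanting to reproduce the two preparation steps mentioned above (upper-casing the sequences and making the taxonomy file tab-separated), here is a minimal sketch of one way to do them, not necessarily the exact commands we ran. The function names fasta_to_upper and taxonomy_to_tsv and the toy records are made up for illustration. It assumes the sequence ID is the first whitespace-delimited field on each taxonomy line.

```shell
# Upper-case sequence lines only; header lines (starting with ">") are left
# untouched, so the sequence IDs still match those in the taxonomy file.
fasta_to_upper() {
  awk '/^>/ { print; next } { print toupper($0) }' "$1"
}

# Turn "ID<whitespace>lineage" into "ID<TAB>lineage".
# Assumption: the ID contains no spaces or tabs. Lines with fewer than
# two fields are skipped.
taxonomy_to_tsv() {
  awk 'NF >= 2 { id = $1; sub(/^[ \t]*[^ \t]+[ \t]+/, ""); print id "\t" $0 }' "$1"
}

# Toy demonstration on made-up records (not real MZGdb entries).
tmpdir=$(mktemp -d)
printf '>seqA\nacgtn\n>seqB\nTTga\n' > "$tmpdir/toy.fasta"
printf 'seqA Animalia;Cnidaria;\nseqB  Animalia;Cnidaria;\n' > "$tmpdir/toy.mothur"

fasta_to_upper "$tmpdir/toy.fasta" > "$tmpdir/toy_upper.fasta"
taxonomy_to_tsv "$tmpdir/toy.mothur" > "$tmpdir/toy.tsv"
cat "$tmpdir/toy_upper.fasta" "$tmpdir/toy.tsv"
```

Keeping the header lines unchanged matters here because the IDs in the FASTA headers are what link each reference sequence to its line in the taxonomy file.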
We are unsure whether our data are really just very strange or whether something has gone wrong. In the bar chart from the visualisation, almost everything is identified as one species, only 6 taxa are identified in total, and none of the expected species show up at all.
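
To put numbers on this, the classification can be exported and the assignments tallied per taxon. The sketch below is only an illustration on toy data. The function name tally_taxa and the toy feature IDs and lineages are made up, and it assumes the exported table is a tab-separated file with one header row and the taxon string in the second column.

```shell
# Count how many features were assigned to each taxon, most frequent first.
# Expects a tab-separated file with one header row and the taxon in column 2.
tally_taxa() {
  tail -n +2 "$1" | cut -f2 | sort | uniq -c | sort -rn
}

# Toy stand-in for an exported taxonomy table (made-up IDs and lineages).
tmpdir=$(mktemp -d)
{
  printf 'Feature ID\tTaxon\tConfidence\n'
  printf 'f1\tAnimalia;Cnidaria;Scyphozoa\t0.99\n'
  printf 'f2\tAnimalia;Cnidaria;Scyphozoa\t0.97\n'
  printf 'f3\tAnimalia;Cnidaria;Hydrozoa\t0.80\n'
} > "$tmpdir/taxonomy.tsv"

tally_taxa "$tmpdir/taxonomy.tsv"

# With the real data this would follow an export step along these lines
# (the output directory name is an arbitrary choice):
#   qiime tools export \
#     --input-path jellyfish-taxonomy-rescript.qza \
#     --output-path exported-taxonomy
#   tally_taxa exported-taxonomy/taxonomy.tsv
```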
We are running qiime2-2021.2 (which we realise is quite old, but we don't think this is the problem?) on Linux Mint 20.1 Cinnamon (in a virtual machine on Proxmox).
Are we doing something wrong?