You'll have to do this prior to training the classifier. There are two ways:
- When running rescript get-ncbi-data, you can set the taxonomic ranks you'd like to extract via the --p-ranks option. The default is to pull kingdom phylum class order family genus species. In your case you can use:
--p-ranks kingdom phylum class order family genus
- Perhaps more easily, you can simply make use of rescript edit-taxonomy. That is, your command to remove the species labels may look something like this:
qiime rescript edit-taxonomy \
--i-taxonomy NCBI-diatoms-rbcL-ref-tax.qza \
--p-use-regex \
--p-search-strings 's__.*' \
--p-replacement-strings '' \
--o-edited-taxonomy NCBI-diatoms-rbcL-ref-tax-genus-only.qza
Then use NCBI-diatoms-rbcL-ref-tax-genus-only.qza for all your downstream curation and classifier training steps. I hope I got the regex correct, but you can play around with it.
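If you want to play with the regex before running the command above, here is a rough sketch in plain Python. It assumes the search string behaves like a standard Python regular expression substitution, which I have not verified against the plugin code, and the taxonomy string is a made-up example, so check it against the labels in your own file. The second pattern is only a suggestion in case the first one leaves a trailing "; " behind.

```python
import re

# Hypothetical taxonomy label for illustration only; the strings in your
# reference taxonomy may use different separators or rank prefixes.
taxon = (
    "k__Eukaryota; p__Bacillariophyta; c__Bacillariophyceae; "
    "o__Naviculales; f__Naviculaceae; g__Navicula; s__Navicula_radiosa"
)


def strip_species(label, pattern=r"s__.*"):
    """Remove everything matched by `pattern` from a taxonomy label."""
    return re.sub(pattern, "", label)


# Pattern from the command above: drops the species label but keeps the
# separator that preceded it.
genus_only = strip_species(taxon)

# Alternative pattern (an assumption, not from the original command):
# also removes the "; " in front of the species label.
genus_only_trimmed = strip_species(taxon, r";\s*s__.*")

print(repr(genus_only))
print(repr(genus_only_trimmed))
```

If the trailing separator turns out to matter for your downstream steps, you could pass the second pattern to --p-search-strings instead.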
-Cheers!