You'll have to do this prior to training the classifier. There are two ways:
- When running `rescript get-ncbi-data`, you can set the taxonomic ranks you'd like to extract via the `--p-ranks` option. The default is to pull `kingdom phylum class order family genus species`. In your case you can use: `--p-ranks kingdom phylum class order family genus`
- Perhaps more easily, you can simply make use of `rescript edit-taxonomy`. That is, your command to remove the species labels may look something like this:
qiime rescript edit-taxonomy \
--i-taxonomy NCBI-diatoms-rbcL-ref-tax.qza \
--p-use-regex \
--p-search-strings 's__.*' \
--p-replacement-strings '' \
--o-edited-taxonomy NCBI-diatoms-rbcL-ref-tax-genus-only.qza
Then use NCBI-diatoms-rbcL-ref-tax-genus-only.qza for all your downstream curation and classifier training steps. I hope I got the regex correct, but you can play around with it.
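If you want to play around with the pattern before touching the real artifact, a rough sketch like the one below may help. It runs the same search string through `sed -E` on a single made-up taxonomy line. The sample string, its `; ` delimiter, and the `k__`/`s__` style prefixes are assumptions for illustration, so check them against your own taxonomy file. QIIME 2 uses Python regular expressions rather than `sed`, but for patterns this simple the two should behave the same.

```shell
# Hypothetical taxonomy string for illustration only; the delimiter and
# rank prefixes in your own file may differ.
tax='k__Eukaryota; p__Bacillariophyta; c__Bacillariophyceae; o__Naviculales; f__Naviculaceae; g__Navicula; s__Navicula gregaria'

# Search string from the command above: drops the species label,
# but keeps the "; " delimiter that came before it.
plain=$(printf '%s\n' "$tax" | sed -E 's/s__.*//')

# Variant that also consumes the delimiter in front of the species label.
trimmed=$(printf '%s\n' "$tax" | sed -E 's/; *s__.*//')

# Brackets make any trailing whitespace visible.
printf '[%s]\n' "$plain"
printf '[%s]\n' "$trimmed"
```

With this sample line, the first result still ends in `; ` after the genus, while the second ends cleanly at the genus. If your edited taxonomy shows the same trailing delimiter, something like `--p-search-strings '; s__.*'` might be worth trying instead.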
-Cheers!