feature-classifier fit-classifier-naive-bayes optimisation

Nicholas_Bokulich · August 3, 2020, 3:19pm

Welcome to the forum, @jasongallant,
Thanks for digging up this old topic... some of the info I wrote in there a year ago is impacted by recent updates.

That is correct — fit-classifier-naive-bayes does not give any "progress update" unfortunately.

I'd recommend reducing database size to reduce runtime, if possible:

use extract-reads to focus on the amplicon region you are using
remove any low-quality sequences
dereplicate the database (ideally after extracting amplicons) to reduce database size and redundancy. You can use RESCRIPt to dereplicate the sequences together with the taxonomy.

RESCRIPt also has a "get-ncbi-data" method that you can use to download data from genbank and automatically format it and import it as QIIME 2 artifacts. Since BOLD deposits their public data on genbank (all or most? not sure), it would be possible to use that to grab public BOLD data.

Good luck!