feature-classifier fit-classifier-naive-bayes optimisation

Welcome to the forum, @jasongallant,
Thanks for digging up this old topic... some of the info I wrote in there a year ago is impacted by recent updates.

That is correct: fit-classifier-naive-bayes does not report any progress while it runs, unfortunately.

I'd recommend reducing database size to reduce runtime, if possible:

  1. use extract-reads to focus on the amplicon region you are using
  2. remove any low-quality sequences
  3. dereplicate the database (ideally after extracting amplicons) to reduce database size and redundancy. You can use RESCRIPt to dereplicate the sequences together with the taxonomy.
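To show why step 3 shrinks the database, here is a toy Python sketch of dereplication with taxonomy. It is not RESCRIPt's implementation, only an illustration under my own assumptions: the `dereplicate` function, the example IDs, sequences, and lineages are all made up, and the merge rule (keep the ranks that identical sequences agree on) is a simplified stand-in for what a real tool does.

```python
from collections import defaultdict


def dereplicate(seqs, taxonomy):
    """Toy dereplication (hypothetical helper, not part of RESCRIPt).

    Identical sequences are collapsed into one record, and the record's
    taxonomy is trimmed to the leading ranks that all duplicates share.
    """
    # Group sequence IDs by their (case-insensitive) sequence string.
    groups = defaultdict(list)
    for seq_id, seq in seqs.items():
        groups[seq.upper()].append(seq_id)

    derep_seqs, derep_tax = {}, {}
    for seq, ids in groups.items():
        rep_id = ids[0]  # first ID seen represents the whole group
        lineages = [[r.strip() for r in taxonomy[i].split(";")] for i in ids]
        shared = []
        # Walk the ranks in parallel and stop at the first disagreement.
        for ranks in zip(*lineages):
            if len(set(ranks)) != 1:
                break
            shared.append(ranks[0])
        derep_seqs[rep_id] = seq
        derep_tax[rep_id] = "; ".join(shared)
    return derep_seqs, derep_tax


# Made-up example data: "a" and "b" are the same sequence with
# different genus labels; "c" is a distinct sequence.
seqs = {"a": "ACGT", "b": "acgt", "c": "ACGA"}
taxonomy = {
    "a": "k__Animalia; p__Chordata; g__Gymnotus",
    "b": "k__Animalia; p__Chordata; g__Brachyhypopomus",
    "c": "k__Animalia; p__Arthropoda",
}

derep_seqs, derep_tax = dereplicate(seqs, taxonomy)
print(len(seqs), "->", len(derep_seqs), "sequences")
print(derep_tax)
```

Fewer unique sequences means less for the classifier to train on, which is where the runtime saving comes from. For real data, use RESCRIPt itself rather than a script like this.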

RESCRIPt also has a "get-ncbi-data" method that you can use to download data from GenBank, format it automatically, and import it as QIIME 2 artifacts. Since BOLD deposits its public data on GenBank (all or most of it, I'm not sure), you could use that method to grab public BOLD data.

Good luck!
