RDP Reference Database in QIIME2 format

Setting reads-per-batch works well for limiting memory consumption in classify-sklearn, because the reads are processed and written out in chunks. I have not looked into it in detail yet, but classify--chunk-size does not seem to offer as much of an advantage for fit-classifier-naive-bayes, because the trained classifier is still held entirely in memory (I think).
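
For reference, here is a minimal sketch of the batched approach on the command line; the artifact names and the batch size are placeholders you would adapt to your own data and available memory:

```
# Assign taxonomy in batches so only a chunk of reads is held in
# memory at a time; smaller --p-reads-per-batch values lower the
# peak footprint at the cost of speed.
qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs.qza \
  --p-reads-per-batch 1000 \
  --o-classification taxonomy.qza
```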

So the bad news is that I think you will just need a machine with more than 8 GB of RAM to train your classifier. You may be able to filter the data somehow to reduce the memory demand (e.g., trim to your amplicon of interest before training, filter outliers or other noise, filter unassigned taxa; see the sketch below). @SoilRotifer and I just released a new plugin, RESCRIPt, that might help with this.
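
To illustrate the trimming suggestion, here is a hedged sketch: extract only the region spanned by your primers from the full-length reference sequences, then train on that smaller set. The 515F/806R primer sequences and all file names below are placeholders; substitute your own primers and artifacts:

```
# Extract just the amplicon region targeted by the primers
# (example 515F/806R primers shown; use your own).
qiime feature-classifier extract-reads \
  --i-sequences ref-seqs.qza \
  --p-f-primer GTGYCAGCMGCCGCGGTAA \
  --p-r-primer GGACTACNVGGGTWTCTAAT \
  --o-reads ref-seqs-trimmed.qza

# Train the naive Bayes classifier on the trimmed reference,
# which is typically much smaller than the full-length sequences.
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs-trimmed.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier classifier.qza
```

Whether this fits under 8 GB will still depend on the size of the reference database.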
