MemoryError when running feature classifier with pre-trained classifier

BenKaehler · May 11, 2017, 12:14am

Hi @ChristianEdwardson, I just re-read your post and realised that I was talking about the training step (fit-classifier-naive-bayes) while you are talking about the classification step (classify-sklearn).

For the classification step, try leaving --p-n-jobs at its default value of 1. There is a trade-off here between memory usage and speed, so if you're running out of memory you have to sacrifice some speed to make the job fit. The amount of memory you use should scale roughly linearly with the value of --p-n-jobs.

The --p-chunk-size parameter has different meanings for the classification and training steps. For the training step, it affects how the training data is split up and fed to the classifier to reduce memory consumption. For the classification step, it affects how many reads the classifier sends to each parallel worker in each iteration.
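
If it helps to picture the classification step, here is a toy Python model of the behaviour described above. It is only a sketch under my own assumptions, not the plugin's actual code: the function names (`chunks`, `iterations`, `peak_reads_in_flight`) are made up for illustration, and it ignores the memory taken by the classifier itself. It just counts how many reads are handed out at once when each of `n_jobs` workers receives one chunk of `chunk_size` reads per iteration.

```python
from itertools import islice


def chunks(reads, chunk_size):
    """Yield successive lists of at most chunk_size reads."""
    it = iter(reads)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def iterations(reads, chunk_size, n_jobs):
    """Group chunks into iterations: one chunk per worker, n_jobs workers."""
    chunk_iter = chunks(reads, chunk_size)
    while True:
        batch = list(islice(chunk_iter, n_jobs))
        if not batch:
            return
        yield batch


def peak_reads_in_flight(n_reads, chunk_size, n_jobs):
    """Largest number of reads handed to workers in any single iteration."""
    sizes = (
        sum(len(chunk) for chunk in batch)
        for batch in iterations(range(n_reads), chunk_size, n_jobs)
    )
    return max(sizes, default=0)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the scaling.
    print(peak_reads_in_flight(10, chunk_size=2, n_jobs=1))  # 2
    print(peak_reads_in_flight(10, chunk_size=2, n_jobs=3))  # 6
```

In this model the number of reads held at once is at most `chunk_size * n_jobs`, which is one way to see why lowering either value trades speed for a smaller memory footprint.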

Hope that helps, sorry for the slow realisation.