Hi @ChristianEdwardson, I just re-read your post and realised that I was talking about the training step (`fit-classifier-naive-bayes`) and you are talking about the classification step (`classify-sklearn`).
For the classification step, try leaving `--p-n-jobs` at its default value of 1. There is a trade-off here between memory usage and speed: if you're running out of memory, you have to sacrifice speed to fit the job in memory. The amount of memory you use should scale roughly linearly with the value of `--p-n-jobs`.
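For reference, a minimal sketch of what a low-memory invocation might look like; the artifact names (`classifier.qza`, `rep-seqs.qza`, `taxonomy.qza`) are placeholders for your own files:

```
# Low-memory run: a single worker (--p-n-jobs 1 is the default)
qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs.qza \
  --p-n-jobs 1 \
  --o-classification taxonomy.qza
```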
The `--p-chunk-size` parameter has different meanings for the classification and training steps. For the training step, `--p-chunk-size` affects how the training data is split up and fed to the classifier to reduce memory consumption. For the classification step, `--p-chunk-size` affects how many reads the classifier sends to each parallel worker in each iteration.
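So if memory allows and you want to speed things up, the idea would be something like the sketch below (again, filenames are placeholders, and this assumes your QIIME 2 version still exposes `--p-chunk-size` on `classify-sklearn`): with 4 workers each receiving 20,000 reads per iteration, roughly 4 x 20,000 reads are held in memory at a time on top of the classifier itself.

```
# Faster run: 4 parallel workers, each fed 20000 reads per iteration,
# so memory use grows roughly with n_jobs * chunk_size.
qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs.qza \
  --p-n-jobs 4 \
  --p-chunk-size 20000 \
  --o-classification taxonomy.qza
```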
Hope that helps, sorry for the slow realisation.