Hello,
My classify-sklearn run is being killed due to OOM (out of memory) even though the machine has 192 GB of RAM. The input rep-seqs file is small (8,331 sequences), but the system monitor shows 100% RAM and swap usage before the process is terminated.
Command:
qiime feature-classifier classify-sklearn \
--i-classifier unite_ver10_97_s_all_04.04.2024-Q2-2024.5.qza \
--i-reads rep-seqs-dada2.qza \
--o-classification taxonomy-ITS.qza \
--p-n-jobs 12 \
--verbose
Rep-Seq Stats:
- Count: 8,331
- Mean Length: 278.7 bp
- Range: 200–388 bp
System Info:
- CPU: Intel Core Ultra 9 275HX (24 cores)
- RAM: 192GB (100% utilized during crash)
- Swap: 8.6GB (100% utilized during crash)
I then tried to run on a single core with a reduced batch size:
qiime feature-classifier classify-sklearn \
--i-classifier unite_ver10_97_s_all_04.04.2024-Q2-2024.5.qza \
--i-reads rep-seqs-dada2.qza \
--o-classification taxonomy-ITS.qza \
--p-n-jobs 1 \
--p-reads-per-batch 1000 \
--verbose
This worked: the issue was resolved by reducing --p-n-jobs from 12 to 1. Why did parallelization cause the process to be terminated? Are there specific scikit-learn or environment settings (e.g., JOBLIB_TEMP_FOLDER) that can mitigate this apparent multiplication of memory use across jobs in QIIME 2? And how does QIIME 2 manage caching? I previously ran a similar analysis on an M2 Mac with only 16 GB of RAM without any issues.
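For scale, here is a rough sketch of the per-worker memory budget on this machine. It assumes that each parallel worker holds its own copy of the classifier, which I have not confirmed, and per_worker_gb is just a helper name I made up for the illustration (it is not part of QIIME 2 or joblib).

```shell
#!/bin/sh
# per_worker_gb is a hypothetical helper, not part of QIIME 2 or joblib.
# It divides total RAM evenly across workers (integer GB), under the
# unconfirmed assumption that each worker needs its own classifier copy.
per_worker_gb() {
  total_gb=$1
  n_jobs=$2
  echo $(( total_gb / n_jobs ))
}

# 192 GB is the RAM of the machine described above.
for n_jobs in 1 4 12; do
  echo "n_jobs=${n_jobs}: $(per_worker_gb 192 "${n_jobs}") GB per worker"
done
```

Under that assumption, 12 jobs on 192 GB leaves about 16 GB per worker, which is roughly the total RAM of the Mac mentioned above, so a simple one-copy-per-worker picture may not fully explain the crash.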
Thank you,
Ivan
