qiime feature-classifier fit-classifier-naive-bayes \ ... KILLED

Hi all,

I am a first-time user and have been slowly getting through some of my own pilot data. I previously ran through the beginning of the “moving pictures” tutorial but stopped before I reached the step where I am now stuck (ha!).

I am up to classification and have been attempting to train the classifier.
I have tried to run the steps with both the Greengenes and SILVA databases. I am using the V3-V4 primers and have set a min and max length as suggested by another tutorial. The following appears to execute fine (this is the run with the SILVA database, but as mentioned it also ran with Greengenes).

qiime feature-classifier extract-reads \
--i-sequences 99-otus-silva.qza \
--p-f-primer CCTAYGGGRBGCASCAG \
--p-r-primer GGACTACNNGGGTATCTAAT \
--p-min-length 300 \
--p-max-length 600 \
--o-reads ref-seqs-silva.qza

However, when I run the next stage, it stops after about a minute or less and prints Killed.

(qiime2-2020.6)

qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads ref-seqs-silva.qza \
--i-reference-taxonomy ref-taxonomy-silva.qza \
--o-classifier silva132-99otus-515-806-classifier.qza \
--verbose

/opt/conda/envs/qiime2-2020.6/lib/python3.6/site-packages/q2_feature_classifier/classifier.py:102: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.23.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
  warnings.warn(warning, UserWarning)
Killed

Although it doesn’t say that this is the problem, I tried to make more space on my hard drive, and I increased the CPUs to 4 and the memory to 4.5 GB. I am using Docker and Microsoft PowerShell. Another thing in the verbose output doesn’t quite make sense to me: the current scikit-learn version is 0.23.1, and I am training it fresh, so why wouldn’t it match? Of course, the Killed part looks like a separate issue…
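For anyone trying to work out what a bare "Killed" means: the shell prints it when a process receives SIGKILL, and the exit status is then 128 + 9 = 137. Inside a memory-limited container that is very often the kernel's out-of-memory killer, although the status alone does not prove it. Below is a minimal sketch for reading the exit status; `explain_exit` is a hypothetical helper name, not part of QIIME 2 or Docker.

```shell
# Hypothetical helper: interpret a command's exit status.
# Usage (right after the command that failed):  explain_exit $?
explain_exit() {
  status="$1"
  if [ "$status" -eq 0 ]; then
    echo "ok"
  elif [ "$status" -gt 128 ]; then
    # Statuses above 128 mean "terminated by signal (status - 128)".
    sig=$((status - 128))
    if [ "$sig" -eq 9 ]; then
      echo "killed by SIGKILL (signal 9); often the out-of-memory killer"
    else
      echo "terminated by signal $sig"
    fi
  else
    echo "exited with error status $status"
  fi
}
```

Running `explain_exit $?` immediately after the failed fit-classifier-naive-bayes command should report signal 9 if the process was killed the way the output above suggests.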

I have only found one other person with this problem online, and they just kept running the command until it worked! If I missed someone else who fixed the problem, please let me know and I can delete this. I am at the point where I have hundreds of tabs open and have lost track of where I started.

Hoping this is an easy problem I have not yet experienced. Can provide more info if needed…
Thank you

EDIT: Looks like it is the memory… I made ~10 GB available (checked in Task Manager) and raised the limit in Docker. When running the above command I can see the memory used in Docker increasing rapidly to the set maximum. When it hit 9 GB of memory used, that is when I got the Killed output…
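A quick way to confirm that a change to the Docker memory setting actually reached the container is to look at the total memory the container sees before starting the job. This is only a sketch: it assumes a Linux container where /proc/meminfo is readable, and `mem_total_gib` is a hypothetical helper name.

```shell
# Hypothetical helper: print total memory in GiB from a meminfo-style file.
# Defaults to /proc/meminfo, which reports MemTotal in kB.
# Usage inside the container:  mem_total_gib
mem_total_gib() {
  meminfo="${1:-/proc/meminfo}"
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' "$meminfo"
}
```

If the number printed is well below what was set in Docker, the new limit has probably not been applied yet (for example, Docker was not restarted after the change).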

I can get a similar warning when running QIIME 2020.6, but it doesn't crash the program. My guess is that you're running out of memory - I work with an entirely different marker gene, but generally require about 70-100 GB RAM to complete the fit-classifier jobs. This can vary depending on the number and length of reference sequences, of course. There must be folks on the forum that have a good estimate for how much memory you'll need to do this (:postal_horn: :trumpet: calling on @SoilRotifer @Nicholas_Bokulich...)

You might also not need to bother with this - have you checked out the RESCRIPt tutorial on how to get SILVA data? Those folks have really simplified the process, and it might save you some headaches in trying to format the dataset yourself.

In summary:

  1. If you really want to build your own SILVA classifier, start by throwing way more memory at the process if you have it, say 50 GB RAM, and see whether the job still fails after 1 minute and whether you get an "out of memory" error. If you don't have that much memory, you can always do it in a cloud compute environment like AWS (or Azure, Google, etc.).
  2. You might be able to avoid all of the previous point by gathering pre-formatted data from RESCRIPt.

Good luck!

Thank you Devon, your reply clears up and confirms a few things for me. For anyone else who comes across this problem/post, I will report back once I run through the suggestions.

Edit: for anyone else finding this post, it was indeed memory. I could watch it kill itself, and I couldn’t give it enough memory to complete. I ended up using just the full-length 16S as a first run-through, without training the classifier.
