Taxonomic classifier memory issue

qiime feature-classifier classify-sklearn --i-classifier silva-132-99-515-806-nb-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza
I ran this command and received: “Plugin error from feature-classifier: The operation has run out of available memory.” Based on stats-dada2, there are ~400,000 inputs. I’m not sure how to properly reduce the number of jobs so that this run can complete and generate proper data.

Thank you,
Cam

Hi @Chozinentropy,

Yikes, do you mean 400,000 unique features or total reads? The classifiers deal with the former, and that would be a massive number of features if true.
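If you're not sure which of those numbers you're looking at, summarizing your feature table will report the count of unique features. A minimal sketch, assuming your DADA2 feature table is named table.qza:

qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table-summary.qzv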

You can check out all the available parameters for any plugin with the --help flag:
qiime feature-classifier classify-sklearn --help

The Silva database is quite large and the classifier can take a ton of memory. How much memory do you have available on your system? You can try reducing --p-reads-per-batch and giving it plenty of time. Increasing the number of jobs will proportionally increase the required RAM, so if you still run into an error you can set --p-n-jobs to 1 (which I believe is the default anyway). As a last resort, if you still can't allocate enough resources for this task, you can always use the Greengenes database, which is considerably smaller and should run without any issues.
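For example, a lower-memory run might look something like this (the batch size here is just a starting point to tune against your system):

qiime feature-classifier classify-sklearn \
  --i-classifier silva-132-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza \
  --p-reads-per-batch 1000 \
  --p-n-jobs 1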
Hope this helps, keep us posted!

I have ~70,000 unique features. I preallocated 16 GB of RAM to the VM out of 32 GB, which should be plenty.
Now after running
qiime feature-classifier classify-sklearn --i-classifier silva-132-99-nb-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza --p-n-jobs 1
and
qiime feature-classifier classify-sklearn --i-classifier silva-132-99-nb-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza --p-reads-per-batch 1000
I still get
Invalid value for "--i-classifier": 'silva-132-99-nb-classifier.qza' is not a QIIME 2 Artifact (.qza)
Invalid value for "--i-reads": 'rep-seqs.qza' is not a QIIME 2 Artifact (.qza)

I'm running everything from the Desktop, so I'm not sure what's causing the files to not be usable now.

Hi @Chozinentropy,

I can tell you from experience that 16 GB is not enough (without adjusting the reads-per-batch parameter) when working with the Silva database, as this essentially writes the entire database into memory.
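As a quick sanity check on the VM side, it's worth confirming how much RAM the guest actually sees. On a Linux guest, something like:

free -h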

The error you are seeing MAY still be related to the insufficient-memory issue, though that was fixed after qiime2-2019.7. Which QIIME 2 version are you using?
A couple of other things to check: make sure the file paths are correct, and that you are running those commands in the same directory as the classifier and rep-seqs files.
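For example (assuming the files are on your Desktop):

cd ~/Desktop
ls -lh silva-132-99-nb-classifier.qza rep-seqs.qza

If ls can't find them, that's the problem right there.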
Lastly, can you try running the classification with the Greengenes classifier, which we know will not have these memory issues? If that works, we can work out the issues with Silva.
But so far, everything still points towards insufficient memory issues to me.

What would you recommend for --p-reads-per-batch X --p-n-jobs X? I've tried a number of different combinations but can't get past this error. I'm using the qiime2-2020.2 Studio interface. The directory is correct.
Latest run was
qiime feature-classifier classify-sklearn \
  --i-classifier silva-132-99-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza \
  --p-reads-per-batch 1000 \
  --p-n-jobs -2
This is my rep-seqs.qza (42.9 KB)

Thank you for your help!
Cam

Hi @Chozinentropy,
Did you try running it with the Greengenes database first, as I requested?
That will at least eliminate the possibility of other issues being at play and then we can focus on fine-tuning the memory parameters.

A reads-per-batch of 1000 should be fine; set the number of jobs to 1 and run that. If you're still having issues, then we can see about pinging someone with more expertise on the matter.

No luck, still getting the same error running:
qiime feature-classifier classify-sklearn --i-classifier gg-13-8-99-nb-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza --p-reads-per-batch 1000 --p-n-jobs 1

Thank you!
Cam

I attempted the prior steps with different parameters but was met with the same error. I think my memory allocation is messed up after attempting a $TMPDIR fix. It is allocated to my 1 TB dynamically allocated partition, but that doesn't seem to be it. I'm not sure where to allocate the memory.
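For reference, the fix I attempted was along these lines (the mount point below is illustrative, not my actual path):

export TMPDIR=/path/to/1tb-partition/tmp
mkdir -p "$TMPDIR"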

Let's slow down and take a step back here.

2 days ago you reported this:

    Plugin error from feature-classifier: The operation has run out of available memory.

Okay, so we initially started with an out-of-memory issue - not too uncommon.

More recently you reported this:

    Invalid value for "--i-classifier": 'silva-132-99-nb-classifier.qza' is not a QIIME 2 Artifact (.qza)

Okay, so this is a completely different error message.

Before we sort out the memory error, let's address the "not a QIIME 2 Artifact" issues.

Can you please run the following, and copy-and-paste the complete error message:

qiime tools validate silva-132-99-nb-classifier.qza

qiime tools validate rep-seqs.qza
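One thing worth knowing for the checks above: a .qza file is just a zip archive under the hood, so if validate complains you can also sanity-check the files directly. A truncated or interrupted download of the large Silva classifier is a common cause of this kind of error. A quick check, assuming unzip is available:

ls -lh silva-132-99-nb-classifier.qza
unzip -l silva-132-99-nb-classifier.qza | head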

In general, you need to provide the entire error message, rather than just cherry-picking what you think might be relevant - more often than not there is a significant amount of context in the complete error message, which is why we ask for it.

Keep us posted.

PS - I am about to address the question you asked about over here: TaxonomicClassifier. One thing I think might present as a confounding issue is that you have mutated your QIIME 2 environment by installing a random version of scikit-learn in it. If you're using the same env in both of these posts, that might be problematic. I would suggest deleting the env and reinstalling (it should be pretty quick - conda will cache all the packages from the first env you created).
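A sketch of that reset (the env name is an assumption, so substitute whatever yours is called; the install file follows the usual pattern from the QIIME 2 2020.2 install docs):

conda env remove -n qiime2-2020.2
wget https://data.qiime2.org/distro/core/qiime2-2020.2-py36-linux-conda.yml
conda env create -n qiime2-2020.2 --file qiime2-2020.2-py36-linux-conda.yml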
