Silva Database Using Up All Storage

Hello everyone! I am a new QIIME 2 user, and I ran into memory issues when I tried doing taxonomy annotation with the Silva classifier. I am running qiime2-2020.2 in VirtualBox with a dynamically allocated VDI, 6000 MB of assigned base memory, and 3 assigned CPUs. My computer has 213 GB of storage total and 4 CPUs. Below is the command that I ran:

qiime feature-classifier classify-sklearn --i-classifier '/home/qiime2/Desktop/silva-132-99-515-806-nb-classifier.qza' --i-reads dada2_rep_seq_16s.qza --o-classification taxonomy.qza

After I ran this command, my VDI kept expanding to the point where it is now 48.3 GB and has filled up all the memory on my computer, and VirtualBox was forced to pause itself because there is no storage left (see screenshot). I tried deleting some old files from my host computer, but every time I freed up more storage and tried again, QIIME 2 filled it just as fast. Does anyone have ideas on how to fix this? Is it typical that doing annotation with the Silva classifier would use up that much storage?

After the size of the VDI exploded, I now only have a tiny amount of memory left (see screenshot).

Thank you and I really appreciate the help!

I tried again with a smaller dataset, and this time I got an error that also seems to be about memory. It filled up my memory once again, but the virtual machine did not freeze.

Code

qiime feature-classifier classify-sklearn --i-classifier '/home/qiime2/Desktop/R1/silva-132-99-515-806-nb-classifier.qza' --i-reads test.16s.rep.qza --o-classification test.16s.taxonomy.qza

Error

Unable to allocate 6.19 GiB for an array with shape (830193664,) and data type float64

Debug info has been saved to /tmp/qiime2-q2cli-err-3m89vxud.log

Debug info

Traceback (most recent call last):
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2cli/commands.py", line 328, in call
results = action(**arguments)
File "</home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/decorator.py:decorator-gen-343>", line 2, in classify_sklearn
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
spec.view_type, recorder)
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/result.py", line 289, in _view
result = transformation(self._archiver.data_dir)
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/core/transform.py", line 70, in transformation
new_view = transformer(view)
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_feature_classifier/_taxonomic_classifier.py", line 72, in _1
pipeline = joblib.load(os.path.join(dirname, 'sklearn_pipeline.pkl'))
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 605, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 529, in _unpickle
obj = unpickler.load()
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/pickle.py", line 1050, in load
dispatch[key[0]](self)
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 355, in load_build
self.stack.append(array_wrapper.read(self))
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 198, in read
array = self.read_array(unpickler)
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 144, in read_array
array = unpickler.np.empty(count, dtype=self.dtype)
MemoryError: Unable to allocate 6.19 GiB for an array with shape (830193664,) and data type float64

Any thoughts on how this could be fixed?

Hi @Bill_Yen!

This is a common point of confusion, but memory and disk space are two separate things: memory refers to RAM, while disk space is the capacity of your hard drive or other storage.

The error message you are sharing is referring to the disk space (file storage) - specifically your host machine has run out of disk space (which also means that your guest machine is out of disk space).
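If it helps to keep the two apart, here is a quick sketch (plain Linux commands, nothing QIIME 2-specific) you can run inside the virtual machine to check each one separately:

free -h     # memory: total, used, and available RAM
df -h /     # disk space: size and free space of the root filesystem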

This is a configuration setting in VirtualBox - you can set a disk to be fixed-size or dynamically allocated; it sounds like you went for the latter.
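If you want to see how the disk is configured, here is a hedged sketch using VBoxManage on the host (the .vdi path below is just an example; substitute wherever your virtual disk actually lives):

VBoxManage showmediuminfo disk "VirtualBox VMs/QIIME2/QIIME2.vdi"    # reports format, current size on disk, and capacity

A dynamically allocated VDI grows as the guest writes to it but never shrinks on its own. After deleting files inside the guest (and zeroing the freed space), you can try to reclaim space on the host with:

VBoxManage modifymedium disk "VirtualBox VMs/QIIME2/QIIME2.vdi" --compact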

You might just need more disk space, unfortunately. One option is to rent an AWS instance.

Yes, it is typical for the Silva classifier to use that much storage, and it uses up lots of memory, too (see my discussion above).

This is a memory error (separate from the storage errors above) - memory is something your computer has a fixed amount of, just like storage space. You might be able to assign more RAM to your VirtualBox machine, or see my link above about AWS.
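If your host has RAM to spare, a minimal sketch of raising the allocation from the host while the VM is powered off (the VM name "QIIME 2" is an assumption; use whatever name shows in your VirtualBox manager):

VBoxManage modifyvm "QIIME 2" --memory 8192 --cpus 4

The same settings are also available in the VirtualBox GUI under Settings > System.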

Good luck and keep us posted!

:qiime2:

Would getting a larger SSD for my computer help with this issue, or is it only related to RAM?

Hi @Bill_Yen,

Looks like a RAM issue to me. Have you tried the prototype Silva 138 classifiers referenced below?

They do have a smaller memory footprint. If you want to save even more drive space and memory, use the classifier without the species labels. Note that you may have to retrain the classifiers yourself; in that case, make use of the provided sequence and taxonomy QZA files.
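As a rough sketch of that retraining step (the input filenames are placeholders for whichever Silva 138 reference sequence and taxonomy artifacts you download):

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads silva-138-99-seqs-515-806.qza --i-reference-taxonomy silva-138-99-tax-515-806.qza --o-classifier silva-138-99-515-806-nb-classifier.qza

If memory is still tight at classification time, classify-sklearn also has a --p-reads-per-batch parameter that processes fewer query reads at once (this reduces memory used during classification, though not the memory needed to load the classifier itself), for example:

qiime feature-classifier classify-sklearn --i-classifier silva-138-99-515-806-nb-classifier.qza --i-reads test.16s.rep.qza --o-classification test.16s.taxonomy.qza --p-reads-per-batch 1000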

Keep an :eye: out for updates related to this topic.

-Best wishes!

