QIIME 2 performance with multiple users

arwqiime · November 12, 2020, 1:56pm

Hello,
I set up a Linux machine with QIIME 2 2020.8 in a miniconda environment, as @thermokarst recommended last year (shared installation under /opt/miniconda/...). It worked fine during last year's bioinformatics course (with a 2019 release of QIIME 2 and a silva-132 classifier). This year I have roughly the same number of students in my course (10), but I am using the silva-138-99-nb-classifier from your website with q2-2020.8 on the same Linux machine (CentOS 7). Now we are running into /tmp errors saying that there is too little space available for the users. The problematic step is qiime feature-classifier classify-sklearn; each student used only one thread, since I was aware that the new classifier requires more disk space.

I have checked with colleagues in our IT department, but they tell me that there is enough free disk space available.

Is there any setting in q2 that I should consider changing to get it running?
The data set is not very large: ca. 16,500 features (ASVs from DADA2) with ca. 4 million reads in 20 samples. My system has 12 cores and 128 GB RAM.

Do you have any suggestions on how to improve the performance of the analysis?

Best regards!

Traceback (most recent call last):
  File "/opt/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "", line 2, in classify_sklearn
  File "/opt/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
    spec.view_type, recorder)
  File "/opt/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/result.py", line 289, in _view
    result = transformation(self._archiver.data_dir)
  File "/opt/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/opt/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2_feature_classifier/_taxonomic_classifier.py", line 71, in _1
    tar.extractall(dirname)
  File "/opt/miniconda3/envs/qiime2-2020.8/lib/python3.6/tarfile.py", line 2010, in extractall
    numeric_owner=numeric_owner)
  File "/opt/miniconda3/envs/qiime2-2020.8/lib/python3.6/tarfile.py", line 2052, in extract
    numeric_owner=numeric_owner)
  File "/opt/miniconda3/envs/qiime2-2020.8/lib/python3.6/tarfile.py", line 2122, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/opt/miniconda3/envs/qiime2-2020.8/lib/python3.6/tarfile.py", line 2171, in makefile
    copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
  File "/opt/miniconda3/envs/qiime2-2020.8/lib/python3.6/tarfile.py", line 252, in copyfileobj
    dst.write(buf)
OSError: [Errno 28] No space left on device
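
The traceback ends inside tar.extractall, so the disk fills while the classifier is being unpacked into the temporary directory. Since a .qza artifact is a zip archive, one rough way to gauge how much temporary space each concurrent user might need is to sum the uncompressed member sizes. The sketch below does only that; the file name is a placeholder and the helper names are made up for illustration. Treat the result as a lower bound, because the traceback suggests the tar file inside the artifact is unpacked again in a further step.

```python
import zipfile
from pathlib import Path


def unpacked_size_bytes(archive_path):
    """Sum the uncompressed sizes of all members of a zip archive (.qza is a zip)."""
    with zipfile.ZipFile(archive_path) as zf:
        return sum(info.file_size for info in zf.infolist())


def tmp_needed_gib(archive_path, concurrent_users):
    """Rough lower bound (GiB) on temp space if every user unpacks at the same time."""
    return unpacked_size_bytes(archive_path) * concurrent_users / 1024**3


if __name__ == "__main__":
    # Placeholder file name (assumption): point this at the classifier you downloaded.
    classifier = Path("silva-138-99-nb-classifier.qza")
    if classifier.exists():
        needed = tmp_needed_gib(classifier, concurrent_users=10)
        print(f"at least {needed:.1f} GiB of temporary space for 10 users")
    else:
        print(f"{classifier} not found; adjust the path")
```

Comparing this number with the free space on the filesystem that actually holds /tmp should show whether ten simultaneous classify-sklearn runs can fit.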

andrewsanchez · November 12, 2020, 6:21pm

Hi, @arwqiime!

You can change the directory QIIME 2 uses to store temporary files by setting the TMPDIR environment variable. For example, before running the command you can set TMPDIR with something like the following: export TMPDIR="/dir/with/lots/of/data"

Will that work for you and your students? Of course, you can also try cleaning up TMPDIR to free up space.
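
For a course with several students on one machine, it may help to give each user a separate temporary directory on a larger volume. Here is a minimal sketch, assuming a shared scratch location exists; the scratch path and the helper name user_tmp_env are hypothetical. It builds an environment with TMPDIR set per user and shows that a child Python process picks it up.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def user_tmp_env(scratch_root, user):
    """Return a copy of the current environment with TMPDIR set to a per-user directory."""
    tmpdir = Path(scratch_root) / user / "tmp"
    tmpdir.mkdir(parents=True, exist_ok=True)
    env = os.environ.copy()
    env["TMPDIR"] = str(tmpdir)
    return env


if __name__ == "__main__":
    # Hypothetical scratch location; in practice use a volume with plenty of free space.
    scratch = Path(tempfile.gettempdir()) / "q2-scratch-demo"
    env = user_tmp_env(scratch, os.environ.get("USER", "student"))
    # Any child process started with this environment inherits TMPDIR.
    result = subprocess.run(
        [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"],
        env=env, stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    print("child process temp dir:", result.stdout.strip())
```

The same environment could be passed to a subprocess call that runs the qiime command; the shell equivalent is simply the export line above in each student's session.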

That being said, OSError: [Errno 28] No space left on device can sometimes be more complicated than simple disk space.
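
One way to narrow this down is to look at the filesystem that actually holds the temporary directory, which may be a separate and much smaller partition than the one with plenty of free space. Errno 28 can also be raised when a filesystem runs out of inodes even though bytes are still free. The sketch below (POSIX only, with a made-up helper name tmp_report) prints both figures for whatever directory Python resolves as the temp dir.

```python
import os
import shutil
import tempfile


def tmp_report(path=None):
    """Return free space and free inodes for the filesystem holding `path`."""
    path = path or tempfile.gettempdir()  # honours TMPDIR when it is set
    usage = shutil.disk_usage(path)
    vfs = os.statvfs(path)  # POSIX only
    return {
        "path": path,
        "free_gib": usage.free / 1024**3,
        "free_inodes": vfs.f_favail,
        "total_inodes": vfs.f_files,
    }


if __name__ == "__main__":
    for key, value in tmp_report().items():
        print(f"{key}: {value}")
```

Running this as an ordinary student account, rather than as root, gives the numbers that matter for the failing command.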

arwqiime · November 13, 2020, 8:38am

Hi @andrewsanchez, thank you for your comments.
We have pointed TMPDIR to another directory for the moment.

Could I get your comment on one detail of that step? We amplify both 16S and 18S sequences (primers 515F and 907R) to capture both bacteria and eukaryotes. Therefore, we used the q2 classifier for the entire SSU from the q2 docs website.
Would you expect a reduction in server load (temporary data and/or RAM) if I extracted only the 515-907 region from the reference sequences? The classifier for the 515-806 region is only 30% of the size of the full classifier.
Best regards

Nicholas_Bokulich · November 13, 2020, 8:51am

Yes, almost definitely

Note that 515-806 does not hit 18S (I think), so you might not see as much of a reduction in size, but something similar.

So do it! Faster to train, faster to classify.
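
As a rough sketch of what that could look like, the snippet below only assembles the two q2-feature-classifier commands (extract-reads, then fit-classifier-naive-bayes) and prints them. All file names and the primer strings are placeholders to be replaced with your own reference files and the actual 515F/907R sequences; the helper function names are made up for this example.

```python
import shlex


def extract_reads_cmd(ref_seqs, f_primer, r_primer, out_reads):
    """Command that trims reference sequences to the region between two primers."""
    return ["qiime", "feature-classifier", "extract-reads",
            "--i-sequences", ref_seqs,
            "--p-f-primer", f_primer,
            "--p-r-primer", r_primer,
            "--o-reads", out_reads]


def fit_classifier_cmd(ref_reads, ref_taxonomy, out_classifier):
    """Command that trains a naive Bayes classifier on the trimmed reads."""
    return ["qiime", "feature-classifier", "fit-classifier-naive-bayes",
            "--i-reference-reads", ref_reads,
            "--i-reference-taxonomy", ref_taxonomy,
            "--o-classifier", out_classifier]


if __name__ == "__main__":
    # Placeholder inputs (assumptions): substitute real files and primer sequences.
    steps = [
        extract_reads_cmd("silva-138-99-seqs.qza", "FORWARD_PRIMER_515F",
                          "REVERSE_PRIMER_907R", "silva-138-99-seqs-515-907.qza"),
        fit_classifier_cmd("silva-138-99-seqs-515-907.qza", "silva-138-99-tax.qza",
                           "silva-138-99-515-907-nb-classifier.qza"),
    ]
    for cmd in steps:
        # Printed for review only; pass `cmd` to subprocess.run(cmd, check=True) to execute.
        print(" ".join(shlex.quote(part) for part in cmd))
```

Training once and sharing the resulting classifier with all students avoids repeating the expensive step in class.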

Trimming to the same primers that you are using should also yield a slight increase in accuracy; you can test and compare classification performance with trimmed vs. untrimmed sequences using RESCRIPt.
