"No space left on device" error in evaluate-fit-classifier with 2.8 TB free

Hi there,

I'm experiencing the classic "No space left on device" error when using evaluate-fit-classifier. There is 2.8 TB of free disk space on the drive where the command is running, so I am very surprised to see it run out of space.

I am running this on an Ubuntu 20.04 server with QIIME 2 version 2022.2. I am attempting to make a classifier from 3.4 million arthropod sequences from GenBank. The evaluate-fit-classifier command takes 1.5 weeks to run and then runs out of disk space. I've also used these 3.4 million sequences to make another classifier for a primer with fewer degenerate bases, and that takes much less time.

Given the other posts about this error, I'm mostly curious what I could do to decrease the amount of disk space (and memory) the command uses. I'm interested in identifying arthropods, so I don't think I should reduce the taxonomic breadth of the sequences. I'm already filtering out whole genomes, but am keeping mitochondrial (mt) genomes. Should I cull sequences? Should I filter more stringently? I'm open to any advice.

I'm also wondering whether the command is really running on the drive I think it is. The files are all located on /scratch (2.8 TB free), and the command is run from /scratch. However, miniconda, and thus QIIME 2, is installed in /home (500 GB free). Could QIIME 2 be writing its temporary files to /home instead of /scratch?
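Where a program is installed does not by itself decide where it writes temporary files. Assuming QIIME 2 uses Python's standard tempfile module for its scratch files (an assumption here, not something the traceback shows directly), a quick check from inside the activated environment is to print both locations side by side:

```shell
# Sketch: compare where the active Python (and so QIIME 2) is installed with
# where Python's tempfile module would write temporary files.
# Assumes it is run inside the activated qiime2 conda env, so python3 is the
# env's interpreter.
install_prefix=$(python3 -c 'import sys; print(sys.prefix)')
temp_dir=$(python3 -c 'import tempfile; print(tempfile.gettempdir())')

echo "QIIME 2 env installed under: $install_prefix"
echo "Temporary files default to:  $temp_dir"
```

If the second line points at /tmp or somewhere under /home rather than /scratch, the temporary files are not going where the input and output files live.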

Here is the reprex:

qiime rescript get-ncbi-data \
  --p-query "(cytochrome c oxidase subunit 1[Title] OR cytochrome c oxidase subunit I[Title] OR cytochrome oxidase subunit 1[Title] OR cytochrome oxidase subunit I[Title] OR COX1[Title] OR CO1[Title] OR COI[Title] OR CO1 OR COI) AND 1:20000[SLEN] AND txid6656[orgn]" \
  --o-sequences coi-unfiltered-seqs.qza \
  --o-taxonomy coi-taxonomy-unfiltered.qza \
  --p-n-jobs 20

qiime feature-classifier extract-reads \
  --i-sequences coi-unfiltered-seqs.qza \
  --p-f-primer GCHCCHGAYATRGCHTTYCC \
  --p-r-primer TCDGGRTGNCCRAARAAYCA \
  --p-max-length 20000 \
  --p-min-length 50 \
  --p-n-jobs 20 \
  --o-reads coi-filtered-seqs.qza

qiime rescript evaluate-fit-classifier \
  --i-sequences coi-filtered-seqs.qza \
  --i-taxonomy coi-taxonomy-unfiltered.qza \
  --o-classifier coi-filtered-classifier.qza \
  --o-evaluation coi-classifier-evaluation.qza \
  --o-observed-taxonomy coi-filtered-predicted-taxonomy.qza \
  --verbose

Here is the error:

Validation: 117.35s
/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/classifier.py:102: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.24.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
  warnings.warn(warning, UserWarning)
Traceback (most recent call last):
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/_taxonomic_classifier.py", line 86, in _2
    tar.add(fn, os.path.basename(fn))
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/tarfile.py", line 1971, in add
    self.addfile(tarinfo, f)
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/tarfile.py", line 1999, in addfile
    copyfileobj(fileobj, self.fileobj, tarinfo.size, bufsize=bufsize)
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/tarfile.py", line 250, in copyfileobj
    dst.write(buf)
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2cli/commands.py", line 339, in __call__
    results = action(**arguments)
  File "<decorator-gen-458>", line 2, in evaluate_fit_classifier
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 485, in _callable_executor_
    outputs = self._callable(scope.ctx, **view_args)
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/rescript/cross_validate.py", line 43, in evaluate_fit_classifier
    classifier, = fit(reference_reads=sequences,
  File "<decorator-gen-515>", line 2, in fit_classifier_naive_bayes
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 418, in _callable_executor_
    artifact = qiime2.sdk.Artifact._from_view(
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/result.py", line 305, in _from_view
    result = transformation(view, validate_level)
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/_taxonomic_classifier.py", line 87, in _2
    os.unlink(fn)
  File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/tarfile.py", line 2460, in __exit__
    self.fileobj.close()
OSError: [Errno 28] No space left on device

Plugin error from rescript:

  [Errno 28] No space left on device

See above for debug info.

Yep, this is what I would check first!
What is your TMPDIR environment variable set to?


Hmm. Did the settings of TMPDIR change? Based on previous posts I've tried:

conda activate qiime2-2022.2
echo $TMPDIR

env | grep TMP

But neither gives any output. Also, /tmp is only 1.3 GB in size. I'm not sure where /tmp is actually mounted, though, as it doesn't show up in df -h. Color me confused...
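If /tmp has no line of its own in df -h, it is most likely an ordinary directory on the filesystem of its parent (usually /) rather than a separate mount. Passing the path itself to df reports the filesystem that actually contains it. A minimal sketch, assuming POSIX-format df output:

```shell
# Sketch: find which mounted filesystem actually holds /tmp.
# With df -P, line 2 is the data row: column 6 is the mount point and
# column 4 the available space (in 1024-byte blocks here, because of -k).
mount_point=$(df -Pk /tmp | awk 'NR == 2 {print $6}')
free_kb=$(df -Pk /tmp | awk 'NR == 2 {print $4}')

echo "/tmp is on the filesystem mounted at: $mount_point (${free_kb} KiB free)"
```

A mount point of / here would mean /tmp shares the root filesystem's free space rather than having its own partition.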


This means those variables are not set.

(I'm not 100% sure where temp files go when $TMPDIR is missing, but after you set this variable it should be respected.)
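On the fallback question: Python's tempfile.gettempdir() looks at TMPDIR, TEMP and TMP (in that order), then tries /tmp, /var/tmp and /usr/tmp, and finally the current directory. Assuming QIIME 2 relies on these defaults, an unset TMPDIR means temporary files land in /tmp, on whatever filesystem holds it. A sketch of pointing it at /scratch instead; the folder name qiime2-tmp is hypothetical, and the sketch assumes the shell is already in a directory on /scratch:

```shell
# Sketch: send Python's (and, assumed, QIIME 2's) temporary files to /scratch.
# Assumes the current directory is on /scratch; "qiime2-tmp" is a
# hypothetical folder name.
export TMPDIR="$PWD/qiime2-tmp"
mkdir -p "$TMPDIR"

# Check that Python now resolves its temp dir to the new location.
python3 -c 'import tempfile; print(tempfile.gettempdir())'

# export only lasts for this shell session. To store the variable in the
# conda environment itself (conda >= 4.8), something like the following,
# then deactivate and reactivate the env:
#   conda env config vars set TMPDIR=/scratch/qiime2-tmp -n qiime2-2022.2
```

The mkdir matters: tempfile skips candidate directories it cannot create files in, so a TMPDIR pointing at a folder that does not exist silently falls back to /tmp.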

Makes sense! 🤦🏻‍♂️

I've set TMPDIR to a location on /scratch. Hopefully that will fix things. Thanks!


Hi @colinbrislawn. So far so good! It made it past the training stage after 458k seconds (about 5.3 days) without running out of disk space.

As the command is running, I see that my $TMPDIR is only 36GB in size. I assume the size ballooned during the training part of qiime rescript evaluate-fit-classifier, but that most of those files have now been deleted.
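To see whether the remaining steps grow the temp dir again, one option is to log a snapshot of its size and of the free space left on its filesystem. A minimal sketch, assuming GNU coreutils (du, df, date) and falling back to /tmp when TMPDIR is unset:

```shell
# Sketch: one-off snapshot of temp-dir usage and remaining free space.
# Uses $TMPDIR if set, otherwise /tmp (Python's usual fallback on Linux).
tmp="${TMPDIR:-/tmp}"

# Total size of what is currently in the temp dir, in KiB
# (permission errors on other users' files are silenced).
used_kb=$(du -sk "$tmp" 2>/dev/null | cut -f1)

# Free space on the filesystem that holds the temp dir, in KiB.
avail_kb=$(df -Pk "$tmp" | awk 'NR == 2 {print $4}')

echo "$(date '+%F %T') $tmp used=${used_kb}KiB free=${avail_kb}KiB"
```

Running this every hour or so (from cron, or under watch -n 3600) while evaluate-fit-classifier is still going would show whether the later steps need anywhere near the space that training did.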

I was wondering if I'm out of the woods and safe to start allocating some of the hard drive space to other uses. The Training: 458594.53s message came up a few days ago, so I assume it's onto the next steps. Do those steps also take up a lot of hard drive space?

Thanks for your help.


Training and testing use lots of memory and disk right now. 🤷

(top-hit classifiers are lighter. We have those too: )

Hopefully, you should be good to go.

I forget if I asked you this, but what database are you building and testing?

Thanks for the info.

(This is probably a topic for a different thread, but that also explains why some of the classifications this machine-learning-based classifier makes are so different from the BLAST results for the same sequence: they're not the top hit at all!)

I'm making a COI database for the BF2/BR2 primer set from this paper. We're trying to identify the invertebrates present in a minnow species' diet. Originally I tried making the database on all animal COI sequences from GenBank, but because of the memory issues I ended up just using all arthropod sequences. Here's the code that I've been using:

qiime rescript get-ncbi-data \
  --p-query "(cytochrome c oxidase subunit 1[Title] OR cytochrome c oxidase subunit I[Title] OR cytochrome oxidase subunit 1[Title] OR cytochrome oxidase subunit I[Title] OR COX1[Title] OR CO1[Title] OR COI[Title] OR CO1 OR COI) AND 1:20000[SLEN] AND txid6656[orgn]" \
  --o-sequences coi-unfiltered-seqs.qza \
  --o-taxonomy coi-taxonomy-unfiltered.qza \
  --p-n-jobs 20

qiime feature-classifier extract-reads \
  --i-sequences coi-unfiltered-seqs.qza \
  --p-f-primer GCHCCHGAYATRGCHTTYCC \
  --p-r-primer TCDGGRTGNCCRAARAAYCA \
  --p-max-length 20000 \
  --p-min-length 50 \
  --p-n-jobs 20 \
  --o-reads coi-filtered-seqs.qza

qiime rescript evaluate-fit-classifier \
  --i-sequences coi-filtered-seqs.qza \
  --i-taxonomy coi-taxonomy-unfiltered.qza \
  --o-classifier coi-filtered-classifier.qza \
  --o-evaluation coi-classifier-evaluation.qza \
  --o-observed-taxonomy coi-filtered-predicted-taxonomy.qza \
  --verbose