Hi there,
I'm experiencing the classic "no space left on device" error when using evaluate-fit-classifier. There is 2.8 TB of disk space on the hard drive where the command is running, so I am very surprised to see it running out of space.
I am running this on an Ubuntu 20.04 server running qiime2 version 2022.2. I am attempting to make a classifier on 3.4 million arthropod sequences from GenBank. The evaluate-fit-classifier function takes 1.5 weeks to run, then runs out of disk space. I've also used these 3.4 million sequences to make another classifier with a primer that has fewer degenerate bases. That takes much less time.
Given the other issues with this error, I'm mostly curious what I could do to decrease the amount of memory that it uses. I'm interested in identifying arthropods, so I don't think I should reduce the taxonomic breadth of the sequences. I'm also already filtering out any whole genomes, but am keeping mt genomes. Should I cull sequences? Should I filter more stringently? I'm open to any advice.
I'm also wondering if I may not be running this command in the hard drive that I think I'm running it in. The files are all located on /scratch (with 2.8 TB), and the command is run on /scratch. However, miniconda and thus qiime are installed in /home (500 GB free). Could qiime be making the temporary files in /home instead of /scratch?
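To check this myself, here is a minimal sketch I could run inside the qiime2-2022.2 conda environment. It assumes qiime creates its temporary files through Python's standard tempfile module, which I have not confirmed. It prints the directory Python would pick for temporary files, plus the free space there and on /scratch and /home. The function name report_temp_location is just something I made up for illustration.

```python
import os
import shutil
import tempfile


def report_temp_location(paths=("/scratch", "/home")):
    """Return the default temp directory and the free bytes on each existing path.

    Assumption: the tool under investigation uses Python's tempfile defaults.
    """
    # tempfile checks the TMPDIR, TEMP and TMP environment variables first,
    # then falls back to platform defaults such as /tmp.
    tmp = tempfile.gettempdir()
    free = {}
    for path in [tmp, *paths]:
        # Skip mount points that do not exist on this machine.
        if os.path.exists(path):
            free[path] = shutil.disk_usage(path).free
    return tmp, free


if __name__ == "__main__":
    tmp, free = report_temp_location()
    print(f"Default temporary directory: {tmp}")
    for path, free_bytes in free.items():
        print(f"{path}: {free_bytes / 1e9:.1f} GB free")
```

If this reports somewhere other than /scratch, exporting the TMPDIR environment variable to point at a directory on /scratch before starting Python changes what tempfile.gettempdir() returns. Whether that also moves the files qiime writes depends on the assumption above.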
Here is the reprex:
qiime rescript get-ncbi-data --p-query "(cytochrome c oxidase subunit 1[Title] OR cytochrome c oxidase subunit I[Title] OR cytochrome oxidase subunit 1[Title] OR cytochrome oxidase subunit I[Title] OR COX1[Title] OR CO1[Title] OR COI[Title OR CO1 OR COI) AND 1:20000[SLEN] AND txid6656[orgn]" --o-sequences coi-unfiltered-seqs.qza --o-taxonomy coi-taxonomy-unfiltered.qza --p-n-jobs 20
qiime feature-classifier extract-reads --i-sequences coi-unfiltered-seqs.qza --p-f-primer GCHCCHGAYATRGCHTTYCC --p-r-primer TCDGGRTGNCCRAARAAYCA --p-max-length 20000 --p-min-length 50 --p-n-jobs 20 --o-reads coi-filtered-seqs.qza
qiime rescript evaluate-fit-classifier --i-sequences coi-filtered-seqs.qza --o-classifier coi-filtered-classifier.qza --i-taxonomy coi-taxonomy-unfiltered.qza --o-evaluation coi-classifier-evaluation.qza --o-observed-taxonomy coi-filtered-predicted-taxonomy.qza --verbose
Here is the error:
Validation: 117.35s
/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/classifier.py:102: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.24.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
Traceback (most recent call last):
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/_taxonomic_classifier.py", line 86, in _2
tar.add(fn, os.path.basename(fn))
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/tarfile.py", line 1971, in add
self.addfile(tarinfo, f)
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/tarfile.py", line 1999, in addfile
copyfileobj(fileobj, self.fileobj, tarinfo.size, bufsize=bufsize)
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/tarfile.py", line 250, in copyfileobj
dst.write(buf)
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2cli/commands.py", line 339, in __call__
results = action(**arguments)
File "<decorator-gen-458>", line 2, in evaluate_fit_classifier
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
outputs = self._callable_executor_(scope, callable_args,
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 485, in _callable_executor_
outputs = self._callable(scope.ctx, **view_args)
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/rescript/cross_validate.py", line 43, in evaluate_fit_classifier
classifier, = fit(reference_reads=sequences,
File "<decorator-gen-515>", line 2, in fit_classifier_naive_bayes
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
outputs = self._callable_executor_(scope, callable_args,
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 418, in _callable_executor_
artifact = qiime2.sdk.Artifact._from_view(
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/result.py", line 305, in _from_view
result = transformation(view, validate_level)
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/core/transform.py", line 70, in transformation
new_view = transformer(view)
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/_taxonomic_classifier.py", line 87, in _2
os.unlink(fn)
File "/home/tangled/miniconda3/envs/qiime2-2022.2/lib/python3.8/tarfile.py", line 2460, in __exit__
self.fileobj.close()
OSError: [Errno 28] No space left on device
Plugin error from rescript:
[Errno 28] No space left on device
See above for debug info.