Datasets keep getting bigger, and data management is becoming a more pronounced issue. During the import operation, QIIME2 copies the raw data into an archive in /tmp, which quickly runs out of space with big datasets. Additionally, it creates a redundant copy of the raw data on the hard drive.
I had this problem while developing q2-mOTUs, but the same issue was raised in q2-fondue.
It would be useful if, during qiime import, QIIME2 operated on the manifest file itself and only recorded the metadata of the dataset for provenance, somewhat similar to snakemake.
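To illustrate the idea, here is a rough sketch of what "recording only the metadata" could look like. This is not how QIIME2 works today, and `record_manifest` and `file_sha256` are made-up names for illustration. It assumes a tab-separated manifest with a `sample-id` column and one or more columns whose names end in `absolute-filepath`. For every listed file it stores the path, size, and checksum, without copying anything.

```python
import csv
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large FASTQ files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_manifest(manifest_path):
    """Return one record per file referenced in the manifest (hypothetical helper).

    The raw data stays where it is; only path, size, and checksum are recorded.
    """
    records = []
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            for column, value in row.items():
                # Assumption: file columns are named "...absolute-filepath".
                if column is None or not column.endswith("absolute-filepath"):
                    continue
                records.append({
                    "sample_id": row["sample-id"],
                    "column": column,
                    "path": value,
                    "size_bytes": os.path.getsize(value),
                    "sha256": file_sha256(value),
                })
    return records
```

With records like these, a later step could check the checksum before reading a file and complain if the raw data was moved or modified, which is roughly how I understand snakemake tracks its inputs.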
EDIT: I have tried changing TMPDIR before, and I think it is inconvenient and should be improved in order to make Q2 more future-proof. Duplicating the raw data gives no benefit, but costs time, space, and compute.
@misialq, maybe you have some ideas in that regard?
Cheers,
Valentyn