Disk space calculations and usage for plugin output

mroper · November 30, 2023, 11:31am

I am developing a plugin that uses a lot of disk space and so I need to make quite precise allowances for the used disk space. I use a --o-* parameter to specify the target output qza file.

Internally, my plugin writes all output data to a designated directory.
-- I can control the disk space for that part.

The function implementing the main action in _functions.py then returns a format derived from model.DirectoryFormat, which stores the content of that designated directory. My understanding is that under the hood the framework compresses the directory and creates the qza file (really a zip file).
-- It is at this point that I have questions about disk usage since it seems to be part of the main framework. Where in the file system is the creation of the compressed output done? How can I estimate how much disk space this part of the process will use? Can I control where in the file system the compression is done?

If at all possible, I would like to ensure that the compression and any related steps happen in a directory that I can specify. Is that possible?

Thanks!
Michael

Any help would be much appreciated.

Oddant1 · November 30, 2023, 8:31pm

Hello @mroper,

If I am understanding your question correctly, the relevant framework code is here. As far as I can tell from what we are doing, and from the Python zipfile documentation, your zipped output is written directly to the location specified by --o-*. The zipping doesn't happen until the final output from an action is written.
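For a rough idea of how much space the final qza will need at the --o-* location, one option is to deflate the files of the output directory in memory and add up the sizes. This is only a sketch: it assumes the archive is written with standard DEFLATE compression through Python's zipfile (not confirmed in this thread), and `estimate_zip_size` and `has_room_for_output` are hypothetical helpers, not part of QIIME 2.

```python
import os
import shutil
import zlib


def estimate_zip_size(directory, chunk_size=1 << 20):
    """Roughly estimate the DEFLATE-compressed size of all files under directory.

    Hypothetical helper, not part of QIIME 2. Zip headers and any extra
    files the framework adds to the archive are not counted, so the real
    archive will be somewhat larger.
    """
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # Raw DEFLATE stream (wbits=-15), the format used for zip members.
            compressor = zlib.compressobj(
                zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15
            )
            with open(os.path.join(root, name), "rb") as handle:
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    total += len(compressor.compress(chunk))
            total += len(compressor.flush())
    return total


def has_room_for_output(data_dir, output_dir):
    """Return True if output_dir's filesystem has more free space than the estimate."""
    return shutil.disk_usage(output_dir).free > estimate_zip_size(data_dir)
```

Nothing is written to disk while estimating, so the check itself does not use extra space; it does read every file once, which can be slow for very large outputs.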

mroper · December 2, 2023, 7:31am

And when I take a qza file as input, where does the decompression happen by default? Can I control where in the filesystem it will happen so that I don't run out of disk space?

Oddant1 · December 4, 2023, 5:42pm

@mroper, yes you can. We extract the input files to your temp directory, more specifically we extract them to part of your tmp cache which is a folder QIIME2 makes at /$TMP/qiime2/<uname>/. You can change where the files are extracted to by changing the location of your temp directory.

mroper · December 22, 2023, 10:42pm

Thanks @Oddant1 ! Very helpful!! But in my shell $TMP is unassigned, so where does it find the location of temp?

lizgehret · December 29, 2023, 3:27pm

Hi @mroper,

Jumping in for @Oddant1 while he is out for the holidays.

If you don't have $TMPDIR assigned to a particular location, it will have a default location on your machine (depending on your operating system).

For example, OS X generates a programmatic directory stored in /private/var and defines the $TMPDIR environment variable for locating the system temporary folder. The default location for Linux and Windows is most likely different, but I am less familiar with those operating systems in this context.

If you're using OS X or Linux, you should be able to run echo $TMPDIR in your terminal and you will see the default location that's been assigned.
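If echo $TMPDIR prints nothing, it can help to ask Python directly. Python's tempfile module checks the TMPDIR, TEMP and TMP environment variables in that order and, if none is usable, falls back to platform defaults such as /tmp on Linux. The sketch below assumes QIIME 2 relies on tempfile for its temp location, which is not confirmed in this thread; `describe_temp_location` is a hypothetical helper name.

```python
import os
import tempfile


def describe_temp_location():
    """Return the temp-related environment variables and Python's resolved temp dir."""
    # Variables are listed in the order tempfile consults them.
    env = {name: os.environ.get(name) for name in ("TMPDIR", "TEMP", "TMP")}
    return env, tempfile.gettempdir()


if __name__ == "__main__":
    env, resolved = describe_temp_location()
    for name, value in env.items():
        print(f"{name} = {value!r}")
    print(f"Python will use: {resolved}")
```

Run it inside the same conda environment and shell session that you use for QIIME 2, since the result depends on that session's environment variables.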

Hope this helps! Cheers

mroper · December 30, 2023, 5:31am

Hi @lizgehret and @Oddant1, I'm a bit confused now, I'm afraid. Which is it, please: $TMPDIR or $TMP? Neither of these is set in my Linux environment. My understanding is that if $TMPDIR/$TMP (which one?) is set, then QIIME uses that for decompressing input qza files, and if they are not set, then it will use whatever the OS default is. Is that correct? But I wonder if you are using mkdtemp() or something in the code?

It would be ideal if I could make QIIME do its internal computations in a location separate from the system-wide temporary directory. The reason is that the system temp directory is used for other things. Is that possible at all?

Thanks! (and Happy NY!)

colinbrislawn · January 2, 2024, 2:33am

Yes. You can simply set these two variables you mention above.

export TMPDIR='/whatever/'
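The same redirection can be done from inside a Python process, for example at the top of a driver script. A minimal sketch, again assuming the code being run resolves its temp location through Python's tempfile; `use_scratch_dir` is a hypothetical helper, not a QIIME 2 function.

```python
import os
import tempfile


def use_scratch_dir(path):
    """Point this process, and any child processes, at a dedicated temp directory."""
    os.makedirs(path, exist_ok=True)
    os.environ["TMPDIR"] = path
    # tempfile caches its choice; clearing it makes gettempdir() re-read TMPDIR.
    tempfile.tempdir = None
    return tempfile.gettempdir()
```

Call it before any other code creates temporary files, so that nothing has already been placed in the old location. The directory must be writable, otherwise tempfile skips it and falls back to the next candidate.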

You may also find this thread useful: DADA2 set TMPDIR permanently

Happy New Year!