qiime tools import on cluster

Dear Matt,

Before this thread gets automatically closed, could you suggest which debugging scripts to run, as you mentioned earlier?

Best regards,
Nora

Thanks for the bump, @nora. I won’t let this thread close, no worries. I am teaching a QIIME 2 workshop this week (and was out of the office the two weeks before that), so I won’t have time to write the script for you until next week. More soon.

Hey there @nora, just wanted to let you know that this hasn't slipped off my radar. I have good news: I was finally able to reproduce this bug. It appears to be tied to certain networked file systems; we have only seen reports of this on BeeGFS filesystems (which is what your scratch mount is). Anyway, now that I've been able to reproduce this, we have some potential workarounds we are playing with. My current plan is to include a fix for this in the upcoming 2021.4 release. In the meantime, the only solution I can offer you is to avoid that BeeGFS partition, if possible. If you want to see our development discussion, here it is:

Thanks for being patient and lending a hand on this!

Dear Matt,

Thanks a lot for the update. I will follow the thread.

Best,
Nora

I’m having this same problem, although with cutadapt demux (our cluster also uses the BeeGFS scratch file system). Has this been fixed? Or is there a workaround?

Hi @alison - regarding timing, please see my note above:

In the meantime, if your sysadmin has another non-BeeGFS filesystem you could work on, that would be the quickest and easiest workaround.

I don’t think we have any other available file system, but I’ll ask. Would it potentially be less of a problem if I broke the original fastq files into smaller pieces and ran them through demultiplexing and trimming separately, to reduce the use of tmp space? (And then cat them all back together before denoising.)

Hi @alison - the size of the data isn’t the issue; it is how the QIIME 2 Framework interacts with the TMPDIR location that is the problem. In general everything is well behaved, except on certain networked filesystems, like BeeGFS.

If you can set your TMPDIR to a new location, that is the quickest workaround I have for you. If that new location has less disk space, then you might need to think about strategies for breaking things up into smaller chunks, but that is a secondary concern, and depends on the specifics of the replacement filesystem you’re working with.
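
For example, a minimal sketch of that workaround in a shell session might look like the following (the tmp path and the import type/format here are only illustrative placeholders, not specific to your data):

# Point TMPDIR at a non-BeeGFS location before running any QIIME 2 commands
mkdir -p /path/to/non-beegfs/qiime2-tmp
export TMPDIR=/path/to/non-beegfs/qiime2-tmp

# Anything run in this shell afterwards writes its temporary files there,
# e.g. an import (the type/format shown are just an example):
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --input-path raw-reads/ \
  --output-path demux.qza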

Thanks! It seemed like most of the questions about this mentioned redirecting tmp, and I hadn’t hit the problem with a tiny data set that could run in the “default” tmp. So my logic was that if I could keep the data small enough to use the default tmp, instead of redirecting it, it might work (and save me waiting for my sysadmin to answer an email).

At least on my cluster (Saga), creating a dedicated job tmp space seems to solve this problem.

Requesting a job-specific tmp space in the job header:
#SBATCH --gres=localscratch:

Directing TMPDIR to that space:
export TMPDIR=$LOCALSCRATCH
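
Putting the two together, a rough sketch of a full job script might look something like this (the account, walltime, memory, scratch size, and the qiime command are all placeholders to adapt to your own setup):

#!/bin/bash
#SBATCH --job-name=qiime2-import
#SBATCH --account=<your-project>       # placeholder project account
#SBATCH --time=02:00:00                # example walltime
#SBATCH --mem=8G                       # example memory request
#SBATCH --gres=localscratch:20G        # example size; request enough for your temp files

# Send QIIME 2's temporary files to the job-local scratch instead of the BeeGFS mount
export TMPDIR=$LOCALSCRATCH

# Example command; anything run after the export uses the local scratch for tmp
qiime tools import \
  --type 'EMPPairedEndSequences' \
  --input-path emp-paired-end-sequences/ \
  --output-path emp-paired-end-sequences.qza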
