qiime tools import on cluster

Thanks for the info, @nora.

Looks like your HPC is using a tool called “FUSE” for handling mounting your file system at /scratch/mastrori. I suspect that that might be the main cause for the issue - somehow the FUSE driver and Python (the language the QIIME 2 framework is implemented in) aren’t quite on the same page about what files are where. We have seen a few other reports of this floating around on the forum (cc @ebolyen). I don’t have a good workaround for you right now, besides asking your sysadmin if there is a non-FUSE device that you could work on.

Dead Marr,

thank you, I will try to contact them again and see if we can find a workaround. Otherwise I will probably try Docker (fingers suuuuper crossed).

Best,
Nora

lol! This is going to be my new undercover alias...

Keep us posted!

:qiime2:

Dear Matt (sorry for the previous typo),

unfortunately, no luck on my side. I started again from scratch, installed Miniconda3 and QIIME 2 v2020.8, and set my working folders outside of any ‘FUSE’ (in our case, Quobyte) space. I am still unable to get the import function working, always with the same [Errno 17] File exists error. This is independent of whether or not I redirect my TMPDIR.

I was searching the forum a bit more, and it looks like a similar problem happened here, but has been closed without being solved.

I would gladly welcome any other suggestions from your side. Right now I don’t have many other ideas besides trying again with a different, non-native installation of QIIME.

However, I would really like to leverage the computational power available on my server.

Best,
Nora

P.S. From my reading around, it looks like this problem is not unique to the import process. Could it be related to how multithreading is handled during these commands? Unfortunately there is no way to manually disable multithreading on import, so I cannot check my hypothesis, but I hope it is of some use.

Hello!

My guess is that it's because the default tmpdir is also on a FUSE filesystem.
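If you want to verify that guess, one quick check is to look up which mount (and which filesystem type) actually backs your temp directory. Here is a small sketch, assuming a Linux cluster where `/proc/mounts` is available; the paths are only illustrative:

```python
import os

def fs_type(path):
    """Return the filesystem type of the mount backing `path`,
    found by taking the longest mount point in /proc/mounts
    that is a prefix of the (resolved) path."""
    path = os.path.realpath(path)
    best_mount, best_type = "", "unknown"
    with open("/proc/mounts") as mounts:
        for line in mounts:
            _, mount_point, fstype = line.split()[:3]
            if path == mount_point or path.startswith(mount_point.rstrip("/") + "/"):
                if len(mount_point) > len(best_mount):
                    best_mount, best_type = mount_point, fstype
    return best_type

# A FUSE-backed mount typically reports something like "fuse" or "fuse.quobyte".
print(fs_type(os.environ.get("TMPDIR", "/tmp")))
```

If this prints a `fuse.*` type for your TMPDIR (or for `/scratch/mastrori`), that would confirm the temp directory is on the suspect mount.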

Yeah, unfortunately we weren't able to reproduce the error on our end, which makes debugging pretty difficult.

Great suggestion! Unfortunately, there is no multithreading during import for any of the available QIIME 2 import formats - the issue has to do with reading from and writing to the filesystem, not the CPU. Basically, the FUSE driver appears to report inconsistent information back to Python while operating: Python asks if a file has been written to disk, FUSE tells it NO (when in reality it should've said YES), so when Python then attempts to write the file it's like "whoa, what? you just told me nobody was home.... guess I'll just leave..." (this is just speculation on my part).
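To make that speculation concrete, here is a minimal Python sketch (not QIIME 2's actual import code) of how a stale "not there" answer turns into exactly the error reported above - a caller that believes the directory doesn't exist tries to create it, and the kernel disagrees:

```python
import errno
import os
import tempfile

# Hypothetical path, standing in for wherever an artifact gets unpacked.
target = os.path.join(tempfile.mkdtemp(), "extracted-artifact")
os.makedirs(target)  # the directory really is created here

# A caller that was told "target does not exist" (the stale answer we
# suspect FUSE is giving) would now try to create it again:
try:
    os.makedirs(target)
except OSError as e:
    print(f"[Errno {e.errno}] File exists: {target}")

# A defensive caller can tolerate the stale answer instead:
os.makedirs(target, exist_ok=True)  # no exception either way
```

On Linux that except branch prints `[Errno 17] File exists: ...`, matching the traceback in this thread.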

I'm not sure what other options come to mind at the moment, but I'll let you know if I think of anything. Sorry!

Dear Matt,

according to what my sysadmin has been telling me, no, our scratch is not a FUSE filesystem. That is the actual reason for my confusion. If that is really the case, there might be other filesystems that also misbehave under these circumstances.

Thanks for the explanation, it makes much more sense now. Luckily, for the specific experiment I’m working on right now I managed to compute everything locally, so no problem. However, it would be really important for me if this problem were solved in the future, because the datasets I will have to handle will often benefit from the cluster’s performance.

Best,
Nora

Hi @nora!

The logs you shared above indicate otherwise - FUSE is a framework for running filesystems in userspace, and it looks like your cluster is running a specific vendor's FUSE-based filesystem (Quobyte).

Would you be available to run some debugging scripts for us on your cluster? I think I mentioned somewhere above in this thread that we haven't been able to reproduce this (I personally think there is something buggy about the quobyte software, we've seen it before), and we don't have access to quobyte's FUSE system. This would be a huge help for trying to diagnose and debug - just let me know.

:qiime2:

Dear Matt,

sorry to hear that. Unfortunately, I had simply relied on the answers I got from my sysadmin.

If none of them require sudo rights, absolutely. I would be glad to try to give a hand debugging it.

Best,
Nora

Dear Matt,

before this thread gets automatically closed, could you suggest which debugging scripts to run, as you mentioned earlier?

Best regards,
Nora

Thanks for the bump @nora, I won’t let this thread close, no worries. I am teaching a QIIME 2 workshop this week (and was out of the office the two weeks before that), so won’t have time to write the script for you until next week. More soon.

Hey there @nora, just wanted to let you know that this hasn't slipped off my radar. I have good news: I was finally able to reproduce this bug. It appears to be tied to certain networked file systems; so far we have only seen reports of it on BeeGFS filesystems (which is what your scratch mount is). Now that I can reproduce it, we have some potential workarounds we are playing with, and my current plan is to include a fix in the upcoming 2021.4 release. In the meantime, the only workaround I can offer is to avoid that BeeGFS partition, if possible. If you want to follow our development discussion, here it is:

Thanks for being patient and lending a hand on this!

:qiime2:

Dear Matt,

thanks a lot for the update, I will follow the thread.

Best,
Nora

I’m having this same problem, although with cutadapt demux (our cluster also uses the BeeGFS scratch file system). Has this been fixed, or is there a workaround?

Hi @alison - regarding timing, please see my note above:

In the meantime, if your sysadmin has another non-beegfs filesystem you could work on, that would be the quickest/easiest workaround.
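In practice, that workaround amounts to making sure Python's temporary directory points somewhere off the networked mount before QIIME 2 starts. Here is a small sketch of how Python resolves its temp directory; `/tmp/qiime2-local-tmp` is just a placeholder for whatever local, non-BeeGFS disk your cluster offers:

```python
import os
import tempfile

# Placeholder path - substitute a directory on a local, non-networked disk.
local_tmp = "/tmp/qiime2-local-tmp"
os.makedirs(local_tmp, exist_ok=True)

os.environ["TMPDIR"] = local_tmp
tempfile.tempdir = None  # drop the cached value so TMPDIR is re-read
print(tempfile.gettempdir())  # temp files created via tempfile now land here
```

In a batch job script you would typically achieve the same thing by running `export TMPDIR=/path/to/local/disk` before calling `qiime`, since the QIIME 2 process reads TMPDIR at startup.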

I don’t think we have any other available file system, but I’ll ask. Would it potentially be less of a problem if I broke the original fastq files up into smaller pieces and ran them separately through demultiplexing and trimming, to reduce the use of tmp space? (And then cat them all together before denoising.)