Parallel config on HPC

Hello,
I have previously used q2 plugins (e.g. dada2) on our HPC cluster, using only one node with 32 CPUs.
Based on the above q2-2024.10 announcement, can I assume that any q2 plugin will also run on HPC clusters with multiple nodes? In particular, I would like to submit a q2 rescript dereplicate job on our HPC.
Are there tutorials available on how to run q2 plugins on HPC clusters?

Best regards,

3 Likes

Hello @arwqiime,

These are the docs for configuring parallelization. For future reference, they are part of the "Using QIIME 2" book, which is listed in the books section of the QIIME 2 Library.

If you need further help with configuration beyond the docs, please ask! I believe you are the first person outside of our group to ask about this.

3 Likes

Nope, not every function from every plugin will support this.

In fact, dereplication is a good example! The underlying program is vsearch, and for dereplication it uses only one thread because the algorithm is I/O-bound, not CPU-bound.
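
To illustrate why exact (100% identity) dereplication leaves little work to spread across threads, here is a minimal Python sketch. It is not how vsearch is implemented; the function name `dereplicate_exact` and the toy records are made up for illustration. Each sequence needs only one dictionary lookup, so most of the time in a real run would plausibly go to reading and writing the data.

```python
def dereplicate_exact(records):
    """Collapse identical sequences, keeping the first ID seen and a count.

    `records` is an iterable of (seq_id, sequence) pairs.
    Hypothetical helper for illustration only; not vsearch's algorithm.
    """
    unique = {}  # sequence -> [first_id, count]
    for seq_id, seq in records:
        key = seq.upper()  # treat upper and lower case as the same sequence
        if key in unique:
            unique[key][1] += 1
        else:
            unique[key] = [seq_id, 1]
    # dicts preserve insertion order, so output follows first appearance
    return [(first_id, seq, count) for seq, (first_id, count) in unique.items()]


if __name__ == "__main__":
    toy = [("a", "ACGT"), ("b", "acgt"), ("c", "ACGA"), ("d", "ACGT")]
    for first_id, seq, count in dereplicate_exact(toy):
        print(first_id, seq, count)
```

A plain lookup like this only works for exact matches. At a lower identity threshold, near-identical sequences have to be compared with each other, which is a different and more compute-heavy problem.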

I'm also looking forward to faster QIIME 2 plugins through this and the artifact cache.

2 Likes

Hi @colinbrislawn and @Oddant1
I started a q2 rescript dereplicate job a few days ago, and I see all CPUs quite busy.

I assumed that the job had been distributed across many threads (24 CPUs). From top I could see that the active process is vsearch, but it seems to run on many threads, which it should not according to your reply.
Is there an easy explanation? :slight_smile:

I have seen the books in the q2 Library, and I will look more deeply into the docs.
I am currently working on a q2 classifier for BOLD's COI-5P sequences for metabarcoding. I was able to dereplicate about 15M unique sequences (March 2025 release) to about 7M dereplicated sequences at 100% identity (this completed within a short time). Training a naive Bayes classifier on the 7M seqs has not completed so far (it is still running on 1 CPU). Therefore, I would like to dereplicate at 99% identity, and this seems to take much longer. I will test dereplication at 100% again on our HPC and compare the run times to my standalone server.

Best,

Update:
I submitted the q2 rescript dereplicate job on our HPC using Slurm, with 4 nodes and 128 threads per node. The runtime (from the provenance tab) dropped from 44 min (40 threads on my Linux server) to 25 min (on the HPC with 4 nodes and 128 threads per node).
I have the impression that this is due only to the higher number of threads. I will have to get more familiar with parsl and all the details given in the books, and I will discuss it with our HPC admin.
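
As a rough check on these numbers, the short Python sketch below compares the speedup with the increase in requested threads. It assumes that all 4 x 128 = 512 threads count as requested, which says nothing about how many were actually used; the function names are made up for illustration.

```python
def speedup(old_minutes, new_minutes):
    """How many times faster the new run finished."""
    return old_minutes / new_minutes


def relative_efficiency(old_minutes, new_minutes, old_threads, new_threads):
    """Speedup divided by the growth in thread count (1.0 = perfect scaling)."""
    return speedup(old_minutes, new_minutes) / (new_threads / old_threads)


if __name__ == "__main__":
    old_threads = 40          # Linux server
    new_threads = 4 * 128     # assumption: every requested HPC thread counts
    s = speedup(44, 25)
    e = relative_efficiency(44, 25, old_threads, new_threads)
    print(f"speedup: {s:.2f}x")
    print(f"thread ratio: {new_threads / old_threads:.1f}x")
    print(f"relative efficiency: {e:.2f}")
```

The run finished 1.76 times faster while the requested thread count grew 12.8 times, so the job scaled far below linearly with the resources requested.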
I appreciate your support and the offer to come back to you if I need more help with configuration.
What kind of information do you need in this case?

Best,

2 Likes

Some helpful information for me would be the config you used and the job script you submitted. In addition to that, parsl writes its own logs. By default these will be in a directory called runinfo that will have been created in the directory you submitted your parsl job from. The contents of the runinfo directory should tell me the degree to which the resources you requested were actually utilized.
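
For gathering those logs, a minimal Python sketch along these lines could list what is in the most recent run directory. It assumes parsl's default layout of numbered subdirectories (000, 001, ...) inside runinfo; the helper names `latest_run_dir` and `summarize` are made up for illustration, and the script only lists files without interpreting them.

```python
from pathlib import Path


def latest_run_dir(runinfo):
    """Return the newest numbered run directory inside `runinfo`, or None."""
    runinfo = Path(runinfo)
    if not runinfo.is_dir():
        return None
    # Assumption: each run gets a numbered subdirectory such as 000, 001, ...
    runs = [p for p in runinfo.iterdir() if p.is_dir() and p.name.isdigit()]
    return max(runs, key=lambda p: int(p.name), default=None)


def summarize(run_dir):
    """List (relative path, size in bytes) for every file under `run_dir`."""
    run_dir = Path(run_dir)
    return sorted(
        (str(p.relative_to(run_dir)), p.stat().st_size)
        for p in run_dir.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    run = latest_run_dir("runinfo")
    if run is None:
        print("no numbered run directories found under ./runinfo")
    else:
        print(f"latest run: {run}")
        for rel_path, size in summarize(run):
            print(f"{size:>12}  {rel_path}")
```

Run it from the directory the job was submitted from, so that the relative path `runinfo` resolves correctly.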

Discussing it with your HPC admin will definitely be helpful for determining what resources you have access to. It's a complicated process getting this stuff to function properly on very large data across many nodes on an HPC. We have hit a number of roadbumps getting it to work on our own system, but it has made it possible for us to run analyses we couldn't have feasibly done before.

1 Like

Hello @Oddant1
May I come back to your offer to help with HPC settings? Since a lot of data would need to be transferred, I have sent you a direct message. I would like to make sure that you received it.
Best,

1 Like

Hi @Oddant1
I have received a message from another user (?) concerning support on HPC setup.

Could you briefly explain to me how this would work? What would this support cost? I was asked to connect my wallet on a portal that I have never seen before...
I stopped there, since I am not sure whether this is coming from QIIME 2...

Best,

3 Likes

@arwqiime another forum user? Can you DM me their username and what they sent you? Definitely not part of the QIIME 2 team.

Just to be abundantly clear, we don't charge money for support on the forum. That's the whole point really.

Also, we won't ask for passwords or send you to crypto sites, and you can check the username for the shield glyph, which indicates a moderator or staff member.

In the event someone does do something fishy (even a moderator!) please make use of the Flag feature to send it to our moderation queue where others can have a look.
(If you aren't sure, just push the button anyway; we won't be upset if it's a false positive.)

And also in the event money is relevant, it needs to go here in one of these categories:
Commercial Products and Services, q2-jobs, where some additional scrutiny is required.

5 Likes