q2 diversity beta-group-significance performance

Hello,
I am using q2-amplicon-2024.5 and noticed large performance differences with `qiime diversity beta-group-significance` on two independent Linux machines.
The runtimes were very different, although all parameters were identical.
I added the `--verbose` option, but its output was identical on both machines. Here are the duration data from the provenance tabs.

Almost 4 minutes on a 40-core AlmaLinux 9

Only 11 seconds on a 12-core AlmaLinux 8.3

Is there another way to inspect performance besides the `--verbose` option?
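
For illustration, the kind of external, OS-level check I have in mind would be wrapping the action in standard tooling like GNU time (the file names below are placeholders, not my actual inputs):

```bash
# Wall-clock time plus CPU and I/O statistics from GNU time
# (input/metadata/output file names are placeholders)
/usr/bin/time -v qiime diversity beta-group-significance \
  --i-distance-matrix bray_curtis_distance_matrix.qza \
  --m-metadata-file metadata.tsv \
  --m-metadata-column group \
  --o-visualization bc_significance.qzv
```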

Hello @arwqiime,

There is no built-in way to view more detailed performance statistics than that in QIIME 2, but I do have a few questions that may help us nail down what happened.

  1. Are you able to reproduce these runtime discrepancies?
  2. Are these machines being used by other people at the same time as you?
  3. Is there any reason why, on the slower machine, the input artifact would need to be moved from one storage device to another before the action executes, or why the same would need to happen for the output as the action finishes?

@Oddant1

Yes, I could reproduce these discrepancies with two datasets (Bray-Curtis and Jaccard).

No, I am the only user.

The analyses were originally done on the local file system (for the slow runs) or on a mounted volume (for the fast runs). To make this identical for both servers, I moved the data to a network mount and ran the commands on the CLI of each machine. The network share is not physically part of either server, so both servers have to load the data over the network. The same differences were observed:

Slow server (40-core AlmaLinux 9):

runtime:

start: 2024-10-28T08:07:56.475Z
end: 2024-10-28T08:14:24.773Z
duration: "6 minutes, 28 seconds, and 298657 microseconds"

Fast server (12-core AlmaLinux 8.3):

runtime:

start: 2024-10-28T08:17:46.203Z
end: 2024-10-28T08:17:57.259Z
duration: "11 seconds, and 56196 microseconds"

I also compared two q2 releases: q2-amplicon-2024.2 and a fresh installation of q2-amplicon-2024.5 (no third-party libraries installed, e.g. empress or gemelli). The 2024.2 installation completed in ca. 7 sec, while the fresh 2024.5 installation again took over 6 minutes for the identical command. However, I am quite sure this is not tied to the q2 release alone, as the 'fast' server also has q2-amplicon-2024.5 installed.
Since I did not have these issues in the past, I have been searching for additional performance tools to track down the real cause of the slowdown. I was also thinking about other changes made to that machine in the past few weeks, but could not remember anything. :frowning:
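
For completeness, the release comparison followed this pattern (environment names and file paths are placeholders for my actual setup):

```bash
# Time the identical action in each conda environment
# (assumes "conda activate" works in this shell, e.g. after
# "source ~/miniconda3/etc/profile.d/conda.sh")
for env in qiime2-amplicon-2024.2 qiime2-amplicon-2024.5; do
  conda activate "$env"
  echo "== $env =="
  time qiime diversity beta-group-significance \
    --i-distance-matrix bray_curtis_distance_matrix.qza \
    --m-metadata-file metadata.tsv \
    --m-metadata-column group \
    --o-visualization "bc_significance_${env}.qzv"
  conda deactivate
done
```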

Best,

Short addition:
I have analyzed identical runs on both servers using `strace -c <q2 command>` and reviewed the results with a colleague from our IT department. He noticed a large number of futex calls, which could indicate contention for some resource.
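
For reference, the profiling looked roughly like this (the qiime arguments are placeholders); the per-syscall summary at the end is where the futex counts stood out:

```bash
# -c summarizes syscall counts and time, -f follows child
# threads/processes (where thread-pool futex activity shows up),
# -o writes the summary to a file
strace -c -f -o strace_summary.txt \
  qiime diversity beta-group-significance \
    --i-distance-matrix bray_curtis_distance_matrix.qza \
    --m-metadata-file metadata.tsv \
    --m-metadata-column group \
    --o-visualization bc_significance.qzv
```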
Does this point to something important in this case?
Otherwise I will wait for the next release and see whether this problem is gone. It should be out soon, right?


To clarify: you are saying that on the "slow" server the action completed quickly on q2-amplicon-2024.2 but slowly on q2-amplicon-2024.5, and that on the "fast" server it runs quickly on q2-amplicon-2024.5?

I have no good explanation for this. The locking is particularly perplexing to me. The only locking we do ourselves in QIIME 2 happens when you run a pipeline in parallel; it is possible, though, that those calls are coming from one of our dependencies.
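
If you want a quick way to test the dependency theory: heavy futex traffic on a many-core machine is often a numeric library's thread pool spinning, and the standard BLAS/OpenMP environment variables can pin those pools to a single thread. A sketch, with a placeholder qiime invocation:

```bash
# Standard OpenBLAS/OpenMP/MKL thread-count knobs; if the slowdown
# disappears with these set, the futex calls likely come from a
# spinning thread pool in a numeric dependency
OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1 \
  qiime diversity beta-group-significance \
    --i-distance-matrix bray_curtis_distance_matrix.qza \
    --m-metadata-file metadata.tsv \
    --m-metadata-column group \
    --o-visualization bc_significance.qzv
```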

The new release should be coming soon, and hopefully that resolves this issue. If it does, then I suppose we can call this a fluke? If it doesn't, then I will work with you further on this.


Yes, correct!
I have seen that 2024.10 is out, and I will test it soon.
Thank you for your comments!


Hi @Oddant1,
I had the chance to test the beta-group-significance command on the most recent q2-amplicon-2024.10 release as well.

The test was run on an NFS-mounted volume using the identical command.
I compared conda installations of q2-2024.2, q2-2024.5, and q2-2024.10, installed on the two AlmaLinux servers.

| Release    | AlmaLinux 8 (12 CPUs) | AlmaLinux 9 (40 CPUs) |
|------------|-----------------------|-----------------------|
| q2-2024.2  | 3 sec                 | 5 sec                 |
| q2-2024.5  | 12 sec                | 63 sec                |
| q2-2024.10 | 10 sec                | 64 sec                |

The AlmaLinux 8 server has been used for q2 conda installations only. The AlmaLinux 9 server is used for several bioinformatics tasks, but I am unsure if, and how, these tools can affect q2 installations. I keep command-line tools in separate conda environments as well as I can, but I cannot rule out that the performance issue stems from using other bioinformatics tools on this server.

Nevertheless, I do see some performance drop from 2024.2 to 2024.5/2024.10 on both servers.

Thank you for that, but I will have to figure out the server-specific differences first (or rebuild the second server).


@arwqiime Apologies for the delay, I was out for a few days around the weekend. I am thus far unable to reproduce the performance discrepancies you are seeing, either on my laptop or on the compute cluster I have access to, even though the cluster ought to be set up similarly to your servers.

Can you please DM me the inputs you are using? I don't feel like this should be dependent on them, but it couldn't hurt to try.

@arwqiime, thank you for sending me your inputs. I am still unable to replicate these performance differences on either my laptop or the compute cluster. All the times I get are within 1 second of each other.

Hi @Oddant1,
Thank you for looking into this issue. I will have to dig deeper into conda and all the options there; `channel_priority` could be an issue. What I can confirm is that servers with only a few conda environments installed (basically just different qiime2 releases plus fastqc/multiqc) show better performance than a (more powerful) server with many environments (ca. 25 envs, mainly from bioconda).
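
For example, these are the standard conda commands I plan to use to compare the two servers' configurations:

```bash
# Compare channel setup and priority between the two servers
conda config --show channels channel_priority

# Count installed environments (lines not starting with '#');
# this documents the difference between the lean and the busy server
conda env list | grep -c '^[^#]'
```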

Thank you for your help!
Best,
