q2 moshpit eggnog-annotate: suggestion for multithreading with --p-n-threads

Dear QIIME2 developers,

I am trying out q2-moshpit using the qiime2-shotgun-2024.2 Linux deployment on an HPC.
My capacity is capped at 16 CPUs/128 GB memory per node.

Most functions work well and run relatively fast wherever multithreading is possible. For example, qiime moshpit eggnog-diamond-search runs at an acceptable pace on contigs assembled from a cleaned 50-million-read metagenomic dataset of an activated sludge sample.

The annotation function qiime moshpit eggnog-annotate, however, is fairly slow (about 15 minutes per 500 queries). I haven't checked how many queries remain, but I am at 11,000 and counting. Memory usage for this task seems low; it is conveniently reported on stdout for each block of 500 queries (~1.5% used, 98.5% available). Could you enable a --p-n-threads parameter for this function to allow multithreading?

Just an idea that may be worth considering.

As I am now dealing with 12 such sequenced metagenomic samples and 35 slightly larger accompanying paired (ribo-depleted) metatranscriptomic datasets, I am hoping for an optimized runtime.

Thanks guys (and thanks for the continuous development work in general over the years)

Cheers,
Pieter

Hey @pietervanveelen, welcome to the forum!

Thanks for your question/suggestion! We introduced support for multiprocessing (through the --p-num-cpus parameter) in the eggnog-annotate action in the 2024.5 version of the Q2 metagenome distribution - maybe you could check it out there? Unless you were referring to something else?

Cheers,
Michal

Dear @misialq, thanks for getting back promptly on my request.

Your suggestion nailed it; that's exactly what I was looking for. I went ahead and installed 2024.5 and will give the multiprocessing a try. On a single CPU, it took more than two days to map all ~190K eggNOG hits on the contigs of a single metagenome sample.
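
For a rough sense of what multiprocessing might buy, here is a back-of-the-envelope sketch in Python. It uses only figures quoted in this thread (about 15 minutes per block of 500 queries, ~190K hits, 16 CPUs per node). The function name estimate_hours and the assumption of perfectly linear scaling across workers are mine, for illustration only; real speedup will likely be lower, and the observed rate can vary over a run.

```python
import math


def estimate_hours(total_queries, block_size=500, minutes_per_block=15.0, workers=1):
    """Estimate wall-clock hours for an annotation run.

    Hypothetical helper: assumes a constant rate per block and ideal
    (linear) scaling with the number of workers, which is optimistic.
    """
    blocks = math.ceil(total_queries / block_size)  # number of 500-query blocks
    return blocks * minutes_per_block / 60.0 / workers


if __name__ == "__main__":
    single = estimate_hours(190_000)                # one CPU
    parallel = estimate_hours(190_000, workers=16)  # idealized 16-CPU case
    print(f"1 CPU:   {single:.1f} h")    # 95.0 h at the quoted rate
    print(f"16 CPUs: {parallel:.1f} h")  # 5.9 h under ideal scaling
```

At the quoted rate, a single CPU comes out at roughly four days, which is in line with the "more than two days" I observed; the 16-CPU figure is a best case, not a measurement.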

Cheers,
Pieter
