How long should I expect my QIIME2 jobs to run for?

I am a new QIIME2 user and have recently started working with my data on a ComputeCanada server (HPC), which makes use of a slurm job scheduler. To submit jobs to the cluster, users have to specify the amount of time, number of CPUs, and the amount of RAM (amongst other possible specifications) their job requires before cluster resources are allocated to the job. The server is running QIIME2-2019.10 in a conda environment.
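For readers new to Slurm, the three requests described above (time, CPUs, RAM) usually go at the top of a batch script as `#SBATCH` directives. The sketch below is a minimal example under assumptions: the script name `qiime-job.sh`, the CPU count, and the 1-day limit are placeholders, not recommendations from this thread.

```shell
#!/usr/bin/env bash
# Sketch of a Slurm batch script (hypothetical name: qiime-job.sh).
# Submit with: sbatch qiime-job.sh
# Slurm reads the #SBATCH lines at submission; bash treats them as comments.
# Directives must appear before the first executable line.

# Wall-clock limit as days-hours:minutes:seconds (placeholder: 1 day).
#SBATCH --time=1-00:00:00
# CPUs for multithreaded QIIME 2 steps (placeholder value).
#SBATCH --cpus-per-task=8
# Memory per node; for a single-node job this is the job's total RAM.
#SBATCH --mem=16G

# Slurm exposes the allocation to the job as environment variables;
# the ":-1" default only applies when the script runs outside Slurm.
CPUS="${SLURM_CPUS_PER_TASK:-1}"
echo "Job ${SLURM_JOB_ID:-(not under Slurm)} has ${CPUS} CPU(s)"

# QIIME 2 commands would follow here, after activating the conda
# environment that provides QIIME2-2019.10 (environment name not shown).
```

Running `bash -n qiime-job.sh` checks the script for syntax errors without executing it, which is cheaper than finding a typo after the job has waited in the queue.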

I've reviewed the forum and have seen that 16GB of RAM is likely sufficient for QIIME2 jobs. That said, I am lost as to how much time I should request to allow my jobs to run. I have seen many users asking about DADA2 run times, which has been helpful (there seems to be wide variability in the time needed, depending on particular circumstances). Now I am wondering about jobs like OTU clustering or tree building (see the commands below as examples).

```
# De novo OTU clustering at 99% identity
qiime vsearch cluster-features-de-novo \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --p-perc-identity 0.99 \
  --p-threads 250 \
  --o-clustered-table OTU-table-dn-99.qza \
  --o-clustered-sequences OTU-rep-seqs-dn-99.qza
```

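```
# Align the clustered representative sequences
# (--p-parttree selects MAFFT's PartTree mode, intended for very large inputs)
qiime alignment mafft \
  --p-parttree \
  --p-n-threads 0 \
  --i-sequences OTU-rep-seqs-dn-99.qza \
  --o-alignment aligned_sequences.qza

# Filter out unconserved and highly gapped alignment columns
qiime alignment mask \
  --i-alignment aligned_sequences.qza \
  --o-masked-alignment masked_sequences.qza

# Build an unrooted tree with FastTree
qiime phylogeny fasttree \
  --i-alignment masked_sequences.qza \
  --o-tree unrooted_tree.qza

# Root the tree at its midpoint
qiime phylogeny midpoint-root \
  --i-tree unrooted_tree.qza \
  --o-rooted-tree rooted_tree.qza
```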

There are just under 10 million sequences in my sequence file. The sizes of the QIIME artifact files are as follows:

```
340M  rep-seqs.qza
690M  seqs.qza
1.3M  tabulated-metadata.qzv
122M  table.qza
```

I am worried that I will arbitrarily select an amount of time (say, 24 hours) and then have to rerun the job because I didn't give it enough time. The scheduler suggests adding a little extra time in case the job takes longer than expected, but I don't know what the expected time might be. Thank you for any help you may have!


Welcome to the forum @ahalhed!

This is a great question — unfortunately I can't give a great answer. This is because it can be difficult to predict runtime and memory requirements for many of these steps, since processes like OTU clustering depend on the complexity of the data.

Unless you are paying for resources on the cluster, it would not hurt to ask for 2 days and more RAM; after all, this is tiny compared to what many cluster users require. But based on your description, I think 1 day should be plenty of time.
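Slurm's `--time` option accepts a `days-hours:minutes:seconds` format, so the 2-day request above is `2-00:00:00` and 1 day is `1-00:00:00`. If you have a rough guess for a step, a small hypothetical helper like the one below can add a safety margin and format the result. The function name and the margin percentage are illustrative choices, not Slurm or QIIME 2 settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pad an expected runtime (whole hours) by a
# percentage margin and print it in Slurm's days-hours:minutes:seconds form.
padded_slurm_time() {
  local expected_hours=$1 margin_pct=$2
  # Total minutes including the margin; "+ 99" rounds the division up
  # so the padding is never truncated away.
  local total_min=$(( (expected_hours * 60 * (100 + margin_pct) + 99) / 100 ))
  local days=$(( total_min / 1440 ))
  local hours=$(( (total_min % 1440) / 60 ))
  local mins=$(( total_min % 60 ))
  printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
}
```

For example, `padded_slurm_time 20 25` prints `1-01:00:00` (20 hours plus 25%), which could be passed as `sbatch --time="$(padded_slurm_time 20 25)" your-job.sh` (hypothetical script name). Options given on the `sbatch` command line take precedence over the matching `#SBATCH` lines inside the script.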

It sounds like you have maybe already run dada2? So your data have already been dereplicated and quality controlled, reducing the complexity of the data.

OTU clustering on a small to average run size should only take an hour to a few hours... but we have also seen long OTU clustering runtimes if you have lots of unique sequences (sounds like you do not, since you have already run dada2).
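Because clustering time depends heavily on how many unique sequences (features) go in, it can help to check that number before submitting. One way, assuming an active QIIME 2 environment, is to export `rep-seqs.qza` with `qiime tools export` (which writes a `dna-sequences.fasta` file) and count the FASTA header lines. The function name below is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count records in a FASTA file
# (each record starts with exactly one header line beginning with ">").
count_fasta_records() {
  grep -c '^>' "$1"
}

# Usage with the artifact from the question (needs QIIME 2 on PATH):
#   qiime tools export --input-path rep-seqs.qza --output-path exported-rep-seqs
#   count_fasta_records exported-rep-seqs/dna-sequences.fasta
```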

mafft and tree-building should also only take a few hours to run, based on the size of your data.

Since you have already run dada2, OTU clustering is not necessary (just saying — but I expect you've read the tutorials and forum posts about this).

Also:

You should only use as many threads as you have requested for the job you are running. You should discuss with the cluster admin how to request more CPU, threads, and RAM.
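On Slurm, the number of CPUs granted to a job is available inside it as `SLURM_CPUS_PER_TASK` (set only when the job is submitted with `--cpus-per-task`), so one way to follow this advice is to pass that value to the QIIME 2 thread options instead of a fixed number like 250. The sketch below assumes the file names from the question and only runs the commands when `qiime` is on `PATH`; otherwise it prints them.

```shell
#!/usr/bin/env bash
# Tie QIIME 2's thread options to the CPUs Slurm actually allocated.
# Outside Slurm (or without --cpus-per-task), fall back to 1 thread.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Commands from the question, with the thread count substituted.
cluster_cmd=(qiime vsearch cluster-features-de-novo
  --i-table table.qza
  --i-sequences rep-seqs.qza
  --p-perc-identity 0.99
  --p-threads "$THREADS"
  --o-clustered-table OTU-table-dn-99.qza
  --o-clustered-sequences OTU-rep-seqs-dn-99.qza)

mafft_cmd=(qiime alignment mafft
  --p-parttree
  --p-n-threads "$THREADS"
  --i-sequences OTU-rep-seqs-dn-99.qza
  --o-alignment aligned_sequences.qza)

# Run only if the QIIME 2 environment is active; otherwise show the commands.
if command -v qiime >/dev/null 2>&1; then
  "${cluster_cmd[@]}" && "${mafft_cmd[@]}"
else
  echo "qiime not on PATH; would run:"
  echo "${cluster_cmd[*]}"
  echo "${mafft_cmd[*]}"
fi
```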

I hope that helps!


Thank you! This is a helpful place to start.

A collaborator on this project had previously done the initial quality control steps - they had sent me a fasta file with the sequences and an OTU table. I figured I would try the OTU clustering myself from the sequences, since I am new to QIIME2 and wanted to get some practice doing as much of the process as possible.

Ah, now I understand. Okay, OTU clustering may take a few hours, depending on how many parallel jobs you run. It sounds like you have a dataset of small-to-average size and complexity. But again, you can request 48 hr and it is still a small amount of resources for the cluster, so overestimating is probably not an issue.

