I am a new QIIME2 user and have recently started working with my data on a Compute Canada server (HPC), which uses the Slurm job scheduler. To submit jobs to the cluster, users have to specify the amount of time, the number of CPUs, and the amount of RAM (among other possible specifications) their job requires before cluster resources are allocated. The server is running QIIME2-2019.10.
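For reference, my job submissions look roughly like the following minimal sbatch script; the resource values, job name, and module name here are placeholders rather than settings I am confident in:

```shell
#!/bin/bash
#SBATCH --job-name=qiime2-job
#SBATCH --time=24:00:00       # wall-clock limit -- the value I don't know how to choose
#SBATCH --cpus-per-task=8     # number of CPUs (placeholder)
#SBATCH --mem=16G             # RAM, based on forum suggestions

# Activate the QIIME2 environment (module name is a placeholder for
# however QIIME2-2019.10 is provided on the cluster)
module load qiime2/2019.10

# ... qiime commands go here ...
```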
I've reviewed the forum and have seen that 16GB of RAM is likely sufficient for QIIME2 jobs. That said, I am lost as to how much time I should request for my jobs. I have seen many users asking about DADA2 run times, which has been helpful (there seems to be wide variability in the time needed depending on the particular circumstances). Now I am wondering about jobs like OTU clustering or tree building (see the commands below as examples).
```
qiime vsearch cluster-features-de-novo \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --p-perc-identity 0.99 \
  --p-threads 250 \
  --o-clustered-table OTU-table-dn-99.qza \
  --o-clustered-sequences OTU-rep-seqs-dn-99.qza
```
```
qiime alignment mafft
qiime alignment mask
qiime phylogeny fasttree
qiime phylogeny midpoint-root
```
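Spelled out with their typical inputs and outputs, the tree-building steps I have in mind would look something like this (the filenames are just examples, not my actual files):

```shell
# Align the representative sequences
qiime alignment mafft \
  --i-sequences rep-seqs.qza \
  --o-alignment aligned-rep-seqs.qza

# Mask highly variable alignment positions
qiime alignment mask \
  --i-alignment aligned-rep-seqs.qza \
  --o-masked-alignment masked-aligned-rep-seqs.qza

# Build an unrooted tree with FastTree
qiime phylogeny fasttree \
  --i-alignment masked-aligned-rep-seqs.qza \
  --o-tree unrooted-tree.qza

# Root the tree at its midpoint
qiime phylogeny midpoint-root \
  --i-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza
```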
There are just under 10 million sequences in my sequence file. The sizes of the QIIME artifact files are as follows:

```
340M  rep-seqs.qza
690M  seqs.qza
1.3M  tabulated-metadata.qzv
122M  table.qza
```
I am worried that I will arbitrarily select an amount of time (say, 24 hours) and then have to rerun the job because I didn't give it enough. The scheduler documentation suggests requesting a little additional time in case a job takes longer than expected, but I am lost as to what the expected time might be in the first place. Thank you for any help you may have!