QIIME 2 OTU picking


I am experiencing some trouble running QIIME 2 for open-reference OTU picking: it works well in my hands with a small number of samples and subsampled sequences (500 sequences per sample across 8 samples), but now that I am running 8 samples with 100,000 sequences each, it takes much longer. I have also tried a dataset of 200 samples (100,000 sequences each), but that run seems to be stuck. Compared with the same step in QIIME 1, it seems slower to me. Is my experience in line with what is expected?

the command is

qiime vsearch cluster-features-open-reference \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --i-reference-sequences 85_otus.qza \
  --p-perc-identity 0.85 \
  --o-clustered-table table-or-85.qza \
  --o-clustered-sequences rep-seqs-or-85.qza \
  --o-new-reference-sequences new-ref-seqs-or-85.qza

on a Linux cluster.

Thanks a lot



Hi @MichelaRiba,
You could speed up this step substantially by using the --p-threads parameter to enable multithreading. Since you are running on a Linux cluster, I would recommend this.
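For example, the command from the original post could be rerun with a thread count added. This is just a sketch: the value 8 is an illustration, and you should set it to (at most) the number of CPUs you actually request on the node.

```shell
qiime vsearch cluster-features-open-reference \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --i-reference-sequences 85_otus.qza \
  --p-perc-identity 0.85 \
  --p-threads 8 \
  --o-clustered-table table-or-85.qza \
  --o-clustered-sequences rep-seqs-or-85.qza \
  --o-new-reference-sequences new-ref-seqs-or-85.qza
```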

If you have not received an error message, then it is still running and you need only wait.

Did you process this same dataset with both q1 and q2? If not, then it is difficult to compare (since other factors can impact runtime). QIIME 1 and QIIME 2 use different software for OTU clustering (usearch vs. vsearch), so it is possible that there would be a difference in runtime.


Thanks a lot for quick feedback and for highlighting the option.

Regarding your question: yes, I am using the same dataset I have already processed with QIIME 1, because otherwise I could not make any comparison.

I will definitely report back on the results of parallelisation with the multithreading option.




I am pretty satisfied with the results of parallelising OTU picking using the
--p-threads parameter.
I used --p-threads 4 on a node with 12 CPUs in a Linux cluster, for 10 samples sequenced at roughly 150,000 reads each.

However, I still have trouble when I try OTU picking on the entire dataset of roughly 200 samples. I used --p-threads 8 on a node with 12 CPUs in the Linux cluster. The job seems to be running, but it has now been running for 4 days without finishing. Is that expected?

Should I consider splitting the overall set of samples for the OTU-picking phase? Would it be correct to merge the resulting tables?
Or do you have any suggestions on how to optimise this step for a very large number of samples? (We are going to have 400 samples in one project.)
thanks a lot


It depends on the number of reads and the complexity of the data (e.g., the number of OTUs). Four days does sound like a long time; you may want to check on the node to make sure you have not run out of memory (presumably that would kill the job and give an error, but it is worth confirming). I do recall OTU clustering on very large datasets taking a long time (sometimes days) with QIIME 1. Since you are running this on a cluster, it is also possible that the job was held up by a lack of resources if you use a queueing system.

No, you would not be able to merge tables from separate clustering runs, so I discourage that.

Denoising methods may be faster, or denoising prior to OTU clustering may speed up the OTU clustering step. For a quick denoising + clustering step I recommend trying deblur followed by closed-reference OTU clustering (or open-reference if closed-reference is a problem for you).
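As a sketch of the suggested deblur-then-cluster route: the trim length, file names, and reuse of the 85% reference from earlier in the thread are all assumptions, and deblur expects quality-filtered demultiplexed reads as input (choose the trim length from your own quality plots).

```shell
# Denoise with deblur; the much smaller set of denoised features
# makes the subsequent clustering step far cheaper.
qiime deblur denoise-16S \
  --i-demultiplexed-seqs demux-filtered.qza \
  --p-trim-length 120 \
  --p-sample-stats \
  --o-representative-sequences rep-seqs-deblur.qza \
  --o-table table-deblur.qza \
  --o-stats deblur-stats.qza

# Closed-reference clustering of the denoised features
qiime vsearch cluster-features-closed-reference \
  --i-table table-deblur.qza \
  --i-sequences rep-seqs-deblur.qza \
  --i-reference-sequences 85_otus.qza \
  --p-perc-identity 0.85 \
  --p-threads 8 \
  --o-clustered-table table-cr-85.qza \
  --o-clustered-sequences rep-seqs-cr-85.qza \
  --o-unmatched-sequences unmatched-cr-85.qza
```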

My best advice though is just to get more resources — talk to the cluster admins to see what you can do for larger-scale parallelization.



I’m giving some more details about the trouble I have:

before OTU clustering I run the “dereplication command”.
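For reference, the dereplication step in the QIIME 2 OTU-clustering workflow looks like the following; the file names here follow the QIIME 2 tutorial conventions and are assumptions, not taken from this post.

```shell
# Collapse identical reads into unique sequences with per-sample counts,
# producing the feature table and representative sequences used for clustering
qiime vsearch dereplicate-sequences \
  --i-sequences seqs.qza \
  --o-dereplicated-table table.qza \
  --o-dereplicated-sequences rep-seqs.qza
```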

Afterwards I run open-reference OTU picking with vsearch.
I run it directly on a node, requesting 12 CPUs and setting --p-threads 4.
Logging into the node, I see the following details (from the top command).
I do not think this is a memory problem, but I am still pretty discouraged by how long the process has been running without finishing.

I tried with only 12 out of 200 samples and that worked.

I have the impression that the OTU-picking phase is slower than in QIIME 1.

I would like to finish this run before starting trials with the other suggested methodologies (denoising).

Thanks a lot also for your patience!!


It sort of looks like your job is only using 1 core, rather than the 12 you are allocating. What cluster job scheduler are you using? And how are you allocating CPUs?

If you are submitting to a Slurm queue, try --ntasks-per-node=1 --cpus-per-task=12
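A minimal Slurm submission script along those lines might look like the following; the job name, time limit, and thread count are placeholders to adapt to your cluster.

```shell
#!/bin/bash
#SBATCH --job-name=otu-picking     # placeholder job name
#SBATCH --ntasks-per-node=1        # a single task on the node...
#SBATCH --cpus-per-task=12         # ...with all 12 CPUs bound to it
#SBATCH --time=48:00:00            # placeholder wall-time limit

# Match --p-threads to the CPUs actually allocated to the task
qiime vsearch cluster-features-open-reference \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --i-reference-sequences 85_otus.qza \
  --p-perc-identity 0.85 \
  --p-threads 12 \
  --o-clustered-table table-or-85.qza \
  --o-clustered-sequences rep-seqs-or-85.qza \
  --o-new-reference-sequences new-ref-seqs-or-85.qza
```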


Hi, thanks a lot.

I am submitting using qsub


Maybe qsub -l nodes=1:ppn=12
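Assuming a PBS/Torque-style scheduler (which is what qsub usually implies), the same resource request can be embedded as directives in the submission script itself; the job name is a placeholder.

```shell
#!/bin/bash
#PBS -N otu-picking          # placeholder job name
#PBS -l nodes=1:ppn=12       # one node, 12 processors on that node

# PBS starts jobs in $HOME by default; change to the submission directory
cd "$PBS_O_WORKDIR"

# Run the clustering command here with --p-threads set to the
# 12 processors requested above
```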