hundreds of samples vsearch OTU

MichelaRiba · January 18, 2020, 9:17am

I am writing because I am having trouble scaling up vsearch OTU clustering. I have already posted in the user support forum, but I did not solve the problem. I am asking people who run hundreds of samples in parallel. I do not get any error, since the procedure does not end.

I am running qiime2 vsearch from a conda installation, after dereplicating sequences, in the following way. The command is launched using a variant of make:

joined_import_filter_derep_OTU:
echo PBS -N star -l select=2:ncpus=12:mem=$(STAR_RAM)gb;
conda activate qiime2.1 ;
qiime vsearch cluster-features-open-reference \
--i-table table.qza \
--i-sequences rep-seqs.qza \
--i-reference-sequences 85_otus.qza \
--p-perc-identity 0.85 \
--p-threads 8 \
--o-clustered-table table-or-85.qza \
--o-clustered-sequences rep-seqs-or-85.qza \
--o-new-reference-sequences new-ref-seqs-or-85.qza \
--verbose
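
For anyone who wants to check the flags and thread count outside of make before submitting the job, here is a minimal sketch that only prints the command. The function name `build_cluster_cmd` and the `THREADS` and `PERC_IDENTITY` variables are names made up for illustration; the qiime options and file names are the ones from the recipe above.

```shell
#!/bin/sh
# Hypothetical helper: prints the clustering command instead of running it,
# so the options can be inspected before the job is submitted.
# THREADS and PERC_IDENTITY are illustrative shell variables, not QIIME 2 settings.
THREADS="${THREADS:-8}"
PERC_IDENTITY="${PERC_IDENTITY:-0.85}"

build_cluster_cmd() {
    # echo joins its arguments with single spaces, giving one command line.
    echo "qiime vsearch cluster-features-open-reference" \
        "--i-table table.qza" \
        "--i-sequences rep-seqs.qza" \
        "--i-reference-sequences 85_otus.qza" \
        "--p-perc-identity ${PERC_IDENTITY}" \
        "--p-threads ${THREADS}" \
        "--o-clustered-table table-or-85.qza" \
        "--o-clustered-sequences rep-seqs-or-85.qza" \
        "--o-new-reference-sequences new-ref-seqs-or-85.qza" \
        "--verbose"
}

# Print the command for inspection; nothing is executed here.
build_cluster_cmd
```

Printing first makes it easy to compare the value passed to `--p-threads` with the `ncpus` requested from the scheduler.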

I succeeded in clustering OTUs for 30 samples using the suggested parallelisation options. I am now waiting for a job running on 200 samples, but it seems to need a lot of time, whereas the same phase (OTU clustering) in qiime1 did not take so long: a couple of days with qiime1 versus more than 2 weeks with qiime2. Does anybody have experience with this number of samples?
Thanks a lot!

I hope this is sufficient for you;
please tell me if you think you need more information.

The question, however, is very simple: the same parallelised clustering of hundreds of samples ran in less time with qiime1.

Have you had any similar feedback from other users?
Thanks a lot

Nicholas_Bokulich · January 18, 2020, 3:47pm

A post was merged into an existing topic: qiime2 OTU picking with hundreds of samples