Hi,
I am writing because I am having trouble scaling up OTU clustering with vsearch. I have already posted in the user support forum, but I have not been able to solve it. I am really just asking people who run hundreds of samples in parallel: I do not get any error, the procedure simply does not end.
I am running QIIME 2 vsearch from a conda installation, after dereplicating the sequences, in the following way. The command is launched using a variant of make:
joined_import_filter_derep_OTU:
	echo "#PBS -N star -l select=2:ncpus=12:mem=$(STAR_RAM)gb" ; \
	conda activate qiime2.1 ; \
	qiime vsearch cluster-features-open-reference \
	  --i-table table.qza \
	  --i-sequences rep-seqs.qza \
	  --i-reference-sequences 85_otus.qza \
	  --p-perc-identity 0.85 \
	  --p-threads 8 \
	  --o-clustered-table table-or-85.qza \
	  --o-clustered-sequences rep-seqs-or-85.qza \
	  --o-new-reference-sequences new-ref-seqs-or-85.qza \
	  --verbose
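In case it helps, this is roughly the standalone job script that the recipe is meant to produce (just a sketch of my setup, assuming the echoed line becomes the #PBS header of the script; <STAR_RAM> stands for whatever the make variable expands to, and qiime2.1 is the name of my conda environment):

	#!/bin/bash
	#PBS -N star
	#PBS -l select=2:ncpus=12:mem=<STAR_RAM>gb

	# activate the QIIME 2 conda environment
	source activate qiime2.1

	# open-reference OTU clustering at 85% identity with 8 vsearch threads
	qiime vsearch cluster-features-open-reference \
	  --i-table table.qza \
	  --i-sequences rep-seqs.qza \
	  --i-reference-sequences 85_otus.qza \
	  --p-perc-identity 0.85 \
	  --p-threads 8 \
	  --o-clustered-table table-or-85.qza \
	  --o-clustered-sequences rep-seqs-or-85.qza \
	  --o-new-reference-sequences new-ref-seqs-or-85.qza \
	  --verbose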
I succeeded in performing OTU clustering on 30 samples with the suggested options for parallelisation. I am now waiting for a job running on 200 samples, but it seems to need a lot of time: the same phase (OTU clustering) took a couple of days in QIIME 1, while it has already been running for more than two weeks in QIIME 2. Is there anybody who has experience with this number of samples?
Thanks a lot!
I hope this is sufficient for you;
please tell me if you think you need more information.
The question, however, is very simple: parallelisation with hundreds of samples, which already ran with QIIME 1 in less time (roughly the command sketched below).
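For reference, the QIIME 1 step I am comparing against was along these lines (written from memory, so the input and reference file names are just placeholders; the relevant part is the -a/-O options that enable parallel execution with 8 jobs):

	pick_open_reference_otus.py \
	  -i seqs.fna \
	  -r 85_otus.fasta \
	  -o otus_open_ref_85/ \
	  -a -O 8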
Did you have any similar feedback from users?
Thanks a lot
Michela