qiime2 OTU picking with hundreds of samples

Hi, thanks a lot again for the follow up!

I am sorry, I did not describe the pre-processing steps well. Prior to vsearch clustering I did:
quality filtering on the joined sequences, and
vsearch dereplication.
If I extract the data from that step I find a FASTA file of sequences; for example, in my project that took just 17 hours with nearly 30 samples,
that FASTA file (the input for vsearch clustering, if I am correct) has
7,441,021 lines.
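To turn the line count into a sequence count, a minimal sketch (using a tiny stand-in file, since the real dereplicated FASTA is on my cluster) is to count the `>` header lines:

```shell
# Minimal sketch: count sequences in a FASTA by counting '>' header lines.
# A small example file stands in for the real dereplicated output.
printf '>seq1\nACGT\n>seq2\nGGTT\n' > example.fasta
grep -c '^>' example.fasta
# For the real file: a 7,441,021-line FASTA with one sequence line per
# record would hold roughly 3.7 million dereplicated sequences.
```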

Regarding memory, I cannot see any problems, though perhaps I did not check in the right way. This is the report I got:
Resources:
Limits: mem=128gb,ncpus=36,place=free
cpupercent=995,cput=17:04:44,mem=2528064kb,ncpus=36,vmem=18674140kb,walltime=15:33:29

I have exaggerated the resource requests, perhaps not in the correct way, because in the end the process ran on only one node (with 18 cores, parallelized to 12 threads); maybe it is better to set cores = threads, as my system manager suggests.
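For what it is worth, this is the kind of invocation I mean (a sketch only: the file names are placeholders, and I assume the vsearch clustering actions take a `--p-threads` option that I would match to the physical cores of the node, 18 here, per the suggestion above):

```shell
# Hypothetical example; input/output names are placeholders.
qiime vsearch cluster-features-open-reference \
  --i-sequences rep-seqs.qza \
  --i-table table.qza \
  --i-reference-sequences ref-seqs.qza \
  --p-perc-identity 0.97 \
  --p-threads 18 \
  --output-dir clustered/
```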

In addition, I found this related point about parallelisation and wrote there for feedback;

it seemed that with large datasets we may have to wait a long time. Is that right?

Michela

Hi, meanwhile I am re-doing the procedure with 126 samples using qiime1 uclust for OTU picking (pick_open_reference_otus.py), and I obtained the results in 4 hours... consider that it took 15 or so hours for the qiime2 vsearch clustering to process 30 samples.
May I conclude that qiime1 is still a good idea for processing samples in less time?

Thanks a lot

Michela