Hi @MichelaRiba,
We have not forgotten about you — please hang in there.
Your issue is quite similar to one that you reported earlier with a smaller number of samples: qiime2 OTU picking
We have not been able to replicate this issue so far, and no one else has reported anything like it. The conclusion we reached on that previous topic was that cluster resources were not being allocated correctly.
How about we pick up there on this topic: please check the resource use of your finished jobs for comparison. Ask your system admin which qsub command will give you a report of total CPU, RAM, etc. used by a finished job. (I know there is such a command in Slurm, which I use; I don't know the qsub equivalent, but one must exist.)
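For reference, here is a hedged sketch of the kind of accounting commands I mean. The Slurm line is what I use; the SGE/PBS lines are the usual equivalents on qsub-based clusters, but check with your admin which scheduler you actually have — the job ID `12345` is just a placeholder:

```shell
# Slurm: per-job elapsed time, peak memory, and CPU time for a finished job
sacct -j 12345 --format=JobID,JobName,Elapsed,MaxRSS,TotalCPU,State

# SGE (Grid Engine, the classic qsub): accounting record for a finished job
qacct -j 12345

# PBS/Torque (another qsub flavor): resource usage from the server logs
tracejob 12345
```

Comparing `MaxRSS` (or the equivalent memory field) against what you requested in your submission script is usually the quickest way to spot a misallocation.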
Something abnormal is occurring here. 200 is not an enormous # of samples at all (though it is the number of sequences that matters for OTU clustering, not the number of samples; I am just assuming this translates to a "normal" # of sequences per sample). OTU clustering can be a time-consuming process, but it should not take weeks: I have run many studies of 200 samples or more on a dusty old laptop within a couple of hours (with QIIME 2). This is why I really suspect something is going wrong with resource allocation.
How many sequences are you attempting to cluster? How many sequences did you have prior to dereplication? What type of sequences are you attempting to cluster (16S?), and how long are they?