Lengthy extract-reads

Aimee · April 15, 2018, 2:01am

Hi there

I am quite new to QIIME 2 and seem to have run into a problem with qiime feature-classifier extract-reads.

extract-reads is taking an extremely long time to run. I have not had this problem with my other primers (18S) and other databases (PR2 and SILVA). I am running my pipeline on a high-performance computing cluster, so computing power should not be a problem (currently mem-per-cpu=10GB and cpus-per-task=16), and the pipeline generally takes ~2 hours to run. The database I am using now (Midori) is double the size of the others, but I do not understand why 48 hours is not enough for extract-reads to finish.
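As a rough sanity check, here is a small Python sketch of the arithmetic behind that expectation. It assumes, purely for illustration, that runtime grows linearly with database size; the thread does not say how extract-reads actually scales, and the ~2 hour baseline is the whole-pipeline time quoted above. The function name is hypothetical.

```python
def linear_estimate_hours(baseline_hours: float, size_ratio: float) -> float:
    """Naive runtime estimate, ASSUMING runtime scales linearly with database size."""
    return baseline_hours * size_ratio


baseline_hours = 2.0     # typical pipeline runtime reported with PR2/SILVA
size_ratio = 2.0         # Midori is described as double the size of the others
time_limit_hours = 48.0  # the job was stopped at this wall-time limit

estimate = linear_estimate_hours(baseline_hours, size_ratio)
print(f"Linear-scaling estimate: {estimate:.0f} h")  # Linear-scaling estimate: 4 h
print(f"The {time_limit_hours:.0f} h limit is {time_limit_hours / estimate:.0f}x that estimate")  # ... 12x ...
```

Under this naive assumption, doubling the database would account for roughly 4 hours, so a job still unfinished after 48 hours suggests that database size alone is not the whole story.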

Primers:
f-primer: GGWACWGGWTGAACWGTWTAYCCYCC
r-primer: TANACYTCNGGRTGNCCRAARAAYCA

Appreciate the help!

Nicholas_Bokulich · April 16, 2018, 9:07pm

Hi @Aimee,
Thanks for posting, and sorry to hear extract-reads isn't behaving nicely.

The ~2 hours you see with PR2 and SILVA sounds pretty typical for those databases.

A much larger database will of course require more time, but 48 hours does sound extreme... Are you able to confirm that the job is still running and has not run out of memory? There should be some sort of memory error if that is the case, but I just want to make sure.

Since you mention that only these primers are causing this issue, perhaps the primers are at fault. Those primers contain a lot of degenerate bases, so I wonder if that is slowing things down.
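To put a number on "a lot of degenerate bases", the sketch below counts how many distinct non-degenerate sequences each primer stands for, using the standard IUPAC nucleotide codes. This is only an illustrative measure of how permissive the primers are. It assumes nothing about how extract-reads matches primers internally, and the helper name primer_variants is hypothetical.

```python
from math import prod

# Number of concrete bases each IUPAC nucleotide code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_variants(primer: str) -> int:
    """Count the distinct non-degenerate sequences a degenerate primer expands to."""
    return prod(IUPAC_DEGENERACY[base] for base in primer.upper())


f_primer = "GGWACWGGWTGAACWGTWTAYCCYCC"
r_primer = "TANACYTCNGGRTGNCCRAARAAYCA"

print(primer_variants(f_primer))  # 128
print(primer_variants(r_primer))  # 2048
print(primer_variants(f_primer) * primer_variants(r_primer))  # 262144
```

A primer with no degenerate positions expands to exactly one sequence, whereas the reverse primer here contains three N positions alone. That is consistent with the suspicion above, though it does not by itself prove that degeneracy is what slows extract-reads down.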

Sorry I don't have an easier solution. Unless the job is throwing an error, it probably just means that this primer/database combination needs more time and patience. Sorry!

Aimee · April 17, 2018, 1:42pm

Memory is fine; the Slurm output says the job just ran out of time. I have run other jobs since then and there have been no problems with memory.

It is just very frustrating when I am tweaking the pipeline and have to wait this long to see how each change affects the results.

Thanks anyway!