demux emp-paired killed after 2-3 days running

I am running qiime demux emp-paired in QIIME 2 version 2020.8.0. I'm using the following command:

qiime demux emp-paired \
  --i-seqs output/Slep-sequences.qza \
  --m-barcodes-file demux_barcodes_concatenated.tsv \
  --m-barcodes-column barcode-sequence \
  --p-no-golay-error-correction \
  --o-per-sample-sequences output/Slep-demux.qza \
  --o-error-correction-details output/demux-details.qza \
  --verbose

After about 2-3 days of running, the job dies with the message "Killed" and no other information. I've seen posts on this forum where people had to increase RAM and the number of processors to get this step running. I am using Linux and it seems that I should have enough memory for this (see the screenshot). Keep in mind my Slep-sequences.qza file is really large (121 GB).
[screenshot: linux_storage]

Why am I getting the "Killed" message? When I look at the tmp folder (tmp/q2-SingleLanePerSamplePairedEndFastqDirFmt-kk4o1b5b) I see files named after my samples, as if the reads have already been demultiplexed.

If I do need to increase the RAM available to this command, what command would I use to do that? Sorry if this is more of a Linux question; I'm new to bioinformatics and don't want to accidentally run something that will mess up my university computer. Thank you for your time.

Hello!
Yes, indeed, it looks like a RAM issue.
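By the way, you can usually confirm that it was the Linux out-of-memory (OOM) killer that stopped the job by checking the kernel log. This is a generic Linux check rather than anything QIIME-specific, and whether you can read the log may depend on your permissions:

    # look for OOM-killer entries in the kernel log
    dmesg -T | grep -i -E "out of memory|killed process"

    # on systemd-based systems the same information is available in the journal
    journalctl -k | grep -i -E "out of memory|killed process"

If a python process belonging to your QIIME run shows up in those messages, that confirms the job ran out of memory.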
If you can increase the amount of RAM allocated, you should do it. Whether that is possible depends on the cluster you are using, so please refer to the manual or contact the admins. I would not increase the number of threads, since that will only increase the RAM requirements.
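For example, if your cluster happens to use the SLURM scheduler (just an assumption on my part; yours may use PBS, SGE, or something else, in which case the syntax differs), RAM is requested in the job script rather than through any QIIME option. A rough sketch, with placeholder values you would need to adapt:

    #!/bin/bash
    #SBATCH --job-name=demux-emp-paired
    #SBATCH --mem=128G          # RAM to request; adjust to what your cluster allows
    #SBATCH --cpus-per-task=1   # more threads would only raise the memory footprint
    #SBATCH --time=5-00:00:00   # walltime; this step can take several days on 121 GB of reads

    qiime demux emp-paired \
      --i-seqs output/Slep-sequences.qza \
      --m-barcodes-file demux_barcodes_concatenated.tsv \
      --m-barcodes-column barcode-sequence \
      --p-no-golay-error-correction \
      --o-per-sample-sequences output/Slep-demux.qza \
      --o-error-correction-details output/demux-details.qza \
      --verbose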
If your sequences were produced in different sequencing runs, I would split the dataset by run and process each part separately with identical parameters. Splitting into artificial batches is also possible.

Best,

Hi @timanix,

Thank you for the quick reply and help. This data is all from one run and is large because of the deep sequencing depth. How much should I try to allocate? In the attached screenshot, I see that my max memory size is listed as unlimited, so what would I change to increase the RAM?

Best,

E

Unfortunately, I am not familiar with the cluster you are using. In my case I always need to request a certain amount of RAM when running tasks on the HPC. In your case, as I understood it (correct me if I am wrong), you don't need to specify it and the job will simply take as much as it requires?
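One more note on the "unlimited" value in your screenshot: that is a per-process limit reported by ulimit, not the amount of physical RAM in the machine. A couple of standard Linux commands (nothing QIIME-specific) will show the actual numbers:

    # total and currently available physical memory and swap
    free -h

    # per-process resource limits for your shell (max memory size, open files, etc.)
    ulimit -a

    # watch the memory use of the running QIIME process interactively
    top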

If I am right, I would try the following options (see the sketch below):

  • randomly split the dataset into smaller batches, run each batch with identical DADA2 settings, and merge the output files afterwards;
  • subsample the reads before DADA2 if the sequencing depth is too high.
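A rough sketch of both options, assuming the reads are already demultiplexed into a SampleData[PairedEndSequencesWithQuality] artifact; all file names and the subsampling fraction below are placeholders to adapt:

    # option 2: keep a random fraction of the reads per sample before denoising
    qiime demux subsample-paired \
      --i-sequences output/Slep-demux.qza \
      --p-fraction 0.3 \
      --o-subsampled-sequences output/Slep-demux-subsampled.qza

    # option 1: after running DADA2 on each batch separately,
    # combine the per-batch feature tables and representative sequences
    qiime feature-table merge \
      --i-tables batch1-table.qza batch2-table.qza \
      --o-merged-table merged-table.qza

    qiime feature-table merge-seqs \
      --i-data batch1-rep-seqs.qza batch2-rep-seqs.qza \
      --o-merged-data merged-rep-seqs.qza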

Best,

