demux is killed in version 2019.7, but not in 2019.1

Hi, I’ve been trying to demux a dataset with 2019.7, but it seems to demand more memory than demux in 2019.1, and either takes a ridiculously long time or the kernel kills the process.

I’ve tried on both a Linux laptop and an AWS instance. Both machines have 8 GB of RAM and have no problem demuxing this dataset in 2019.1.

This is the command I successfully used in 2019.1, and it takes 20-30 minutes:
qiime demux emp-paired \
  --i-seqs $input \
  --m-barcodes-file $mapfile \
  --m-barcodes-column BarcodeSequence \
  --o-per-sample-sequences $dataset"demux.qza"

In qiime2-2019.7 (or 2019.4) I run this command:
qiime demux emp-paired \
  --i-seqs $input \
  --m-barcodes-file $mapfile \
  --m-barcodes-column BarcodeSequence \
  --p-no-golay-error-correction \
  --o-per-sample-sequences $dataset"demux.qza" \
  --output-dir $dataset

On my Linux machine (Ubuntu 18.04) with 2019.7, it took nearly 24 hours, and on my AWS instance the kernel kills the process:

[21232.001782] [30524] 1001 30524 1882271 1830185 3643 10 0 0 qiime
[21232.001790] Out of memory: Kill process 30524 (qiime) score 981 or sacrifice child
[21232.016844] Killed process 30524 (qiime) total-vm:7529084kB, anon-rss:7320740kB, file-rss:0kB
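For reference, the sizes in the OOM-killer log are reported in kB, so the figures above can be decoded with a little arithmetic (a small illustrative snippet, not part of the original post):

```python
# Decode the OOM-killer line above: total-vm and anon-rss are in kB.
total_vm_kb = 7529084
anon_rss_kb = 7320740
gib = 1024 * 1024  # kB per GiB

print(f"total-vm ~ {total_vm_kb / gib:.1f} GiB")  # ~7.2 GiB
print(f"anon-rss ~ {anon_rss_kb / gib:.1f} GiB")  # ~7.0 GiB
# anon-rss of ~7 GiB is right up against the machine's 8 GB of RAM,
# which is why the kernel chose to kill the process.
```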

Can you see any reason the process in the 2019.7 deployment is so different from previous versions?
Thanks


The 2019.4 release of QIIME 2 "brought Golay barcode correction to emp-single and emp-paired".
See: QIIME 2 2019.4 is now available!

This process takes longer, as it’s recovering barcodes that are not an exact match to the expected barcodes.
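To see why correction costs more per read, here is a toy illustration (not QIIME’s actual Golay decoder, which uses the code’s algebraic structure): exact matching is a single dictionary lookup, while even the simplest error correction has to consider every one-substitution neighbor of the observed barcode.

```python
# Toy barcode matcher: exact lookup vs. brute-force 1-error correction.
# The barcodes and sample names here are made up for illustration.
expected = {"ACGT": "sampleA", "TTGG": "sampleB"}

def match_exact(bc):
    # O(1): one dict lookup per read.
    return expected.get(bc)

def match_1err(bc):
    # Falls back to trying every single-base substitution,
    # so each uncorrectable read costs len(bc) * 4 extra lookups.
    hit = expected.get(bc)
    if hit:
        return hit
    for i in range(len(bc)):
        for base in "ACGT":
            cand = bc[:i] + base + bc[i + 1:]
            if cand in expected:
                return expected[cand]
    return None

print(match_exact("ACGA"))  # None: not an exact match
print(match_1err("ACGA"))   # sampleA: recovered with one correction
```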

Don’t want to do Golay barcode correction? Pass --p-no-golay-error-correction

Colin

Hi Colin,
Thanks for your reply. I’m already using --p-no-golay-error-correction. Would you still expect the 2019.4+ versions to have a longer runtime with that parameter?
Thanks

Oh, whoops! I should have read your command more carefully.

I’m not sure what other steps might make this take longer… Let’s see what the QIIME devs recommend.

Colin

I think I know what’s wrong, but I don’t really have a solution.

When the Golay correction was added, we created an error-correction stats table which appears to create a record for every read. This means a lot more memory is needed than before (really we should be appending to a file instead of a DataFrame), and we are simply doing more work.
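The memory difference Evan describes can be sketched in a few lines (a minimal illustration, not QIIME’s actual code): accumulating one record per read in memory grows linearly with the number of reads, while streaming each record to a file keeps memory roughly constant.

```python
# Sketch of the two strategies for per-read stats records.
# All names here are illustrative, not QIIME internals.
import csv
import io

def stats_in_memory(reads):
    # One dict per read, all held in RAM at once. With tens of
    # millions of reads this is what exhausts an 8 GB machine.
    records = []
    for read_id, barcode in reads:
        records.append({"id": read_id, "barcode": barcode, "errors": 0})
    return records

def stats_streamed(reads, handle):
    # Constant memory: each record is written out as soon as it is
    # produced, so RAM usage does not grow with the number of reads.
    writer = csv.writer(handle)
    writer.writerow(["id", "barcode", "errors"])
    for read_id, barcode in reads:
        writer.writerow([read_id, barcode, 0])

reads = [(f"read{i}", "ACGT") for i in range(5)]
buf = io.StringIO()  # stands in for a real file on disk
stats_streamed(reads, buf)
print(len(buf.getvalue().splitlines()))  # 6: header + 5 rows
```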

I have a strategy for implementing optional outputs, and this looks like a serious contender for it: when there’s no Golay correction, I don’t think this output is very useful. But even in the short term, I think we could probably make this more efficient in general.

Sorry for the performance regression, that’s definitely on us.


Hi Evan, Thanks for the explanation. In the meantime, I’ve just been using 2019.1 with no problems, and moving over to 2019.7 when updates call for it.

I’ll be on the lookout for changes to this in the future, or plan on allocating more resources to the step.
Anne
