demux is killed in version 2019.7, but not in 2019.1

Hi, I’ve been trying to demux a dataset with 2019.7, but it seems to demand more memory than demux in 2019.1, and either takes a ridiculously long time or the kernel kills the process.

I’ve tried on both a Linux laptop and an AWS instance. Both machines have 8 GB of RAM and have no problem demuxing this dataset in 2019.1.

This is the command I successfully used in 2019.1, and it takes 20-30 minutes:
qiime demux emp-paired \
  --i-seqs $input \
  --m-barcodes-file $mapfile \
  --m-barcodes-column BarcodeSequence \
  --o-per-sample-sequences $dataset"demux.qza"

In qiime2-2019.7 (or 2019.4) I run this command:
qiime demux emp-paired \
  --i-seqs $input \
  --m-barcodes-file $mapfile \
  --m-barcodes-column BarcodeSequence \
  --p-no-golay-error-correction \
  --o-per-sample-sequences $dataset"demux.qza" \
  --output-dir $dataset

On my Linux machine (Ubuntu 18.04) with 2019.7, it took nearly 24 hours, and on my AWS instance the kernel kills the process:

[21232.001782] [30524] 1001 30524 1882271 1830185 3643 10 0 0 qiime
[21232.001790] Out of memory: Kill process 30524 (qiime) score 981 or sacrifice child
[21232.016844] Killed process 30524 (qiime) total-vm:7529084kB, anon-rss:7320740kB, file-rss:0kB
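For reference, the sizes in the OOM-killer log are reported in kB, so the figures above can be decoded with a little arithmetic (a small illustrative snippet, not part of the original post):

```python
# Decode the OOM-killer line above: total-vm and anon-rss are in kB.
total_vm_kb = 7529084
anon_rss_kb = 7320740
gib = 1024 * 1024  # kB per GiB

print(f"total-vm ~ {total_vm_kb / gib:.1f} GiB")  # ~7.2 GiB
print(f"anon-rss ~ {anon_rss_kb / gib:.1f} GiB")  # ~7.0 GiB
# anon-rss of ~7 GiB is right up against the machine's 8 GB of RAM,
# which is why the kernel chose to kill the process.
```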

Can you see any reason the process in the 2019.7 deployment is so different from previous versions?
Thanks


The 2019.4 release of QIIME 2 "brought Golay barcode correction to emp-single and emp-paired".
See: QIIME 2 2019.4 is now available!

This process takes longer, as it’s recovering barcodes that are not an exact match to the expected barcodes.
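To see why correction costs more per read, here is a toy illustration (not QIIME’s actual Golay decoder, which uses the code’s algebraic structure): exact matching is a single dictionary lookup, while even the simplest error correction has to consider every one-substitution neighbor of the observed barcode.

```python
# Toy barcode matcher: exact lookup vs. brute-force 1-error correction.
# The barcodes and sample names here are made up for illustration.
expected = {"ACGT": "sampleA", "TTGG": "sampleB"}

def match_exact(bc):
    # O(1): one dict lookup per read.
    return expected.get(bc)

def match_1err(bc):
    # Falls back to trying every single-base substitution,
    # so each uncorrectable read costs len(bc) * 4 extra lookups.
    hit = expected.get(bc)
    if hit:
        return hit
    for i in range(len(bc)):
        for base in "ACGT":
            cand = bc[:i] + base + bc[i + 1:]
            if cand in expected:
                return expected[cand]
    return None

print(match_exact("ACGA"))  # None: not an exact match
print(match_1err("ACGA"))   # sampleA: recovered with one correction
```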

Don’t want to do Golay barcode correction? Pass --p-no-golay-error-correction

Colin

Hi Colin,
Thanks for your reply. I’m already using --p-no-golay-error-correction. Would you still expect the 2019.4+ versions to have a longer runtime with that parameter?
Thanks

Oh, whoops! I should have read your command more carefully.

I’m not sure what other steps might make this take longer… Let’s see what the QIIME devs recommend.

Colin

I think I know what’s wrong, but I don’t really have a solution.

When the Golay correction was added, we created an error-correction stats table which appears to create a record for every read. This means a lot more memory is needed than before (really we should be appending to a file instead of a DataFrame), and we are simply doing more work.
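The memory difference Evan describes can be sketched in a few lines (a minimal illustration, not QIIME’s actual code): accumulating one record per read in memory grows linearly with the number of reads, while streaming each record to a file keeps memory roughly constant.

```python
# Sketch of the two strategies for per-read stats records.
# All names here are illustrative, not QIIME internals.
import csv
import io

def stats_in_memory(reads):
    # One dict per read, all held in RAM at once. With tens of
    # millions of reads this is what exhausts an 8 GB machine.
    records = []
    for read_id, barcode in reads:
        records.append({"id": read_id, "barcode": barcode, "errors": 0})
    return records

def stats_streamed(reads, handle):
    # Constant memory: each record is written out as soon as it is
    # produced, so RAM usage does not grow with the number of reads.
    writer = csv.writer(handle)
    writer.writerow(["id", "barcode", "errors"])
    for read_id, barcode in reads:
        writer.writerow([read_id, barcode, 0])

reads = [(f"read{i}", "ACGT") for i in range(5)]
buf = io.StringIO()  # stands in for a real file on disk
stats_streamed(reads, buf)
print(len(buf.getvalue().splitlines()))  # 6: header + 5 rows
```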

I have a strategy for implementing optional outputs, and this looks like a serious contender for it: when there’s no Golay correction, I don’t think this output is very useful. But even in the short term, I think we could probably make this more efficient in general.

Sorry for the performance regression, that’s definitely on us.


Hi Evan, Thanks for the explanation. In the meantime, I’ve just been using 2019.1 with no problems, and moving over to 2019.7 when updates call for it.

I’ll be on the lookout for changes to this in the future, or plan on allocating more resources to the step.
Anne
