Estimating memory needed for vsearch dereplication?

Ldrhodes · August 31, 2018, 8:39pm

I'm processing 625 MiSeq amplicon sequences (~450 bp), & encountered a memory shortage during dereplication using Q2 vsearch. The Linux unit being used has 30 GB memory with 15 GB in virtual memory. The unit is dedicated to this task, so there aren't many other processes occurring.

I've seen the 2 posts from January & February about memory limits & the inability to exclude low frequency sequences. Is there any way to estimate how much virtual memory is needed for dereplicating with vsearch? I don't want to break up the data set as it represents multiple years of sampling within a project, but perhaps that is the only choice.

Thanks! Linda

colinbrislawn · August 31, 2018, 9:11pm

Hello Linda,

You have perfect timing; the newest version of vsearch uses less memory while dereplicating! (See the GitHub issue.)

~~Truly perfect timing.~~

EDIT: Looks like you can't install vsearch on OSX using conda right now... Oh no!

One option that is easy to try is to provide more memory to your VM. Try, say, 25 or 27 GB and see how it goes.
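If it helps to gauge how far off you are, here is a rough sketch for sizing the input before dereplicating. It is not how vsearch accounts for memory internally; it only counts the distinct sequences in a FASTA file and the bytes needed to hold one copy of each, which gives a loose lower bound (the real footprint will be higher because of headers, hash tables, and other overhead). The function names and the `seqs.fasta` filename are made up for illustration, and the sketch assumes sequences can be compared case-insensitively.

```python
import hashlib


def read_fasta(lines):
    """Yield each sequence from FASTA lines, joining wrapped lines and uppercasing."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks).upper()
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def unique_sequence_footprint(lines):
    """Return (total reads, unique sequences, bytes in one copy of each unique sequence).

    Only a 20-byte digest is kept per unique sequence, so this script itself
    stays small even when the input is large.
    """
    seen = set()
    total = 0
    unique_bytes = 0
    for seq in read_fasta(lines):
        total += 1
        digest = hashlib.sha1(seq.encode("ascii")).digest()
        if digest not in seen:
            seen.add(digest)
            unique_bytes += len(seq)
    return total, len(seen), unique_bytes


if __name__ == "__main__":
    # Tiny in-memory example; for a real file (hypothetical name) use:
    #   with open("seqs.fasta") as fh:
    #       print(unique_sequence_footprint(fh))
    demo = [">a", "ACGT", ">b", "acgt", ">c", "AC", "GG"]
    total, unique, nbytes = unique_sequence_footprint(demo)
    print(f"{total} reads, {unique} unique, {nbytes} bytes of unique sequence")
```

If the number of unique sequences is close to the number of reads, dereplication will not shrink the data much, and the memory needed will scale with nearly the whole input.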

EDIT 2: If you are up for it, you could install the newest version of vsearch directly from GitHub. I just confirmed that it works!

thermokarst · August 31, 2018, 9:32pm

Hey @colinbrislawn --- due to recent toolchain changes on bioconda, there are no longer osx builds --- this prompted us to set a hard version pin on vsearch:

https://github.com/qiime2/q2-deblur/commit/2a790df66945cedb3f1631bd9a09d1956a762061

So, this advice likely won't work because conda will detect a version specification mismatch and will uninstall any plugins that use vsearch.

If we can get osx builds back in business, then we can remove the version pin. Thanks! :qiime2:
