q2-sidle running out of memory at qiime sidle prepare-extracted-region

Hi, everyone!
I was building the kmer database with the sidle plugin by running the following command:

qiime sidle prepare-extracted-region \
>  --i-sequences sidle-db-filt-j1.qza \
>  --p-region "P1" \
>  --p-fwd-primer TGGCGAACGGGTGAGTAA \
>  --p-rev-primer CCGTGTCTCAGTCCCARTG \
>  --p-trim-length 100 \
>  --o-collapsed-kmers sidle-db-P1-100nt-kmers.qza \
>  --o-kmer-map sidle-db-P1-100nt-map.qza

After about 2 h the process is killed. My computer has 16 GB of RAM, and sidle-db-filt-j1.qza is only about 7.6 MB, so the memory usage really shocked me.
The commands below show everything leading up to the problem. Is there any parameter that can limit memory usage, such as reducing the number of threads? I can accept a longer run time. Any hint will be greatly appreciated.

> qiime tools import \
>  --type 'FeatureData[Sequence]' \
>  --input-path 99_otus.fasta \
>  --output-path 99_otus.qza

> qiime tools import \
>  --type 'FeatureData[Taxonomy]' \
>  --input-format HeaderlessTSVTaxonomyFormat \
>  --input-path 99_otu_taxonomy.txt \
>  --output-path ref-taxonomy.qza

> qiime rescript cull-seqs \
>  --p-num-degenerates 3 \
>  --i-sequences 99_otus.qza \
>  --o-clean-sequences 99_3_otus.qza

> qiime rescript dereplicate \
>  --i-sequences 99_3_otus.qza \
>  --i-taxa ref-taxonomy.qza \
>  --p-mode 'uniq' \
>  --o-dereplicated-sequences 99_3_otus-derep-uniq.qza \
>  --o-dereplicated-taxa 99_otu_taxonomy-derep-uniq.qza

> qiime feature-classifier extract-reads \
>  --i-sequences 99_3_otus-derep-uniq.qza \
>  --p-f-primer TGGCGAACGGGTGAGTAA \
>  --p-r-primer CCGTGTCTCAGTCCCARTG \
>  --o-reads sidle-db-filt-j1.qza

> qiime sidle prepare-extracted-region \
>  --i-sequences sidle-db-filt-j1.qza \
>  --p-region "P1" \
>  --p-fwd-primer TGGCGAACGGGTGAGTAA \
>  --p-rev-primer CCGTGTCTCAGTCCCARTG \
>  --p-trim-length 100 \
>  --o-collapsed-kmers sidle-db-P1-100nt-kmers.qza \
>  --o-kmer-map sidle-db-P1-100nt-map.qza

Thank you very much!

qiime2-q2cli-err-qfng98ihlog.txt (155.5 KB)
This file is the log from when the process was killed; I thought it might give a better picture of what happened.

Hi @Circle,

Which version of Sidle are you using?

Most of the memory-intensive processing is handed off to dask; by default, Sidle will start a client with the number of threads you specify.
You could decrease your chunk size, or try setting up the dask client yourself and passing it in directly. I tend to use JupyterLab for this, since the dask Jupyter interface is really nice and lets me manage the client through a GUI and watch the progress.
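As a minimal sketch of what that looks like in a notebook (the exact way to hand the client to Sidle varies by version, so check qiime sidle prepare-extracted-region --help for the option names; the 8 GB cap here is only an illustration):

from dask.distributed import Client

# Start a local client with an explicit cap: one worker, one thread,
# and a hard 8 GB memory limit. Dask spills intermediate results to
# disk before the operating system's out-of-memory killer steps in.
client = Client(n_workers=1, threads_per_worker=1, memory_limit="8GB")

# The scheduler address can then be handed to Sidle if your version
# exposes a client-address option; otherwise, running Sidle through
# the QIIME 2 Artifact API in the same notebook may pick up this
# client as the dask default (worth verifying for your version).
print(client.scheduler.address)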

Best,
Justine

Hi @jwdebelius,
Thank you so much for your reply. The version of Sidle I used is 2020.08, and the QIIME 2 version is 2021.11. I originally ran the command on WSL (Ubuntu 18.04), where the memory problem occurred as described above.
After I switched to a server and ran the same command with the same versions of QIIME 2 and Sidle, the memory used was only about 1 GB and I got the output smoothly, which is very interesting… I assume it might be a WSL memory or mounting issue that caused the problem.
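In case it helps anyone else: if you are on WSL 2, the Linux VM only gets part of the host's RAM by default, and the cap can be raised in a .wslconfig file in your Windows user profile. A minimal sketch (the sizes are just an example, not my actual settings):

# %UserProfile%\.wslconfig  (e.g. C:\Users\<you>\.wslconfig)
[wsl2]
memory=12GB   # raise the RAM cap for the WSL 2 VM
swap=8GB      # optional swap for the VM to spill into

Run wsl --shutdown from Windows afterwards so the new limits take effect.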
Again, I really appreciate your help! I tried JupyterLab and found it practical and concise.

Best,
Circle

Hi @Circle,

I'm glad you found a solution! I'm not sure about WSL; I haven't worked with it much. I'm not sure if there's memory partitioning or something else going on.

Best,
Justine
