Hi again @jwdebelius,
I am working with a Jupyter Notebook inside VSCode (sorry for not detailing this in my previous message). I have been using the sidle-reconstruction pipeline from q2-sidle v0.1.0-beta, and I would like to share some tests I have performed to avoid the previously mentioned problems.
1. Modify sidle-reconstruction parameters: Based on your experience, my first attempt was to reduce --p-n-workers from >4 to 2 and to use a --p-block-size of 1000. The process was killed after 57 min. I tried the same approach with 3 and 4 workers, and both of those processes were killed too. Below is the code used for the run with 2 workers.
qiime sidle sidle-reconstruction \
--i-kmer-map silva_128_V2_map.qza silva_128_V3_map.qza silva_128_V4_map.qza silva_128_V67_map.qza silva_128_V8_map.qza silva_128_V9_map.qza \
--i-regional-alignment V2_aligment_map.qza V3_aligment_map.qza V4_aligment_map.qza V67_aligment_map.qza V8_aligment_map.qza V9_aligment_map.qza \
--i-regional-table V2_f_table.qza V3_f_table.qza V4_f_table.qza V67_f_table.qza V8_f_table.qza dada2_pyro_V9_table.qza \
--i-reference-taxonomy silva_128_ssu_nr99_tax_derep.qza \
--p-region V2 V3 V4 V67 V8 V9 \
--p-min-counts 0 \
--p-database 'silva' \
--p-block-size 1000 \
--p-n-workers 2 \
--o-database-map ./reconstructed_results/database_recons.qza \
--o-database-summary ./reconstructed_results/database_recons_summ.qza \
--o-reconstructed-table ./reconstructed_results/feature_table_recons.qza \
--o-reconstructed-taxonomy ./reconstructed_results/taxonomy_recons.qza
2. Dask: I am totally new to Dask, so I hope I have done the next steps properly. Below is an example of the general code I used to create the clusters.
from dask.distributed import Client, LocalCluster

# 2 workers, each limited to 7.5 GB of memory
cluster = LocalCluster(n_workers=2, memory_limit="7.5GB")
client = Client(cluster)
I used cluster.scheduler to extract the IP address needed for --p-client-address. I then reran the first chunk of code, replacing --p-n-workers and --p-block-size with --p-client-address.
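For reference, this is a minimal sketch of how the address can be read off the cluster object created above (assuming a default LocalCluster; its scheduler_address attribute holds the full URI):

# Assuming the cluster/client from the snippet above.
# scheduler_address is the full URI, e.g. "tcp://127.0.0.1:8786",
# which is the value I passed to --p-client-address.
print(cluster.scheduler_address)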
I have tried different numbers of workers (2, 3, and 4) and different memory limits adjusted to each number of workers. With these memory limits I tried to address the problems reported in the log file from my previous message. Here is an example of the scheduler created for 2 workers.
All attempts were killed after running for 20-25 min. The vast majority of the reported errors were related to "unmanaged memory" and/or "memory not released back to the OS" or similar. In two of these runs, my laptop screen froze and I had to restart the machine.
3. Trimming memory: The previous errors led me to this section of the Dask website, where they suggest some potential solutions for these problems. In summary, I tried the "Manually trim your memory" and "Automatically trim memory" options, but the same errors remained.
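For anyone who lands here with the same errors, this is essentially the manual-trim recipe from that Dask page, run against the client from step 2 (it assumes Linux workers with glibc, since it loads libc.so.6 directly):

import ctypes

def trim_memory() -> int:
    # Ask glibc to return freed heap memory to the OS
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)

# Run the trim on every worker in the cluster
client.run(trim_memory)

The automatic variant instead sets the MALLOC_TRIM_THRESHOLD_ environment variable for the workers before they start, so that glibc trims more aggressively on its own.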
For now, I have not tried trimming the database by using 3 degenerates instead of 5. An alternative solution I thought of is to move all the artifacts needed to run sidle-reconstruction (and the tree reconstruction) to a more powerful machine. I think that once I have all the reconstructed files, the next steps will not require as much memory.
Best,
Andrés