SEPP Plugin on WSL - Returning a non-zero exit status and entering a zombie state

Hello!

I’m experiencing issues while running the fragment-insertion sepp plugin in QIIME 2 within a Windows Subsystem for Linux (WSL 2) environment. The SEPP process repeatedly fails with 'non-zero exit status 1', and when I try to run it on my Deblur output, zombie processes (Z state) appear and the run stalls indefinitely. I’ve tried several troubleshooting steps but haven’t resolved the issue, so any insights would be greatly appreciated!

System Details

  • OS: Windows Subsystem for Linux (WSL 2), Ubuntu 20.04
  • Qiime2 Version: 2024.10
  • Memory & Swap: 16GB memory, 12GB swap
  • Processor: 20 cores allocated in WSL
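
For reference, these values can be double-checked from inside WSL with standard commands (a generic sketch, not a verbatim capture of my terminal):

free -h                            # total memory and swap visible to WSL
nproc                              # number of CPU cores available
grep PRETTY_NAME /etc/os-release   # Ubuntu release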

Command

(qiime2-amplicon-2024.10) lisou@DESKTOP-OO5NL0A:~/Nested$ qiime fragment-insertion sepp \
  --i-representative-sequences Nested-rep-seqs.qza \
  --i-reference-database sepp-refs-silva-128.qza \
  --p-threads 4 \
  --o-tree Nested-merged-sepp-tree.qza \
  --o-placements Nested-merged-sepp-placements.qza

Plugin error from fragment-insertion:

  Command '['run-sepp.sh', '/tmp/qiime2/lisou/data/35e66cf8-05f4-462a-b349-4325ba34a5ef/data/dna-sequences.fasta', 'q2-fragment-insertion', '-x', '8', '-A', '1000', '-P', '5000', '-a', '/tmp/qiime2/lisou/data/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/aligned-dna-sequences.fasta', '-t', '/tmp/qiime2/lisou/data/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/tree.nwk', '-r', '/tmp/qiime2/lisou/data/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/raxml-info.txt']' returned non-zero exit status 1.

Debug info has been saved to /tmp/qiime2-q2cli-err-2swu9_nc.log

Log info:

Removing /tmp/tmp.0gkiL3LLUi/sepp-tmp-RZvEqxZmcq
Traceback (most recent call last):
File "/home/lisou/anaconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2cli/commands.py", line 530, in call
results = self._execute_action(
File "/home/lisou/anaconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2cli/commands.py", line 602, in _execute_action
results = action(**arguments)
File "", line 2, in sepp
File "/home/lisou/anaconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/qiime2/sdk/action.py", line 299, in bound_callable
outputs = self.callable_executor(
File "/home/lisou/anaconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/qiime2/sdk/action.py", line 570, in callable_executor
output_views = self._callable(**view_args)
File "/home/lisou/anaconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2_fragment_insertion/_insertion.py", line 75, in sepp
_run(str(representative_sequences.file.view(DNAFASTAFormat)),
File "/home/lisou/anaconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2_fragment_insertion/_insertion.py", line 54, in _run
subprocess.run(cmd, check=True, cwd=cwd)
File "/home/lisou/anaconda3/envs/qiime2-amplicon-2024.10/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run-sepp.sh', '/tmp/qiime2/lisou/data/35e66cf8-05f4-462a-b349-4325ba34a5ef/data/dna-sequences.fasta', 'q2-fragment-insertion', '-x', '4', '-A', '1000', '-P', '5000', '-a', '/tmp/qiime2/lisou/data/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/aligned-dna-sequences.fasta', '-t', '/tmp/qiime2/lisou/data/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/tree.nwk', '-r', '/tmp/qiime2/lisou/data/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/raxml-info.txt']' returned non-zero exit status 1.

I've tried reducing --p-threads to 2 (from 8 originally), thinking this might reduce memory pressure or process conflicts. This change did not resolve the issue.
I've also checked and increased the memory and CPU allocated to WSL: there's plenty of available memory and swap, so I don't believe the issue is due to resource exhaustion (my resource settings are sketched below).
I am also unable to complete a debug run, as QIIME 2 stalls.
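
For context, WSL 2 resource allocation is controlled from a .wslconfig file in the Windows user profile (C:\Users\<username>\.wslconfig). The values below are an approximate sketch matching the system details above, not a verbatim copy of my file:

[wsl2]
memory=16GB       # memory available to the WSL 2 VM
swap=12GB         # swap file size
processors=20     # CPU cores allocated to WSL 2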

Command with debug

qiime fragment-insertion sepp \
  --i-representative-sequences Nested-rep-seqs.qza \
  --i-reference-database sepp-refs-silva-128.qza \
  --p-threads 2 \
  --o-tree Nested-merged-sepp-tree.qza \
  --o-placements Nested-merged-sepp-placements.qza \
  --verbose \
  --p-debug

When I run top, I can see that some of the run_sepp.py processes enter a zombie state after a few minutes. The command then stalls indefinitely and CPU usage drops to zero, indicating that SEPP is no longer progressing. Here’s what I observed in top:

Tasks:  13 total,   1 running,  10 sleeping,   0 stopped,   2 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  26060.9 total,  16683.2 free,   6164.4 used,   3213.3 buff/cache
MiB Swap:  12288.0 total,  12288.0 free,      0.0 used.  19536.0 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    1 root      20   0    2296   1324   1108 S   0.0   0.0   0:00.18 init
    8 root      20   0    2164    360      0 S   0.0   0.0   0:00.00 init
    9 root      20   0    2172    360      0 S   0.0   0.0   0:00.00 init
   10 lisou     20   0   10172   5108   3276 S   0.0   0.0   0:00.04 bash
  496 root      20   0    2644    564      0 S   0.0   0.0   0:00.00 init
  497 root      20   0    2644    564      0 S   0.0   0.0   0:00.05 init
  498 lisou     20   0   10172   5176   3404 S   0.0   0.0   0:00.01 bash
  517 lisou     20   0   10872   3696   3144 R   0.0   0.0   0:00.32 top                      
 635 lisou     20   0 5456316   2.4g 133432 S   0.0   9.5   0:30.78 qiime
  664 lisou     20   0    8624   3224   2948 S   0.0   0.0   0:00.00 run-sepp.sh
  668 lisou     20   0 3945268   3.6g  11572 S   0.0  14.0   2:11.24 run_sepp.py
  670 lisou     20   0       0      0      0 Z   0.0   0.0   0:01.33 run_sepp.py
  671 lisou     20   0       0      0      0 Z   0.0   0.0   0:01.39 run_sepp.py
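
For reference, the zombie entries and the parent process that has not reaped them can be listed with ps (a generic sketch; <parent_pid> is a placeholder, not a value from the output above):

ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'   # list zombie (Z state) processes and their parent PIDs
ps -o pid,stat,cmd -p <parent_pid>           # inspect the parent that should be reaping them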

Any help would be appreciated. I have looked at the other posts about non-zero exit status 1 in WSL environments, and the only solution I've seen is switching to a different OS. And I'm stumped at being unable to take my Deblur data any further!


Hi @Lisou, Thanks for reporting this. Honestly, not a lot of us have experience with WSL. I'm going to request help from the other moderators, but just wanted to give you a heads-up that it might take a few more days before we're able to get back to you. We hope to be able to help!


Hi Greg,

Thanks for letting me know.
I managed to get the debug run working, but the error message is the same: it still returns a non-zero exit status.

Debug report attached.
debug-SEPP.txt (716.9 KB)

Thanks for sharing this @Lisou! From this error log, it looks like pplacer is failing with exit code 11, which usually corresponds to a segmentation fault (signal 11, SIGSEGV). pplacer is a tool that is used internally by SEPP. There was some discussion of this issue in this thread that may help you sort it out. I hope this helps! Let us know if you have any additional questions.
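
If you want to confirm this outside of QIIME 2, one option (assuming the pplacer binary that SEPP bundles is available on your PATH in the activated environment; it may instead live inside the SEPP install directory) is to invoke pplacer directly and check its exit status:

conda activate qiime2-amplicon-2024.10
pplacer --version     # if even this crashes, the problem is pplacer under WSL, not your data
echo $?               # a non-zero status here reproduces the failure without a full SEPP run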

I've finally fixed this issue. As I suspected, it was not a matter of allocating more memory. From the thread linked above, I determined that the issue was vsyscall handling. To address this, I needed to change two things:

  1. Add the following to the .wslconfig file:

[wsl2]
#in addition to the memory and swap allocated
kernelCommandLine = vsyscall=emulate

  2. Then, in \\wsl.localhost\Ubuntu-20.04\proc\config.gz, change

CONFIG_LEGACY_VSYSCALL_NONE=n

to

CONFIG_LEGACY_VSYSCALL_NONE=y

This solved my problem. As I understand it, SEPP failed due to improper handling of vsyscall within the WSL environment: the legacy vsyscall mechanism was either disabled or not set to emulate, which caused certain system calls required by SEPP (via pplacer) to fail, leading to the non-zero exit error.
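
For anyone applying the same fix: .wslconfig is only re-read after the WSL VM is restarted, and the change can then be verified from inside Ubuntu. Roughly (a sketch; adjust names and paths to your setup):

# From Windows PowerShell: shut WSL down so .wslconfig is re-read on the next start
wsl --shutdown

# Back inside Ubuntu: confirm the new kernel command line is active
grep -o 'vsyscall=emulate' /proc/cmdline

# Optionally, inspect the kernel's vsyscall build options
zcat /proc/config.gz | grep VSYSCALL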

Hope this helps anyone else having a similar issue with QIIME whilst using a WSL environment.


Thank you for sharing this @Lisou!