Segmentation Fault with phylo-rpca

Hello! I am trying to run phylogenetic-rpca-without-taxonomy on my data and am getting a segmentation fault. I am using a filtered feature table and the associated insertion tree.

:gear: Environment details

Version: I was originally running this in QIIME 2 version 2023.9 and am now running it in version 2024.5. I installed gemelli with the command pip install gemelli.
Hardware: I am running macOS on an M1 chip, with about 22GB of free memory and 535GB of disk space available.

:desktop_computer: Run details

The command I am running is

qiime gemelli phylogenetic-rpca-without-taxonomy \
  --i-table filtered_table.qza \
  --i-phylogeny insertionTree.qza \
  --output-dir phylo_rpca_out

I have run this command successfully with another dataset in the past, but can't see a notable difference between my count tables or trees when I export and inspect the qiime artifacts.

The error I see is

zsh: segmentation fault qiime gemelli phylogenetic-rpca-without-taxonomy --i-table filtered_table.qza

On Linux, I have also seen segmentation fault (core dump) in one troubleshooting attempt.

:dart: Troubleshoot attempts

In addition to changing my QIIME 2 version, running the code on a Linux system, and re-importing my QIIME 2 artifacts, I have tried:

Verbose run: I ran the same command as above with the --verbose flag and got the following result

RuntimeWarning: divide by zero encountered in log
  mat = np.log(matrix_closure(matrix_closure(mat) * branch_lengths))

zsh: segmentation fault qiime gemelli phylogenetic-rpca-without-taxonomy --i-table filtered_table.qza

Forum search: I searched the Qiime2 Forum for divide by 0 errors and found

But this was resolved with a taxonomy reformat, and I'm running without taxonomy. I also searched the QIIME 2 Forum for segmentation faults and found two posts suggesting a memory issue but, as noted above, I don't seem to have a memory shortage. Even if my Mac did have one, I assume it would have been resolved when I ran this command on my Linux-based research cluster, which was not the case.

Check Rosetta use: I have also run this command in my macOS terminal with and without the "Open using Rosetta" option selected from the app menu.

Any feedback or ideas on how to proceed from here would be appreciated. Let me know if I can provide more info :slightly_smiling_face:


Hi @cmartino!
Do you have any advice for this user?
Thanks,
Hannah

Hi @AttilaTheBun (and thanks for the tag, @jphagen),

Thank you for reporting!

The divide by zero warning can be ignored here; it is not causing any issues.
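For anyone curious where that warning comes from, here is a minimal NumPy sketch. It only shows that taking np.log of an array containing zeros (common in sparse count tables) emits the same RuntimeWarning and returns -inf at those positions. The helper name log_with_zeros and the toy values are made up for illustration; this is not gemelli's code, and how gemelli treats those entries afterwards is not shown here.

```python
import warnings

import numpy as np


def log_with_zeros(values):
    """Take the natural log and collect any warnings NumPy raises (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is recorded, not suppressed
        logged = np.log(np.asarray(values, dtype=float))
    messages = [str(w.message) for w in caught]
    return logged, messages


# Toy proportions with one zero entry, standing in for a sparse table row.
logged, messages = log_with_zeros([0.5, 0.0, 0.25])
print(logged)    # the zero entry becomes -inf
print(messages)  # the warning text NumPy produced
```

The computation still completes; the warning only flags that some entries were zero before the log.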

I have only seen the segmentation fault on high-rank data, where none of the samples are related to each other (see Fig. 2 here for an example of what that looks like). Such data blows up the memory, because the algorithm relies on low-rank input. Unfortunately, this is hard to detect, and therefore it is hard to issue a more reasonable error message. It is also possible that the phylogeny is inducing this, in which case the non-redundancy/redundancy needs to be trimmed using --p-min-feature-frequency or --p-min-depth (there is an example of what that filter does here).
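To make the low-rank versus high-rank distinction a bit more concrete, here is a rough sketch that counts how many singular values are needed to capture 90% of the squared spectrum of a matrix. The function effective_rank, the 0.9 threshold, and the simulated matrices are all assumptions for illustration; this is not a gemelli diagnostic, and a real table would need to be exported and transformed before a check like this means much.

```python
import numpy as np


def effective_rank(matrix, energy=0.9):
    """Number of singular values needed to reach `energy` of the squared spectrum."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    cumulative = np.cumsum(singular_values ** 2) / np.sum(singular_values ** 2)
    # First index whose cumulative share reaches the threshold, as a count.
    return int(np.searchsorted(cumulative, energy) + 1)


rng = np.random.default_rng(0)

# Low-rank toy data: 50 samples driven by only 3 underlying factors.
low_rank = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))

# High-rank toy data: 50 samples with no shared structure at all.
high_rank = rng.normal(size=(50, 40))

low_rank_estimate = effective_rank(low_rank)
high_rank_estimate = effective_rank(high_rank)
print("low-rank toy matrix:", low_rank_estimate)
print("high-rank toy matrix:", high_rank_estimate)
```

A table whose estimate is close to its number of samples or features would look more like the unstructured case than the structured one.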

Are you able to share the input table/tree? The table can have de-identified samples, and you can also DM it to me privately, so I can try to replicate the error and dig into the reason. I understand if that is not possible.

Best,

Cameron


Thank you @cmartino for the response and @jphagen for moderating. This was helpful: I can indeed get results back by adjusting the min-depth parameter!
