Building a large FastTree tree without tripping SIGKILL (signal 9)

Hi Friends,

For Reasons™, I have decided to build a tree from an alignment of the SILVA 138.1 sequences, based on work by @BenKaehler. (I have the notebook somewhere; please don't ask me where.) I'm working on an HPC, but keep running into a SIGKILL (signal 9) error. I'm trying to guess at parameters that will let me run this. (I don't care much about wall time; the sooner I can get the job started with reasonable parameters, the sooner it will finish.)

Any recommendations would be really appreciated!

Best,
Justine


Hi @jwdebelius,

Did you try removing gappy or poorly aligned regions? It would help to remove some of these, as the SILVA alignment has 50k columns! :scream:
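One rough way to see how gappy an alignment is before masking is to compute the fraction of gap characters in every column. The sketch below uses hypothetical helpers, not part of QIIME 2 or RESCRIPt. It assumes the aligned sequences are already loaded as equal-length Python strings and treats both `-` and `.` as gaps.

```python
def column_gap_fractions(aligned, gap_chars="-."):
    """Return the fraction of gap characters in each alignment column."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more aligned sequences of equal length")
    n_seqs = len(aligned)
    # zip(*aligned) walks the alignment column by column.
    return [sum(c in gap_chars for c in column) / n_seqs for column in zip(*aligned)]


def count_gappy_columns(aligned, threshold=0.9, gap_chars="-."):
    """Count columns whose gap fraction is strictly above `threshold`."""
    return sum(f > threshold for f in column_gap_fractions(aligned, gap_chars))


toy = ["AC-G.", "A--G-", "ACTG-"]
print(column_gap_fractions(toy))
print(count_gappy_columns(toy, threshold=0.5))
```

For a 50k-column alignment, counting how many columns exceed a threshold such as 0.9 gives a quick sense of how much masking could shrink the input.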

Ben's notes and code are here. :world_map:


That's a good point about file size, @SoilRotifer, thanks!

I downloaded the sequences with RESCRIPt and then dereplicated the full-length sequences. I masked the alignment using the QIIME 2 default parameters. Should I be masking more aggressively, or switching to Ben's code?
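Masking typically drops columns that are too gappy or too variable. The sketch below is a simplified, hypothetical stand-in for that idea. Its two thresholds are named after the `max-gap-frequency` and `min-conservation` parameters of `qiime alignment mask`, but the conservation definition used here (share of all sequences carrying the most common non-gap residue) is an assumption and may not match q2-alignment exactly. Masking "more aggressively" means lowering `max_gap_frequency` and/or raising `min_conservation`.

```python
from collections import Counter

GAP_CHARS = set("-.")


def mask_alignment(aligned, max_gap_frequency, min_conservation):
    """Keep columns that are not too gappy and are conserved enough.

    Simplified illustration only; not the q2-alignment implementation.
    """
    n_seqs = len(aligned)
    if n_seqs == 0 or len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more aligned sequences of equal length")
    keep = []
    for idx, column in enumerate(zip(*aligned)):
        residues = [c.upper() for c in column if c not in GAP_CHARS]
        gap_frequency = 1 - len(residues) / n_seqs
        # Assumed conservation: most common residue count over ALL sequences.
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        conservation = top / n_seqs
        if gap_frequency <= max_gap_frequency and conservation >= min_conservation:
            keep.append(idx)
    return ["".join(s[i] for i in keep) for s in aligned]


toy = ["AC-G.", "A--G-", "ACTG-"]
print(mask_alignment(toy, max_gap_frequency=1.0, min_conservation=0.0))  # lenient
print(mask_alignment(toy, max_gap_frequency=0.5, min_conservation=0.5))  # stricter
```

With the lenient thresholds every column survives; the stricter pair drops the mostly-gap columns and leaves `["ACG", "A-G", "ACG"]`.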

Best,
Justine

Ahh, okay...

Not sure why you'd be getting that error, unless the alignment is not very good, which may be the case if the sequences were aligned with MAFFT or something similar. I think it'd be best to download the full, curated, secondary-structure-based alignment from SILVA and filter it down to the sequences in your curated, unaligned sequence file, as in Ben's notebook.
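Filtering the curated SILVA alignment down to your own sequence set could look roughly like the sketch below. This is not Ben's code: `read_fasta` and `subset_alignment` are hypothetical helpers that keep only aligned records whose IDs appear in the unaligned FASTA, then drop the columns that become all-gap once the other sequences are gone. It assumes both files share the same sequence IDs (the first word of each header) and treats `-` and `.` as gap characters.

```python
def read_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].strip()
            # Assumption: the ID is the first whitespace-separated word.
            seq_id, chunks = (header.split()[0] if header else ""), []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def subset_alignment(aligned_lines, unaligned_lines, gap_chars="-."):
    """Keep aligned records listed in the unaligned file; drop all-gap columns."""
    keep_ids = {seq_id for seq_id, _ in read_fasta(unaligned_lines)}
    subset = {i: s for i, s in read_fasta(aligned_lines) if i in keep_ids}
    if not subset:
        return {}
    if len({len(s) for s in subset.values()}) != 1:
        raise ValueError("aligned sequences differ in length")
    columns = zip(*subset.values())
    keep_cols = [j for j, col in enumerate(columns)
                 if any(c not in gap_chars for c in col)]
    return {i: "".join(s[j] for j in keep_cols) for i, s in subset.items()}


aligned = [">seqA some description", "AC-G.", ">seqB", "A--G-", ">seqC", "ACTG-"]
unaligned = [">seqA", "ACG", ">seqB", "AG"]
print(subset_alignment(aligned, unaligned))
```

In practice the two arguments would be open file handles; the in-memory lists here only keep the example self-contained.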

My thought is that a de novo alignment of SSU sequences produces a large number of gappy or ambiguous columns, which makes the alignment very messy. This can make it very difficult for most phylogeny programs to search tree space, which can eat up a lot of memory and compute time. When possible, I use a curated, secondary-structure-based alignment, like SILVA's, for my de novo tree building.
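A back-of-envelope calculation shows why column count matters for memory. The sequence and column counts below are hypothetical, and the formula covers only the raw character matrix at one byte per character, so it is a lower bound: phylogeny programs keep additional per-site data on top of this.

```python
def alignment_matrix_gb(n_sequences, n_columns, bytes_per_char=1):
    """Size of the raw character matrix in decimal gigabytes (lower bound)."""
    return n_sequences * n_columns * bytes_per_char / 1e9


# Hypothetical counts, for illustration only.
full = alignment_matrix_gb(100_000, 50_000)   # unmasked, SILVA-width alignment
masked = alignment_matrix_gb(100_000, 1_500)  # after aggressive column masking
print(f"full: {full:.2f} GB, masked: {masked:.2f} GB")
```

Because the size scales linearly with the number of columns, trimming most of the ~50k columns shrinks this baseline by the same factor before any tree search starts.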

Perhaps give that a try and see how it goes?

