For reasons, I have decided to build a tree from an alignment of the SILVA 138.1 sequences, based on work by @BenKaehler. (I have the notebook somewhere; please don't ask me where.) I'm working on an HPC, but I keep running into a SIGKILL (signal 9) error. I'm trying to guess at parameters that will let me run this. (I don't care as much about wall time; the sooner I can get the job started with reasonable parameters, the sooner it will finish.)
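Signal 9 is SIGKILL, which on a shared cluster often means the scheduler or the kernel's out-of-memory killer stopped the job, so memory is usually the first parameter worth revisiting. A minimal sketch for a first guess: count the sequences and columns in the aligned FASTA and compute the size of the raw character matrix. This is only a floor, since tree programs keep additional per-site data that is usually much larger. The function names are my own, not part of QIIME 2 or RESCRIPt.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines (a file handle works too)."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def alignment_size(lines: Iterable[str]) -> Dict[str, float]:
    """Count sequences and columns, and estimate the raw character-matrix size."""
    n_seqs = 0
    width: Optional[int] = None
    for _, seq in read_fasta(lines):
        if width is None:
            width = len(seq)
        elif len(seq) != width:
            raise ValueError("sequences differ in length; is this an alignment?")
        n_seqs += 1
    n_columns = width or 0
    return {
        "n_seqs": n_seqs,
        "n_columns": n_columns,
        # One byte per aligned character: a lower bound only. Tree programs
        # hold extra per-site data that is usually much larger than this.
        "matrix_mb": n_seqs * n_columns / 1e6,
    }
```

To use it, open your exported aligned FASTA with `open()` and pass the handle to `alignment_size`; comparing the numbers before and after masking or filtering shows how much each step shrinks the problem.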
That's a good point about file size; thanks, @SoilRotifer!
I downloaded the sequences with RESCRIPt and then dereplicated the full-length sequences. I masked the alignment using the QIIME 2 default parameters. Should I be masking more aggressively, or should I switch to Ben's code?
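To make "masking more aggressively" concrete, here is a minimal sketch of gap-frequency masking: drop every column in which more than a chosen fraction of sequences have a gap. It is my own illustration, not QIIME 2's implementation. If I recall the plugin correctly, `qiime alignment mask` exposes a comparable threshold as `--p-max-gap-frequency` (alongside a minimum-conservation filter), so lowering that value would be the in-QIIME 2 way to do this; please check the plugin help before relying on it.

```python
from typing import List, Tuple

# Characters treated as gaps; SILVA-style alignments may use '.' as well as '-'.
GAP_CHARS = frozenset("-.")


def mask_gappy_columns(
    records: List[Tuple[str, str]], max_gap_frequency: float
) -> List[Tuple[str, str]]:
    """Keep only columns whose gap fraction is <= max_gap_frequency."""
    if not records:
        return []
    seqs = [seq for _, seq in records]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n = len(seqs)
    # zip(*seqs) walks the alignment column by column.
    keep = [
        i
        for i, column in enumerate(zip(*seqs))
        if sum(c in GAP_CHARS for c in column) / n <= max_gap_frequency
    ]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]
```

For example, with `[("a", "AC-G"), ("b", "A--G"), ("c", "A--T")]` and a threshold of 0.5, only the first and last columns survive. Pure-Python column loops will be slow at SILVA scale, so treat this as a way to understand the knob rather than as a production tool.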
I'm not sure why you'd be getting that error, unless the alignment is not very good, which may be the case if the sequences were aligned de novo with MAFFT or something similar. I think it would be best to download the full curated, secondary-structure-based alignment from SILVA and filter it based on your curated unaligned sequence file, as in Ben's notebook.
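Ben's notebook isn't to hand, so here is an independent minimal sketch of the filtering step: collect the IDs from the curated unaligned FASTA, then stream the SILVA aligned FASTA and keep only matching records. Assumptions to verify against your own files: the ID is the first whitespace-separated token of each header in both files, and the SILVA alignment uses RNA letters with '.' as well as '-' for gaps, which the optional `to_dna` flag converts. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def filter_alignment(
    aligned_lines: Iterable[str], keep_ids: Set[str], to_dna: bool = False
) -> Iterator[str]:
    """Yield FASTA lines for aligned records whose ID is in keep_ids.

    Works one line at a time, so the full alignment never sits in memory.
    """
    keep = False
    for raw in aligned_lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in keep_ids
            if keep:
                yield ">" + tokens[0]  # header trimmed to the bare ID
        elif keep:
            if to_dna:
                # Assumption: RNA letters and '.' gaps in the SILVA export.
                line = line.upper().replace("U", "T").replace(".", "-")
            yield line
```

Streaming keeps memory use small even though the full SILVA alignment is large. Output headers are trimmed to the ID, so SILVA's taxonomy strings are dropped. The filtered alignment keeps the full SILVA width, so you would probably still want to remove columns that are entirely gaps before building the tree.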
My thought is that a de novo alignment of SSU sequences contains a substantial number of gappy or ambiguously aligned columns, and that mess can make it very difficult for most phylogeny programs to search tree space, which can eat up a lot of memory and compute time. When possible, I use a curated, secondary-structure-based alignment, like SILVA's, for my de novo tree building.