Hi @amonm82,
That is an awful lot of sequences! Generally, a de novo approach will use lots of memory for both alignment and searching tree-space. But you should be okay with 64 GB RAM.
Some questions:
- Are these ASVs? OTUs?
- Have the primer sequences been removed from the reads?
  - Leftover primers can affect phylogenetic reconstruction / topology.
- What quality control steps have been carried out?
  - Sequence quality filtering.
  - Taxonomy-based removal (e.g. mitochondria, chloroplast).
- When running the `align-to-tree-mafft-fasttree` pipeline did you set `--p-parttree` for alignment? This will help reduce memory usage and run time. See here.
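
For reference, here is a rough sketch of what that call might look like with `--p-parttree` switched on. The file names are placeholders I made up (swap in your own artifacts), and the thread count of 4 is only an illustrative choice, not a tuned value. The command is only executed if `qiime` is on your PATH and the input file exists; otherwise it is just printed.

```shell
# Hypothetical input name; replace with your own representative sequences.
rep_seqs="rep-seqs.qza"

# Assemble the command in the positional parameters so it can be
# printed and inspected before anything is run.
set -- qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences "$rep_seqs" \
  --p-parttree \
  --p-n-threads 4 \
  --o-alignment aligned-rep-seqs.qza \
  --o-masked-alignment masked-aligned-rep-seqs.qza \
  --o-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza

# Show the full command line.
echo "$*"

# Run only when QIIME 2 is available and the input artifact exists.
if command -v qiime >/dev/null 2>&1 && [ -f "$rep_seqs" ]; then
  "$@"
fi
```

All output names (`aligned-rep-seqs.qza`, `rooted-tree.qza`, and so on) are arbitrary; only the option names come from the pipeline itself.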
For this many sequences, the fragment insertion approach can also take a while, but it can be parallelized more easily than the de novo approach. That is, parallelizing de novo tree construction typically only pays off for longer sequences, and may not scale well beyond ~4 processors when dealing with ~436 bp reads.
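
If you want to try fragment insertion, a minimal sketch of the SEPP call could look like the one below. I am assuming 16S data here, and the file names (including the reference database artifact) are placeholders, so point them at whatever you actually have. The thread count of 8 is arbitrary. As written, the command is printed and only executed if `qiime` and both input files are present.

```shell
# Hypothetical file names; replace with your own artifacts.
rep_seqs="rep-seqs.qza"
sepp_ref="sepp-refs-gg-13-8.qza"   # assumed reference database artifact

# Assemble the command so it can be inspected before running.
set -- qiime fragment-insertion sepp \
  --i-representative-sequences "$rep_seqs" \
  --i-reference-database "$sepp_ref" \
  --p-threads 8 \
  --o-tree insertion-tree.qza \
  --o-placements insertion-placements.qza

# Show the full command line.
echo "$*"

# Run only when QIIME 2 is available and both inputs exist.
if command -v qiime >/dev/null 2>&1 && [ -f "$rep_seqs" ] && [ -f "$sepp_ref" ]; then
  "$@"
fi
```

Raising `--p-threads` is where the easier parallelization mentioned above comes in, within the limits of your machine.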
As for which is “better”, I’d recommend reading the fragment insertion paper. Differences between the two approaches can vary by data set: for some data I’ve noticed no differences, for others only some.