Choosing a seed number when using RAxML for tree building

Hi @jjankowiak,

Let me see if I can help.

Yes, these are simply random numbers. The seeds affect the structure of the starting parsimony tree, but different seeds may or may not result in different maximum likelihood trees in the end. The seeds have no effect on run time; they simply allow the :evergreen_tree: building to be reproducible, as outlined in the community tutorial and the help documentation of q2-phylogeny. More discussion of this topic can be found in this great RAxML forum thread and the RAxML-III paper.
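To make the reproducibility point concrete, here is a toy sketch in Python. It is not RAxML's actual algorithm: the function `starting_order` and the taxon labels are made up for illustration, and a shuffled taxon order stands in for a randomized starting tree. The only point is that a fixed seed gives the same "random" result on every run.

```python
import random


def starting_order(taxa, seed):
    """Return a shuffled copy of `taxa` (hypothetical stand-in for a
    randomized starting tree). The same seed always gives the same order."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    order = list(taxa)
    rng.shuffle(order)
    return order


taxa = ["A", "B", "C", "D", "E", "F"]

first_run = starting_order(taxa, seed=1723)
second_run = starting_order(taxa, seed=1723)

# Same seed, same order: this is what makes a run repeatable.
print(first_run == second_run)
```

A different seed will usually give a different order, which is the analogue of a different starting tree. As noted above, that may or may not change the final maximum likelihood tree.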

About run time: we will be providing access to additional CPU-optimized versions of RAxML in the next :qiime2: (2018.8) release. This should greatly reduce the RAxML tree building run times. See this other RAxML thread about why it can be difficult to assess run times.

Another avenue to look into: check the section "How many Threads shall I use?" in the RAxML manual. It discusses why using more cores may not necessarily be faster. In fact, using too many cores/threads may slow down the search! I recommend using the --verbose option and performing a short test run to see how many "distinct alignment patterns" you have. In brief, the rule of thumb is to use 1 core per 500 DNA site patterns, but the best number of threads is also affected by the model being used (e.g. GTRGAMMA). Again, these are rules of thumb, so perhaps test a few different thread counts to see if you get a speed improvement. If not, hopefully the optimized versions of RAxML will help you here.
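As a rough illustration of that rule of thumb, here is a small Python sketch. The function name `suggested_threads` and its parameters are made up for this example. It only encodes the "1 core per 500 DNA site patterns" guideline and ignores the effect of the model, so treat the result as a starting point for your own timing tests rather than a definitive answer.

```python
def suggested_threads(distinct_patterns, patterns_per_thread=500, max_threads=None):
    """Suggest a thread count from the number of distinct alignment patterns.

    Assumes roughly 1 thread per `patterns_per_thread` patterns, never fewer
    than 1 thread, and optionally capped at `max_threads` (e.g. the number of
    cores on your machine).
    """
    threads = max(1, distinct_patterns // patterns_per_thread)
    if max_threads is not None:
        threads = min(threads, max_threads)
    return threads


# Example: an alignment reported as having 1200 distinct alignment patterns.
print(suggested_threads(1200))                    # 2
print(suggested_threads(300))                     # 1
print(suggested_threads(10000, max_threads=8))    # 8
```

You would take the "distinct alignment patterns" count from the output of a short --verbose test run and plug it in.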

-I hope this helps!
-Mike
