Choosing a seed number when using RAxML for tree building

jjankowiak · August 27, 2018, 3:15pm

Hi Qiime community,

I am currently trying to build a tree with the RAxML plugin, following the raxml-rapid-bootstrap script from the Q2 phylogeny community tutorial (below) and changing only the input and output file paths. I have around 9,000 ASVs from 246 samples of 16S (V4) paired-end sequences. This has been running for about 3 days now, so I wanted to check that my script is correct, specifically regarding the seed numbers, which I copied from the tutorial. Are these just random numbers, or will the numbers you input affect the tree building or run time? Also, I know run time varies a lot with the dataset, but I was wondering if there is an approximate run time for a dataset of this size?

qiime phylogeny raxml-rapid-bootstrap \
  --p-seed 1723 \
  --p-rapid-bootstrap-seed 9384 \
  --p-bootstrap-replicates 200 \
  --p-substitution-model GTRGAMMAI \
  --p-n-threads 4 \
  --i-alignment /home/qiime2/Desktop/qiime2_BAC_samples_16S/phylogeny_ASV/RAxML/masked-aligned-rep-seqs.qza \
  --o-tree /home/qiime2/Desktop/qiime2_BAC_samples_16S/phylogeny_ASV/RAxML/raxml.tree.bootstrap.qza

Any help would be much appreciated.

SoilRotifer · August 28, 2018, 4:28pm

Hi @jjankowiak,

Let me see if I can help.

Yes, these are simply random numbers. The seeds affect the structure of the starting parsimony tree; however, different seeds may or may not result in different maximum likelihood trees in the end. The seeds have no effect on run time; they simply make the tree building reproducible, as outlined in the community tutorial and the help documentation of q2-phylogeny. More discussion of this topic can be found in this great RAxML forum thread and the RAxML-III paper.
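To illustrate the reproducibility point, here is a minimal sketch of drawing fresh seeds and recording them with the command that used them. The helper names `new_seed` and `build_command`, the use of /dev/urandom, and the .qza file names are assumptions made for this example; only the flags come from the command posted above. The command is printed rather than run, so the sketch works without QIIME 2 installed.

```shell
#!/bin/sh
# Hypothetical helper: draw a seed in the range 1..65536 from /dev/urandom.
# Any positive integer you write down would serve the same purpose.
new_seed() {
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    echo $(( n + 1 ))
}

# Hypothetical helper: print (not run) the tree-building command for two seeds.
# The .qza file names are placeholders.
build_command() {
    echo qiime phylogeny raxml-rapid-bootstrap \
        --p-seed "$1" \
        --p-rapid-bootstrap-seed "$2" \
        --p-bootstrap-replicates 200 \
        --p-substitution-model GTRGAMMAI \
        --p-n-threads 4 \
        --i-alignment masked-aligned-rep-seqs.qza \
        --o-tree raxml.tree.bootstrap.qza
}

seed=$(new_seed)
bootstrap_seed=$(new_seed)

# Printing the full command leaves a record of the seeds, so the same
# run can be repeated later with identical values.
build_command "$seed" "$bootstrap_seed"
```

Re-running with the recorded seeds is how you would repeat the same analysis; changing them only changes the starting point of the search.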

About run time: we will be providing access to additional CPU-optimized versions of RAxML in the next QIIME 2 release (2018.8). This should greatly reduce RAxML tree-building run times. See this other RAxML thread about why it can be difficult to assess run times.

Another avenue to look into: check the section "How many Threads shall I use?" in the RAxML manual. It discusses why using more cores may not necessarily be faster. In fact, using too many cores/threads may slow down the search! I recommend using the --verbose option and performing a short test run to see how many "distinct alignment patterns" you have. In brief, the rule of thumb is to use 1 core per 500 DNA site patterns, but the best number of threads is also affected by the model being used (e.g. GTRGAMMA). Again, these are rules of thumb, so perhaps test a few different thread counts to see if you get a speed improvement. If not, hopefully the optimized versions of RAxML will help you here.
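As a rough sketch of that rule of thumb, the arithmetic looks like this. The function name `suggest_threads`, its default cap of 4 threads, and the example pattern count of 1800 are assumptions for illustration; the pattern count for a real alignment has to be read from the verbose RAxML output of a short test run.

```shell
#!/bin/sh
# Hypothetical helper: estimate a starting thread count from the number of
# distinct alignment patterns, using roughly one thread per 500 patterns.
# Usage: suggest_threads PATTERNS [PATTERNS_PER_THREAD] [MAX_THREADS]
suggest_threads() {
    patterns=$1
    per_thread=${2:-500}
    max_threads=${3:-4}

    # Integer division rounds down.
    threads=$(( patterns / per_thread ))

    # Always use at least one thread, and never more than the machine offers.
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    if [ "$threads" -gt "$max_threads" ]; then
        threads=$max_threads
    fi
    echo "$threads"
}

# Example with a made-up pattern count: 1800 / 500 rounds down to 3.
suggest_threads 1800
```

Treat the result only as a starting value for --p-n-threads; timing a short test run at one or two neighbouring thread counts is still the more reliable check.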

-I hope this helps!
-Mike

jjankowiak · August 28, 2018, 7:39pm

Hi Mike,

Thank you, this was very helpful! I think I will wait to try this with the new CPU-optimized versions in the next release.

Thanks,

Jennifer

SoilRotifer · August 28, 2018, 8:31pm

Hi Jennifer,

Glad I could help! Keep us posted. I'll likely update the q2-phylogeny community tutorial shortly after the 2018.8 release. Keep an eye out for changes there.

-Cheers!
-Mike
