I am currently trying to build a tree with the RAxML plugin, following the raxml-rapid-bootstrap script from the "Q2 phylogeny: Community tutorial" and changing only the input and output file paths. I have around 9,000 ASVs from 246 samples of 16S (V4) paired-end sequences. This has been running for about 3 days now, so I wanted to check that my script is correct, specifically regarding the seed numbers, which I copied from the tutorial. Are these just random numbers, or will the numbers you input affect the tree building or run time? Also, I know run time varies a lot with the dataset, but I was wondering if there is any approximate run time for a dataset of this size?
Yes, these are simply random numbers. The seeds affect the structure of the starting parsimony tree; however, different seeds may or may not result in different maximum likelihood trees in the end. The seeds have no effect on run time and simply allow the tree building to be reproducible, as outlined in the community tutorial and the help documentation of q2-phylogeny. More discussion of this topic can be found in this great RAxML forum thread and the RAxML-III paper.
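To make the reproducibility point concrete, here is a minimal Python sketch of what a seed does in general. This is only an analogy, not RAxML's actual code: the function name `starting_order`, the taxon labels, and the seed values are all hypothetical. It shows that the same seed always gives the same randomized order, while a different seed may give a different one.

```python
import random


def starting_order(taxa, seed):
    """Return a randomized order of taxa that depends only on the seed.

    Hypothetical illustration of seeded randomness; this is not how
    RAxML is implemented internally.
    """
    rng = random.Random(seed)  # private generator, fully determined by the seed
    order = list(taxa)         # copy so the caller's list is left untouched
    rng.shuffle(order)
    return order


# Made-up labels standing in for ASVs.
taxa = [f"ASV_{i}" for i in range(10)]

run_a = starting_order(taxa, seed=42)
run_b = starting_order(taxa, seed=42)  # same seed: identical order
run_c = starting_order(taxa, seed=7)   # different seed: possibly different order

print(run_a == run_b)
print(run_a == run_c)
```

So the particular numbers do not matter much. What matters is recording them, so that you (or someone else) can rerun the analysis and get the same result.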
As for run time: we will be providing access to additional CPU-optimized versions of RAxML in the next QIIME 2 (2018.8) release. This should greatly reduce RAxML tree-building run times. See this other RAxML thread about why it can be difficult to assess run times.
Another avenue to look into: check the section "How many Threads shall I use?" in the RAxML manual. It discusses why using more cores may not necessarily be faster. In fact, using too many cores/threads may slow down the search! I recommend using the --verbose option and performing a short test run to see how many "distinct alignment patterns" you have. In brief, the rule of thumb is to use 1 core per 500 DNA site patterns, but the appropriate number of threads is also affected by the model being used (e.g. GTRGAMMA). Again, these are rules of thumb, so perhaps test a few different thread counts to see if you get a speed improvement. If not, hopefully the optimized versions of RAxML will help you here.
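The rule of thumb above can be sketched as a small Python helper. This is a rough starting point only: the function name `suggest_threads`, its parameters, and the example pattern count are my own assumptions rather than anything from RAxML or q2-phylogeny. It applies the 1-core-per-500-patterns guideline and caps the result at the number of cores you actually have.

```python
import os


def suggest_threads(distinct_patterns, available_cores=None, patterns_per_core=500):
    """Suggest a starting thread count from the number of distinct alignment patterns.

    Hypothetical helper: applies the "1 core per 500 DNA site patterns"
    rule of thumb and never suggests more cores than are available.
    """
    if distinct_patterns < 0:
        raise ValueError("distinct_patterns must be non-negative")
    if available_cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        available_cores = os.cpu_count() or 1
    by_patterns = max(1, distinct_patterns // patterns_per_core)
    return min(by_patterns, available_cores)


# Made-up example: 1,200 distinct patterns on an 8-core machine.
print(suggest_threads(1200, available_cores=8))  # 2
```

Treat the result as a first guess for your short test runs, not a final answer, since the best thread count also depends on the substitution model.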
Glad I could help! Keep us posted. I'll likely update the q2-phylogeny community tutorial shortly after the 2018.8 release. Keep an eye out for changes there.