Q2-phylogeny additions


(Mike Robeson) #1

Hi all,

With much help from @gregcaporaso, @thermokarst, and @ebolyen, during the code sprint I’ve been able to make progress adding RAxML into my fork of the q2-phylogeny plugin here.

Check it out in all of its glory. :wink:


I’ve run a few data sets via the plugin, and everything appears sane. I would greatly like any feedback on this. Especially, if there are some coding guidelines I’ve not properly followed. Ideas for other unit tests are welcome too. If this looks good, I’ll start on implementing IQ-TREE into the q2-phylogeny plugin like we discussed. In which case I’ll submit a pull request to add the IQ-TREE dependency.

I plan to add the bootstraping capability at a later date. I’d rather wait until I am confident all the basics down for RAxML and IQ-TREE. I figured returning a single tree would be sufficient for now…


(Greg Caporaso) #2

@SoilRotifer, this is looking great!

A couple of quick comments, and we’ll be happy to try it out and do a more thorough code review when the PR is submitted. First, you should probably import run_command from q2_phylogeny._fasttree rather than re-define it (it’s ok to import this private function within this package, just not from other packages). Also, is seed a random number generator seed? If so, I don’t think a default value should be defined here, though it’s a good idea to leave it as a parameter so you can use it in the tests, etc.

Thanks for the work on this! I’ll be great to get raxml and IQ-TREE in QIIME 2!

(Mike Robeson) #3

@gregcaporaso thanks!

Good point about the random seed. I set --p-seed as a required parameter, and committed the changes.

I slightly modified the run_command from _fasttree.py for use in _raxml.py as RAxML does not use standard input / output like FastTree. Unless I am mis-understanding what that bit of code is actually doing.


(Greg Caporaso) #4

Does raxml require the user to provide this? Ideally, this will not be a parameter that the user ever has to interact with, unless they want to (i.e., it should default to choosing it’s own random seed, but the user can provide one if they need to for some reason).

Oh, I see. You can leave that as-is then, I thought it was exactly the same.

(Mike Robeson) #5

Sorry, I forgot to mention why I am adding this as a parameter. Yes, RAxML requires that the user set either a seed, or a starting tree (which I’ve not implemented here).

However, that is a good idea. I’ll see if I can implement that the plugin chooses a seed if not provided.

(Mike Robeson) #6

@gregcaporaso I’ve been reading through both the QIIME 2 development and skbio docs. I am unable to figure out how to have the plugin pick a random integer each time RAxML is run through the plugin. It appears that it will only change unless I run qiime dev refresh-cache. As I have it now, it will show a default value as chosen by randint for --p-seed, e.g. [38741]. However, this number does not appear to change once set, unless I refresh the cache.

I can only assume I am incorrectly placing the location of the call to randint within the plugin. That is, I had been setting:
seed: int=randint(10000,100000) on this line here

Any suggestions?

(Matthew Ryan Dillon) #7

Hey @SoilRotifer, maybe another approach would be to not worry about dynamically creating that seed value within Q2, and instead just make it a required parameter (which forces the user to provide this as input). Looking at the RaXML docs, it looks like that is consistent with their expectations of the user. Do you think that would work?

(Mike Robeson) #8

Thanks @thermokarst. I was thinking along the same lines. But it is nice to hear you confirm that! :slight_smile:

I’ll likely submit a pull request for this plugin shortly.

(Greg Caporaso) #9

I agree with this plan now that I see this is a requirement of raxml itself - sorry for the confusion!

(Evan Bolyen) #10

Another easy option would be to set the seed to None and randomly draw one inside the method when it is None. This is usually how you would approach this in Python.

(Mike Robeson) #11

Hmm… I think I’ve tried this and it did not work as expected.

So, from what I understand, I’d set this line in plugin_setup.py as such:

        'seed': Int % None, 

And update this line in _raxml.py to:

      seed: int=None,

Then somewhere in the raxml method, I’d check if the seed is None and call randint(1000,10000).

Does this sound right? I think I tried to do this yesterday, but I could have entirely messed up due to depleted caffeine reserves. :slight_smile:


(Evan Bolyen) #12

Yep, although don’t use Int % None, just use Int. None is special-cased, so “optional” parameters are implicitly “default” parameters with a default of None.

(Mike Robeson) #13

Awesome sauce! I got it working. Though it wound up being an embarrassingly simple fix! I’ve committed the fixes. Thanks @ebolyen!

(fouad) #14

Can you describe what you mean by “linearized combined_seqs.fna”? For example, is that a FASTA file of demultiplexed sequences, multiplexed sequences, representative sequences, or something else?

(Matthew Ryan Dillon) #15

Hi @fouad2000 - where exactly are you seeing that phrase (“linearized combined_seqs.fna””)?