I would like to open up discussion for the development of the NINJA-OPS plugin for OTU picking. Last year, there were no semantic types yet for FASTA-format DNA sequencing reads, so we created our own sequence artifacts. After looking through the dev pages: if we need a linearized combined_seqs.fna, will the following semantic type work out of the box?
from q2_types.feature_data import (DNAFASTAFormat)
I am using the following QIIME setup. Note that the plugin is registered for ninja-ops!
Hi @bhillmann! Can you describe what you mean by "linearized combined_seqs.fna"? For example, is that a FASTA file of demultiplexed sequences, multiplexed sequences, representative sequences, or something else?
Hi @jairideout @gabe, the linearized combined_seqs.fna is simply a FASTA version of the quality-controlled query sequences. By linearized we mean one header line and one sequence line per record. The combined FASTA file contains the demultiplexed sequences, similar in format to those output by the script. That is, instead of one FASTA file per sample, all samples are combined into a single file, with samplename_recordindex in each header. This is the input artifact required by NINJA-OPS that we placed into the original alpha plugin.
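To make "linearized" concrete, here is a minimal Python sketch that joins wrapped sequence lines so each record has exactly one header line and one sequence line. The function names and the sample headers (`sampleA_0`, `sampleB_1`) are made up for illustration; this is not code from NINJA-OPS or QIIME.

```python
from typing import Iterable, Iterator, Tuple


def linearize_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def format_linearized(records: Iterable[Tuple[str, str]]) -> str:
    """Render records as one header line plus one sequence line each."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Illustrative input: the first record is wrapped over two sequence lines.
wrapped = [">sampleA_0\n", "ACGT\n", "TTGA\n", ">sampleB_1\n", "GGCC\n"]
print(format_linearized(linearize_fasta(wrapped)), end="")
```

With the sample input, the first record's two sequence lines come out as a single line under its `samplename_recordindex` header.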
Sorry for the delay in responding to you @bhillmann! Thanks for explaining the file format. We have an open issue to support this format (i.e. the "QIIME 1 demux" format) and will likely include this support in the upcoming 2017.8 release (scheduled for around two weeks from now). So once that's available in q2-types you'll be able to start using it in q2-ninja-ops!
Regarding q2-ninja-ops, I'll follow up with you directly in the next few days so we can work on getting this plugin into either the 2017.8 or 2017.9 release. We're specifically interested in having q2-ninja-ops perform closed-reference OTU picking as we don't have closed-reference support yet in QIIME 2. Thanks!
We have added some testing stubs for NINJA-OPS in the develop branch. They all pass on my side. I think we might need to add some more, such as induced failures. Let me know what you think @jairideout
@bhillmann, I'm going to help out with reviewing this instead of @jairideout. I should be able to take a look in the next couple of days. I'm really excited to see this moving forward!
Hi @bhillmann,
I'm just getting started on working with this to do a review. Overall this is looking really good so far! It'll be nice to finally get this released!
I was able to install q2-ninja-ops in my QIIME 2 2018.2 environment, but so far I haven't been able to install ninja-ops itself through conda. What channels are you relying on for this? I'm working on macOS, and I'm getting the following error:
$ conda install -c knights-lab ninja_ops
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
- ninja_ops
- bowtie2
If you're not sure what's going on with conda there, I can help out, but just didn't want to start poking around if you could just give me a command that would work.
My first recommendation is that you expand the tests a little bit. At the moment, you're testing that the actions run successfully, but there is no sanity test that the results look reasonable. When we're wrapping other applications, we generally assume that the underlying application is sufficiently tested, and we just put some very minimal tests in place to ensure that the plugin isn't mangling the results of the underlying application at all, and that any parameters that the user passes make their way into the underlying application. The q2-vsearch tests are a good example of this - you can see the closed-reference tests here, and you can feel free to use any/all of the data from those tests.
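As a rough illustration of the "parameters make their way into the underlying application" kind of test, here is a minimal Python sketch. Everything in it is hypothetical: `build_command`, the `wrapped_tool` executable, and its flags are placeholders, not the real NINJA-OPS command line or the q2-vsearch test code.

```python
import unittest


def build_command(seqs_path, ref_path, out_dir, perc_identity=0.97, threads=1):
    """Build the argument list for a hypothetical wrapped executable.

    The executable name and flags are placeholders for illustration only.
    """
    return [
        "wrapped_tool",
        "--input", seqs_path,
        "--reference", ref_path,
        "--output", out_dir,
        "--identity", str(perc_identity),
        "--threads", str(threads),
    ]


class BuildCommandTests(unittest.TestCase):
    def test_user_parameters_reach_command(self):
        cmd = build_command("seqs.fna", "ref.fna", "out",
                            perc_identity=0.9, threads=4)
        # The value following each flag must be the one the user passed.
        self.assertEqual(cmd[cmd.index("--identity") + 1], "0.9")
        self.assertEqual(cmd[cmd.index("--threads") + 1], "4")

    def test_defaults_are_used_when_not_overridden(self):
        cmd = build_command("seqs.fna", "ref.fna", "out")
        self.assertEqual(cmd[cmd.index("--identity") + 1], "0.97")
        self.assertEqual(cmd[cmd.index("--threads") + 1], "1")


# Run the suite without calling sys.exit(), so this also works in a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(BuildCommandTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A sanity test on the results would follow the same pattern: run the action on a tiny input with a known answer and assert on the returned table rather than only checking that nothing raised.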
A couple of other high-level recommendations:
I recommend changing the name cluster-closed-reference to cluster-sequences-closed-reference. Take a look at this tutorial - the cluster sequences would imply that it's taking sequence data (not a feature table) as input. This change isn't a requirement, but I think it would help users understand how the method fits into QIIME 2.
You should add descriptions for all of the input/output/parameters in your two methods. See here for an example of how to do that. See here for an example of what that documentation will look like.
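For context, QIIME 2's `register_function` takes `input_descriptions`, `parameter_descriptions`, and `output_descriptions` dictionaries keyed by name. The sketch below is plain Python that does not import qiime2; it only shows one way to spot names that still lack a description before registering. The input, parameter, and output names are hypothetical and are not the plugin's actual signature.

```python
def missing_descriptions(names, descriptions):
    """Return the names that have no non-empty description, sorted."""
    return sorted(n for n in names if not descriptions.get(n, "").strip())


# Hypothetical names for illustration; not q2-ninja-ops's real signature.
input_names = ["sequences", "reference_sequences"]
parameter_names = ["threads"]
output_names = ["clustered_table"]

input_descriptions = {
    "sequences": "The query sequences to cluster.",
    "reference_sequences": "The reference sequences to cluster against.",
}
parameter_descriptions = {}  # deliberately empty to show what gets reported
output_descriptions = {"clustered_table": "The resulting feature table."}

report = {
    "inputs": missing_descriptions(input_names, input_descriptions),
    "parameters": missing_descriptions(parameter_names, parameter_descriptions),
    "outputs": missing_descriptions(output_names, output_descriptions),
}
print(report)
```

Here the report flags `threads` as undocumented, since its description dictionary was left empty on purpose.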
I can give some more feedback once I get ninja-ops installed with conda - again, just let me know if you need some help with that.
We are currently using our knights-lab conda channel for ninja_ops and the bioconda channel for bowtie2. The channel is showing a recipe uploaded for macOS. Not sure why it isn't working for you; maybe it is the bioconda channel dependency. There are no downloads on the website of the macOS build.
I can refactor to cluster-sequences-closed-reference. More sanity checks are incoming. We added a little bit of test data and set up what seems to be the paradigm for the unit testing framework and for passing files around.
@bhillmann, I have some time this week to take another pass. Are you refactoring based on some of my previous suggestions? Ideally I could take another pass after that.
One other thing I noticed was that when running closed-reference clustering, there is a db.fna file being left behind. QIIME methods should only output .qza files - could you have that file either be cleaned up or written to a qza instead?
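One common way to avoid leaving intermediate files behind is to have the wrapped step write into a temporary directory that is removed when the step finishes. The Python sketch below assumes the underlying tool can be pointed at a working directory; `run_in_scratch_dir` and `fake_step` are hypothetical helpers, not part of q2-ninja-ops.

```python
import os
import tempfile


def run_in_scratch_dir(step):
    """Call step(scratch_dir) and return its result.

    The scratch directory and everything in it is deleted afterwards,
    even if the step raises.
    """
    with tempfile.TemporaryDirectory(prefix="ninja-ops-") as scratch:
        return step(scratch)


seen = {}


def fake_step(scratch):
    """Stand-in for the wrapped tool: writes an intermediate db.fna file."""
    path = os.path.join(scratch, "db.fna")
    with open(path, "w") as fh:
        fh.write(">ref1\nACGT\n")
    seen["path"] = path
    return os.path.exists(path)


existed_during_step = run_in_scratch_dir(fake_step)
still_exists = os.path.exists(seen["path"])
print(existed_during_step, still_exists)
```

Anything that must survive the step (for example, the data destined for the output artifact) has to be read or copied out before the `with` block exits.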
@bhillmann, I'll hold off on further review until you've had a chance to refactor. Does that work? I didn't get the conda installation working, so I just installed ninja-ops directly. We can help with that on my end when you're ready for me to review again.
We managed to get the whole shebang conda-installable, but we stalled out on the Mac version not working, and we don't know why. Ben (our Python/conda expert) has moved on to a new position, but if the bug is trivial enough (something about environments), everything should be good to go.