NINJA-OPS Plugin for QIIME2

I would like to open up discussion for the development of the NINJA-OPS Plugin for OTU picking. Last year, there were no semantic types yet for FASTA format DNA sequencing reads, so we created our own sequence artifacts. After looking through the dev-pages: if I need a linearized combined_seqs.fna, will the following semantic type work out of the box?

from q2_types.feature_data import (DNAFASTAFormat)

I am using the following QIIME setup. Note that the plugin is registered for ninja-ops!

System versions
Python version: 3.5.3
QIIME 2 release: 2017.5
QIIME 2 version: 2017.5.0
q2cli version: 2017.5.0

Installed plugins
alignment 2017.5.0
composition 2017.5.0
dada2 2017.5.0
deblur 2017.5.0
demux 2017.5.0
diversity 2017.5.0
emperor 2017.5.0
feature-classifier 2017.5.0
feature-table 2017.5.0
ninja-ops 0+untagged.9.g0154917.dirty
phylogeny 2017.5.0
quality-filter 2017.5.0
taxa 2017.5.0
types 2017.5.0

The repository for the plugin is available here.

Hi @bhillmann! Can you describe what you mean by "linearized combined_seqs.fna"? For example, is that a FASTA file of demultiplexed sequences, multiplexed sequences, representative sequences, or something else?

Hi @jairideout @gabe, the linearized combined_seqs.fna is simply a FASTA version of the quality-controlled query sequences. By linearized we mean 1 header line and 1 sequence line per record in the FASTA. The combined sequence FASTA file will contain the demultiplexed sequences, similar in format to those output by the script. That is, instead of having 1 FASTA per sample, they are all combined, with samplename_recordindex in the header. This is the input artifact required by NINJA-OPS that we placed into the original alpha plugin.
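To make the format concrete, here is a minimal sketch of producing such a file from per-sample FASTA records. It is an illustration only, not the NINJA-OPS or QIIME code: the function names `linearize_fasta` and `combine_samples` are made up for this example, and starting the record index at 0 is an assumption.

```python
def linearize_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines into one."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def combine_samples(per_sample):
    """Combine per-sample FASTA lines into one linearized FASTA string.

    per_sample maps a sample name to an iterable of FASTA lines.
    Headers are rewritten to samplename_recordindex (index assumed to start at 0).
    """
    out = []
    for sample, lines in per_sample.items():
        for index, (_, seq) in enumerate(linearize_fasta(lines)):
            out.append(">{}_{}\n".format(sample, index))
            out.append(seq + "\n")
    return "".join(out)
```

Each record in the result occupies exactly two lines, regardless of how the input sequences were wrapped.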

Sorry for the delay in responding to you @bhillmann! Thanks for explaining the file format - we have an open issue to support this format (i.e. the "QIIME 1 demux" format) and will likely include this support in the upcoming 2017.8 release (scheduled for around two weeks from now). So once that's available in q2-types you'll be able to start using it in q2-ninja-ops!

Regarding q2-ninja-ops, I'll follow up with you directly in the next few days so we can work on getting this plugin into either the 2017.8 or 2017.9 release. We're specifically interested in having q2-ninja-ops perform closed-reference OTU picking as we don't have closed-reference support yet in QIIME 2. Thanks!

Hi all, big milestone - NINJA-OPS is now installable via conda!

What's left is to make a wrapper for db building, and wrap it into the q2 plugin framework. Thanks @bhillmann for getting the conda pieces in place!

Gabe

This is great news, thanks for keeping this moving forward @bhillmann and @gabe!

We have added some testing stubs for NINJA-OPS in the develop branch. They all pass on my side. I think we might need to add some more, such as induced failures. Let me know what you think @jairideout

@bhillmann, I'm going to help out with reviewing this instead of @jairideout. I should be able to take a look in the next couple of days. I'm really excited to see this moving forward!

Hi @bhillmann,
I'm just getting started on working with this to do a review. Overall this is looking really good so far! It'll be nice to finally get this released!

I was able to install q2-ninja-ops in my QIIME 2 2018.2 environment, but so far I haven't been able to install ninja-ops itself through conda. What channels are you relying on for this? I'm working on macOS, and I'm getting the following error:

$ conda install -c knights-lab ninja_ops
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - ninja_ops
  - bowtie2 

If you're not sure what's going on with conda there, I can help out, but just didn't want to start poking around if you could just give me a command that would work.

My first recommendation is that you expand the tests a little bit. At the moment, you're testing that the actions run successfully, but there is no sanity test that the results look reasonable. When we're wrapping other applications, we generally assume that the underlying application is sufficiently tested, and we just put some very minimal tests in place to ensure that the plugin isn't mangling the results of the underlying application at all, and that any parameters that the user passes make their way into the underlying application. The q2-vsearch tests are a good example of this - you can see the closed-reference tests here, and you can feel free to use any/all of the data from those tests.
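As a rough illustration of the "parameters make their way into the underlying application" kind of test, here is a minimal sketch using `unittest.mock`. Everything about the wrapped tool is a placeholder: the executable name `some-otu-picker`, its flags, and the functions `build_command` and `run_picker` are hypothetical and do not reflect the NINJA-OPS command line or the q2-vsearch tests.

```python
import subprocess
import unittest
from unittest import mock


def build_command(seqs_path, out_dir, similarity=0.97):
    # Hypothetical command line: executable and flag names are placeholders.
    return ["some-otu-picker", "--input", seqs_path,
            "--output", out_dir, "--similarity", str(similarity)]


def run_picker(seqs_path, out_dir, similarity=0.97):
    """Run the (hypothetical) external tool and return the command that was used."""
    cmd = build_command(seqs_path, out_dir, similarity)
    subprocess.run(cmd, check=True)
    return cmd


class ParameterForwardingTests(unittest.TestCase):
    def test_similarity_reaches_command_line(self):
        # Replace subprocess.run so no external program is needed for the test.
        with mock.patch("subprocess.run") as fake_run:
            run_picker("seqs.fna", "out", similarity=0.9)
        args, kwargs = fake_run.call_args
        cmd = args[0]
        self.assertEqual(cmd[cmd.index("--similarity") + 1], "0.9")
        self.assertTrue(kwargs["check"])
```

A test like this only checks that the user's parameter is forwarded; a separate test on a small fixture would still be needed to confirm the results themselves are not mangled.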

A couple of other high-level recommendations:

  • I recommend changing the name cluster-closed-reference to cluster-sequences-closed-reference. Take a look at this tutorial - the name cluster-sequences would imply that it's taking sequence data (not a feature table) as input. This change isn't a requirement, but I think it would help users understand how the method fits into QIIME 2.
  • You should add descriptions for all of the input/output/parameters in your two methods. See here for an example of how to do that. See here for an example of what that documentation will look like.
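For the descriptions, a minimal sketch of what the dictionaries might look like, plus a small check that nothing is left undocumented. The input, parameter, and output names and their wording are made up for illustration, and `missing_descriptions` is a hypothetical helper. In a real plugin these dictionaries would be passed to `plugin.methods.register_function` as the `input_descriptions`, `parameter_descriptions`, and `output_descriptions` keyword arguments.

```python
# Hypothetical names for one method's inputs, parameters, and outputs.
inputs = ["sequences", "reference_sequences"]
parameters = ["perc_identity"]
outputs = ["clustered_table", "clustered_sequences"]

input_descriptions = {
    "sequences": "The query sequences to cluster.",
    "reference_sequences": "The reference sequences to cluster against.",
}
parameter_descriptions = {
    "perc_identity": "The percent identity at which clustering is performed.",
}
output_descriptions = {
    "clustered_table": "The feature table of per-sample OTU counts.",
    "clustered_sequences": "The sequences representing each OTU.",
}


def missing_descriptions(names, descriptions):
    """Return the names that have no description, sorted for stable output."""
    return sorted(set(names) - set(descriptions))
```

Running `missing_descriptions` over each pair in a unit test is one way to keep the documentation from drifting when a parameter is added later.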

I can give some more feedback once I get ninja-ops installed with conda - again, just let me know if you need some help with that.

We are currently using our knights-lab conda channel for ninja_ops and the bioconda channel for bowtie2. The channel is showing a recipe uploaded for MacOS. Not sure why it isn't working for you, maybe it is the bioconda channel dependency. There are no downloads on the website of the MacOS build.

Link to recipe:

Link to conda channel:
https://anaconda.org/knights-lab/ninja_ops/files

I can refactor to cluster-sequences-closed-reference. More sanity checks are incoming. We added a little bit of test data and set up what seems to be the paradigm for the unit testing framework and passing around files.

@bhillmann, I have some time this week to take another pass. Are you refactoring based on some of my previous suggestions? Ideally I could take another pass after that.

One other thing I noticed was that when running closed-reference clustering, there is a db.fna file being left behind. QIIME methods should only output .qza files - could you have that file either be cleaned up or written to a qza instead?
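One common way to handle the cleanup option is to write intermediate files into a temporary directory that is removed automatically. The following is a minimal sketch under that assumption: `run_with_scratch_dir` and `build_db` are hypothetical names, and the file contents are dummy data rather than anything the plugin actually writes.

```python
import os
import tempfile


def run_with_scratch_dir(write_intermediates):
    """Call write_intermediates(scratch_dir) and return its result.

    Everything written under scratch_dir is deleted when the block exits,
    even if write_intermediates raises.
    """
    with tempfile.TemporaryDirectory() as scratch_dir:
        return write_intermediates(scratch_dir)


def build_db(scratch_dir):
    # Write the intermediate database file inside the scratch directory
    # instead of the current working directory.
    db_path = os.path.join(scratch_dir, "db.fna")
    with open(db_path, "w") as handle:
        handle.write(">ref1\nACGT\n")
    # The wrapped tool would read db_path here, before the directory is removed.
    return db_path
```

With this pattern, anything that should survive has to be copied out (or written into the output artifact) before the `with` block ends.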

I haven't had a chance to refactor yet. I'll run through those refactorings and let you know when it is ready. Were you able to get it running?

I'll take a look for that db.fna, it should be cleaned up.

@bhillmann, I'll hold off on more review then until you've had a chance to refactor. Does that work? I didn't get the conda installation working, so I just installed ninja-ops directly. We can help with that on my end when you're ready for me to review again.

@bhillmann, just wanted to check in on how things are going. I'm ready to do a re-review when you're ready for it.

Just a little update:

We managed to get the whole shebang conda installable, but we stalled out on the Mac version not working and we don't know why. Ben (our Python/conda expert) has moved on to a new position, but if the bug is trivial enough (something about environments), everything should be good to go.

Appears to be working well on Linux too!

Cheerio,
Gabe