(Benjamin Hillmann) #1

I would like to open up discussion for the development of the NINJA-OPS plugin for OTU picking. Last year, there were no semantic types yet for FASTA-format DNA sequencing reads, so we created our own sequence artifacts. After looking through the dev pages: if we need a linearized combined_seqs.fna, will the following semantic type work out of the box?

from q2_types.feature_data import (DNAFASTAFormat)

I am using the following QIIME setup. Note that the plugin is registered for ninja-ops!

System versions
Python version: 3.5.3
QIIME 2 release: 2017.5
QIIME 2 version: 2017.5.0
q2cli version: 2017.5.0

Installed plugins
alignment 2017.5.0
composition 2017.5.0
dada2 2017.5.0
deblur 2017.5.0
demux 2017.5.0
diversity 2017.5.0
emperor 2017.5.0
feature-classifier 2017.5.0
feature-table 2017.5.0
ninja-ops 0+untagged.9.g0154917.dirty
phylogeny 2017.5.0
quality-filter 2017.5.0
taxa 2017.5.0
types 2017.5.0

The repository for the plugin is available here.

(Jai Ram Rideout) #2

Hi @bhillmann! Can you describe what you mean by “linearized combined_seqs.fna”? For example, is that a FASTA file of demultiplexed sequences, multiplexed sequences, representative sequences, or something else?

(Benjamin Hillmann) #3

Hi @jairideout @gabe, the linearized combined_seqs.fna is simply a FASTA version of the quality-controlled query sequences. By linearized we mean 1 header line and 1 sequence line per record in the FASTA. The combined sequence FASTA file will contain the demultiplexed sequences, similar in format to those output by the script. That is, instead of having 1 FASTA per sample, they are all combined, with samplename_recordindex in the header. This is the input artifact required by NINJA-OPS that we placed into the original alpha plugin.
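To make the "linearized" layout concrete, here is a minimal Python sketch of joining wrapped sequence lines so that every record is exactly one header line plus one sequence line. It is not the NINJA-OPS or QIIME implementation; the function names and the sample IDs (`sampleA_0`, `sampleB_1`) are made up for illustration and only follow the samplename_recordindex pattern described above.

```python
import io


def linearize_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_linearized(records, handle):
    """Write each record as one header line and one sequence line."""
    for header, seq in records:
        handle.write(">%s\n%s\n" % (header, seq))


# Hypothetical input: the first record is wrapped over two lines.
wrapped = io.StringIO(">sampleA_0\nACGT\nTTGA\n>sampleB_1\nGGCC\n")
out = io.StringIO()
write_linearized(linearize_fasta(wrapped), out)
linearized_text = out.getvalue()
```

After this runs, `linearized_text` holds two records, each on exactly two lines.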

(Jai Ram Rideout) #5

Sorry for the delay in responding to you @bhillmann! Thanks for explaining the file format – we have an open issue to support this format (i.e. the “QIIME 1 demux” format) and will likely include this support in the upcoming 2017.8 release (scheduled for around two weeks from now). So once that’s available in q2-types you’ll be able to start using it in q2-ninja-ops!

Regarding q2-ninja-ops, I’ll follow up with you directly in the next few days so we can work on getting this plugin into either the 2017.8 or 2017.9 release. We’re specifically interested in having q2-ninja-ops perform closed-reference OTU picking as we don’t have closed-reference support yet in QIIME 2. Thanks!

(Gabe A) #7

Hi all, big milestone – NINJA-OPS is now installable via conda!

What’s left is to make a wrapper for db building, and wrap it into the q2 plugin framework. Thanks @bhillmann for getting the conda pieces in place!


(Jai Ram Rideout) #9

This is great news, thanks for keeping this moving forward @bhillmann and @gabe!

(Benjamin Hillmann) #11

We have added some testing stubs for NINJA-OPS in the develop branch. They all pass on my side. I think we might need to add some more, such as induced failures. Let me know what you think @jairideout

(Greg Caporaso) #15

@bhillmann, I’m going to help out with reviewing this instead of @jairideout. I should be able to take a look in the next couple of days. I’m really excited to see this moving forward!

(Greg Caporaso) #16

Hi @bhillmann,
I’m just getting started on working with this to do a review. Overall this is looking really good so far! It’ll be nice to finally get this released!

I was able to install q2-ninja-ops in my QIIME 2 2018.2 environment, but so far I haven’t been able to install ninja-ops itself through conda. What channels are you relying on for this? I’m working on macOS, and I’m getting the following error:

$ conda install -c knights-lab ninja_ops
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - ninja_ops
  - bowtie2 

If you’re not sure what’s going on with conda there, I can help out, but just didn’t want to start poking around if you could just give me a command that would work.

My first recommendation is that you expand the tests a little bit. At the moment, you’re testing that the actions run successfully, but there is no sanity test that the results look reasonable. When we’re wrapping other applications, we generally assume that the underlying application is sufficiently tested, and we just put some very minimal tests in place to ensure that the plugin isn’t mangling the results of the underlying application at all, and that any parameters that the user passes make their way into the underlying application. The q2-vsearch tests are a good example of this - you can see the closed-reference tests here, and you can feel free to use any/all of the data from those tests.
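As a rough illustration of the two kinds of minimal tests described above (parameters reach the underlying application, and results are not mangled), here is a self-contained sketch. Everything in it is hypothetical: `cluster`, `fake_tool`, and the `perc_identity` parameter stand in for the real wrapper and the real NINJA-OPS call, and the actual q2-vsearch tests are structured differently.

```python
import unittest


def cluster(sequences, perc_identity, run_tool):
    """Hypothetical wrapper: forward parameters to the wrapped tool and
    tally its (sequence id, reference id) hits into per-sample counts."""
    hits = run_tool(sequences, perc_identity=perc_identity)
    table = {}
    for seq_id, ref_id in hits:
        sample = seq_id.rsplit("_", 1)[0]  # samplename_recordindex
        counts = table.setdefault(sample, {})
        counts[ref_id] = counts.get(ref_id, 0) + 1
    return table


class ClusterTests(unittest.TestCase):
    def setUp(self):
        self.seqs = {"s1_0": "ACGT", "s1_1": "ACGT", "s2_0": "TTTT"}
        self.calls = []

    def fake_tool(self, sequences, **kwargs):
        # Stand-in for the wrapped application: records what it was given.
        self.calls.append(kwargs)
        return [("s1_0", "ref1"), ("s1_1", "ref1"), ("s2_0", "ref2")]

    def test_parameters_reach_tool(self):
        cluster(self.seqs, 0.97, self.fake_tool)
        self.assertEqual(self.calls, [{"perc_identity": 0.97}])

    def test_counts_are_not_mangled(self):
        table = cluster(self.seqs, 0.97, self.fake_tool)
        self.assertEqual(table, {"s1": {"ref1": 2}, "s2": {"ref2": 1}})
```

In a real plugin the second test would compare against a small expected feature table built from known test data rather than a hand-written dict.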

A couple of other high-level recommendations:

  • I recommend changing the name cluster-closed-reference to cluster-sequences-closed-reference. Take a look at this tutorial - the “cluster sequences” prefix would imply that it’s taking sequence data (not a feature table) as input. This change isn’t a requirement, but I think it would help users understand how the method fits into QIIME 2.
  • You should add descriptions for all of the input/output/parameters in your two methods. See here for an example of how to do that. See here for an example of what that documentation will look like.
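For the second recommendation, descriptions are supplied as plain dictionaries keyed by input, parameter, and output name, which are then passed to `plugin.methods.register_function` as `input_descriptions`, `parameter_descriptions`, and `output_descriptions`. The sketch below only builds those dictionaries and checks that nothing is left undescribed; the names (`sequences`, `reference_sequences`, `perc_identity`, `clustered_table`) and the wording are placeholders, not the actual q2-ninja-ops signature.

```python
# Hypothetical signature of a closed-reference clustering method.
inputs = ["sequences", "reference_sequences"]
parameters = ["perc_identity"]
outputs = ["clustered_table"]

input_descriptions = {
    "sequences": "The demultiplexed query sequences to cluster.",
    "reference_sequences": "The reference sequences to cluster against.",
}
parameter_descriptions = {
    "perc_identity": "Minimum identity for a query to match a reference.",
}
output_descriptions = {
    "clustered_table": "Per-sample counts of matched reference sequences.",
}


def missing_descriptions(names, descriptions):
    """Return the names that have no (or an empty) description."""
    return [name for name in names if not descriptions.get(name)]


missing = (
    missing_descriptions(inputs, input_descriptions)
    + missing_descriptions(parameters, parameter_descriptions)
    + missing_descriptions(outputs, output_descriptions)
)
```

A check like `missing == []` could live in the plugin's own test suite so that new parameters do not ship undocumented.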

I can give some more feedback once I get ninja-ops installed with conda - again, just let me know if you need some help with that.

(Benjamin Hillmann) #18

We are currently using our knights-lab conda channel for ninja_ops and the bioconda channel for bowtie2. The channel shows a recipe uploaded for macOS. I’m not sure why it isn’t working for you; maybe it is the bioconda channel dependency. The website shows no downloads of the macOS build.

Link to recipe:

Link to conda channel:

I can refactor to cluster-sequences-closed-reference. More sanity checks are incoming. We added a little bit of test data and set up what seems to be the paradigm for the unit testing framework and for passing files around.

(Greg Caporaso) #20

@bhillmann, I have some time this week to take another pass. Are you refactoring based on some of my previous suggestions? Ideally I could take another pass after that.

One other thing I noticed was that when running closed-reference clustering, there is a db.fna file being left behind. QIIME methods should only output .qza files - could you have that file either be cleaned up or written to a qza instead?
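One common way to avoid stray intermediates like this is to have the wrapper do all of its scratch work inside a temporary directory that is removed when the method returns. The sketch below shows the pattern with the standard library only; `run_with_scratch_dir` and `fake_build_db` are hypothetical stand-ins, not the q2-ninja-ops code.

```python
import os
import tempfile


def run_with_scratch_dir(build_db):
    """Build the intermediate db.fna in a temporary directory, read what
    is needed from it, and let the directory be deleted on exit."""
    with tempfile.TemporaryDirectory() as scratch:
        db_path = os.path.join(scratch, "db.fna")
        build_db(db_path)
        with open(db_path) as handle:
            result = handle.read()
    # The scratch directory (and db.fna inside it) no longer exists here.
    return result, db_path


def fake_build_db(path):
    # Stand-in for the step that writes the intermediate database file.
    with open(path, "w") as handle:
        handle.write(">ref1\nACGT\n")


contents, leftover_path = run_with_scratch_dir(fake_build_db)
```

Anything that needs to survive the method would be returned and written into the output artifact instead of being left in the working directory.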

(Benjamin Hillmann) #22

I haven’t had a chance to refactor yet. I’ll run through those refactorings and let you know when it is ready. Were you able to get it running?

I’ll take a look for that db.fna, it should be cleaned up.

(Greg Caporaso) #24

@bhillmann, I’ll hold off on further review then until you’ve had a chance to refactor. Does that work? I didn’t get the conda installation working, so I just installed ninja-ops directly. We can help with that on my end when you’re ready for me to review again.

(Greg Caporaso) #25

@bhillmann, just wanted to check in on how things are going. I’m ready to do a re-review when you’re ready for it.

(Gabe A) #27

Just a little update:

We managed to get the whole shebang conda installable, but we stalled out on the Mac version not working and we don’t know why. Ben (our Python/conda expert) has moved on to a new position, but if the bug is trivial enough (something about environments), everything should be good to go.

Appears to be working well on Linux too!