Validation for parameters in plugin?

Hi

I am writing a QIIME2 plugin. One of the actions will allow users to import data from a text file. The result should be an archive that another action can process.

Please correct me if I am wrong but I cannot specify the input text file as an input to the plugin because it is not an archive. Therefore, I have to set a parameter which will take the path to the text file being imported.

However, there is no validation code that the QIIME2 framework can call automatically on parameters, is there? If there is, how would I use it? At present, I am just writing the validation code in the registered function and I have not defined a Format for the type of text file that I am importing.

I am somewhat confused by this and it appears not to be covered fully in the documentation! Some assistance would be greatly appreciated.

With many thanks

Hi @mroper,
You're correct that you can't specify a text file as input to the plugin. This should instead be handled through importing the file. While the solution you suggest would technically work on the command line, this would fail in the other interface types that QIIME 2 supports - notably Galaxy or other GUIs. In that case, the user would likely be presented with a text field, and there would be no mechanism for the file to be uploaded from the path they typed into the text field (and many users might not know what to type into the text field).

We don't have great developer documentation on this - we're in the very early stages of re-writing our developer documentation (planning, right now). However, I had a situation that may be similar for a plugin I recently developed. I'll link you to specific lines to show how I handled this.

I have a function that is effectively performing a file transformation - this could be like your archive that you're processing. I'm loading that in as stratified_table. (This is a table generated by metaphlan, and this function processes it to include just a single taxonomic level and generates a QIIME 2 feature table as output.)
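To make the "single taxonomic level" step concrete, here is a minimal sketch in plain Python. It is not the plugin's actual code: the function name, the dict-based table, and the sample values are made up for illustration, and it assumes MetaPhlAn-style clade names where ranks are joined by `|` and carry prefixes such as `k__` or `p__`.

```python
def select_taxonomic_level(rows, level_prefix):
    """Keep rows whose most specific clade carries the given rank prefix.

    `rows` maps a clade name to a list of per-sample abundances
    (a hypothetical in-memory stand-in for the loaded table).
    """
    selected = {}
    for clade, abundances in rows.items():
        # The last "|"-separated field is the most specific rank of the clade.
        most_specific = clade.split("|")[-1]
        if most_specific.startswith(level_prefix):
            selected[clade] = abundances
    return selected


# Illustrative values only.
table = {
    "k__Bacteria": [100.0, 100.0],
    "k__Bacteria|p__Firmicutes": [60.0, 30.0],
    "k__Bacteria|p__Bacteroidetes": [40.0, 70.0],
    "k__Bacteria|p__Firmicutes|c__Bacilli": [60.0, 30.0],
}
phyla = select_taxonomic_level(table, "p__")
print(sorted(phyla))
```

In the real plugin the table arrives as a pandas DataFrame via the transformer, so the same filter would be applied to the DataFrame's clade column rather than to a dict.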

I register this function as a QIIME 2 action here. stratified_table is provided as type MetaphlanMergedAbundanceTable here.

MetaphlanMergedAbundanceTable is defined and then registered.

I define MetaphlanMergedAbundanceFormat as a format, which is used for importing my file. This format is a type of TextFileFormat. Note that you should define a basic validator for that file format - it doesn't have to be exhaustive, but it's useful if it gives an indication of whether the file type looks reasonable. (I check that the first five lines have the same number of columns.)
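A check like the one described in parentheses above could be sketched as follows. This is plain Python so that it runs without QIIME 2 installed; the function name and the tab separator are assumptions. In an actual format, this logic would go in the format's `_validate_` method and raise `qiime2.plugin.ValidationError` rather than `ValueError`.

```python
import io
from itertools import islice


def check_column_counts(handle, n_lines=5, sep="\t"):
    """Raise ValueError unless the first `n_lines` lines have equal column counts.

    Returns the shared column count when the check passes.
    """
    # Only the first few lines are read, so the check stays cheap on large files.
    counts = [len(line.rstrip("\n").split(sep)) for line in islice(handle, n_lines)]
    if not counts:
        raise ValueError("File is empty.")
    if len(set(counts)) > 1:
        raise ValueError(
            f"Inconsistent column counts in the first {len(counts)} lines: {counts}"
        )
    return counts[0]


# Usage with an in-memory file; a real validator would open the file on disk.
example = io.StringIO("clade\tsample1\tsample2\nk__Bacteria\t100.0\t100.0\n")
print(check_column_counts(example))
```

The point of keeping it this shallow is that it catches obviously wrong files at import time without having to parse the whole table.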

I then define a DirectoryFormat. This can be a little more complex, but if it's just a single file that you're working with you can just adapt what I have done. Otherwise let me know and I'll point you at some other resources.

Next, I register the two formats I defined, and then associate the format with the type. This defines what we call an "artifact class" - basically a type of artifact that can be used with QIIME 2.

Finally, I create and register a transformer, which takes my format as input and creates and returns a pandas DataFrame. This gets called when I call my action metaphlan_taxon, so I can provide my artifact as input and get a DataFrame inside my function.

You could now import and apply your action as illustrated here. This will automatically work with any QIIME 2 interface, including the Python 3 API, the command line interface, Galaxy, or any others. So it's definitely a bit of extra work relative to just providing a path as a parameter, but you've gained ways of using it that should be comfortable to users with very different experience levels (from clinicians to data scientists).

Let me know if you have any questions.

Thank you very much @gregcaporaso ! And for your quick reply !! I am able to get your proposed solution to work well in my case.

I have a followup question. When I import data into a QIIME2 archive in this way from a text file format, I would also like to store some provenance information. I had thought to create an expanded DirectoryFormat with a Readme file (TextFileFormat) as an additional data file. I run into 2 issues here when I seek to implement this.

Firstly, there is a design issue. The readme file is not really data but rather provenance information. However, AFAIK the provenance directory in the archive I am outputting is not accessible to me as a plugin developer, and the design of the provenance directory structure makes no allowance for "manually inputted" files such as this. Therefore, I seem to be forced to include this file in the data directory - or am I making a design mistake?

Secondly, there is a coding issue that I think exists regardless of whether the readme is stored in data/ or provenance/. The issue is very closely related to your reply to my initial query in this discussion thread. It seems that the best option to bring the readme file into the QIIME2 framework is to first import the readme text file into an archive and then have the plugin action take 2 archives as input. However, this seems like quite a big effort to go to just to import a couple of text files !!! Am I making a mistake?

Thanks

Hi @mroper, I'm glad that you were able to get that working!

Integrating your own provenance on import is a great idea, but unfortunately not something that is supported at the moment. This is related to an issue we've had open on our issue tracker for a long time, though differs a bit since you're looking to do this on import, which should probably be more straightforward than what is being discussed in that issue (since it wouldn't require modifying or duplicating an existing artifact). I'm going to cross-reference this post on that issue.

What you could do, if you want to store that file in the data/ directory, would be to define a DirectoryFormat that is not a single file, and which includes the README file and the actual data file. You can see an example of defining one of these here. The README would likely only ever be accessible if a user exported the artifact, and it wouldn't be included in the provenance of downstream artifacts and visualizations, which kind of defeats the purpose, so I realize this solution is less than ideal.
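As a rough illustration of what such a two-file layout promises, here is a plain-Python sketch that checks a directory for both members. The file names `README.txt` and `table.tsv` and the helper `missing_files` are hypothetical. In QIIME 2 itself you would declare the two files on a DirectoryFormat subclass instead of checking by hand; this only shows the contract that the format enforces.

```python
import tempfile
from pathlib import Path

# Hypothetical member names for the two-file directory layout.
REQUIRED_FILES = ("README.txt", "table.tsv")


def missing_files(directory, required=REQUIRED_FILES):
    """Return the required file names that are absent from `directory`."""
    directory = Path(directory)
    return [name for name in required if not (directory / name).is_file()]


# Usage: build the layout in a temporary directory and check it at each step.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "table.tsv").write_text("clade\tsample1\nk__Bacteria\t100.0\n")
    print(missing_files(tmp))  # the README has not been written yet
    (Path(tmp) / "README.txt").write_text("Notes about where the table came from.\n")
    print(missing_files(tmp))
```

With a layout like this, the import step has to supply a directory containing both files rather than a single file.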

I can bring this issue up with the other developers and see if we can get some traction on implementing it for an upcoming release. This would then be something that is generally accessible across all artifact types, which could be really handy (e.g., I could imagine using this for noting date of access, DOI, etc for reference files used for feature annotation).

Thanks @gregcaporaso !!

I have proceeded with the DirectoryFormat solution as you suggest. It's great to know that I am not barking up the wrong tree :slight_smile:

Thanks again!
