I have a specific question related to developing the percentile normalization plugin, but I think it also opens a more general conversation about downstream compatibility considerations when developing plugins.
What are the consequences of defining new semantic types as outputs to custom plugins? For example, in our plugin the output data is a feature table with OTUs in samples converted to percentiles of their respective distribution in controls (more info in our preprint). So technically, the output should not be a FeatureTable[RelativeFrequency] and should instead be something like a FeatureTable[PercentileNormalized]. However, I’m wondering what the downstream implications of that are.
For example, we’ll want users to be able to use these feature tables in most downstream applications (e.g. differential abundance testing, PCA plotting, etc). Will creating a new output type require many changes to downstream functions to allow for compatibility? Are there any significant drawbacks to having this output be a generic FeatureTable[RelativeFrequency] even though that’s not technically what it contains?
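For readers who haven't seen the method, here is a minimal plain-Python sketch of the conversion described above. It is not the plugin's actual implementation: the function names, the nested-dict table layout, and the "weak" percentile definition (percent of control values at or below the sample's value) are all assumptions made for illustration.

```python
from bisect import bisect_right


def percentile_of_score(controls, value):
    """Percent of control values <= value (assumed 'weak' definition)."""
    ordered = sorted(controls)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def percentile_normalize(table, control_ids):
    """Convert each OTU's abundances to percentiles of its control distribution.

    table: {otu_id: {sample_id: abundance}} (hypothetical layout)
    control_ids: sample IDs that define the reference distribution
    """
    normalized = {}
    for otu, abundances in table.items():
        controls = [abundances[s] for s in control_ids]
        normalized[otu] = {
            sample: percentile_of_score(controls, value)
            for sample, value in abundances.items()
        }
    return normalized


# Made-up example data: four controls and one case sample.
table = {
    "otu1": {"ctrl1": 0.10, "ctrl2": 0.20, "ctrl3": 0.30, "ctrl4": 0.40,
             "case1": 0.25},
}
result = percentile_normalize(table, ["ctrl1", "ctrl2", "ctrl3", "ctrl4"])
print(result["otu1"]["case1"])  # 50.0
```

The values that come out are percentiles on a 0 to 100 scale rather than fractions of a sample's total, which is why calling the result a relative frequency table is not strictly accurate.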
The implication is that your output will be guarded from entering methods which do not understand that particular type. This is usually a really good thing.
It will, but only in the most trivial way. Something that might use FeatureTable[RelativeFrequency] would now accept FeatureTable[RelativeFrequency | PercentileAbundance] (if that made sense for that particular action).
This is essentially what we expect to happen over time. As new types emerge, it isn’t always certain that a given type makes sense for a given action (even if the underlying representation is compatible, say a fasta file or a biom table). So on one hand, we want to stop misuse of data, and on the other, we don’t want to make it so inconvenient to adapt to new techniques that the ecosystem starts to use overly-broad types (making the types meaningless). We hope that having to add something like “| PercentileAbundance” meets that middle-ground of specific, but simple to update.
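To make the guarding idea concrete, here is a toy model in plain Python. It does not use the QIIME 2 API; the `Variant` class and its `accepts` method are hypothetical stand-ins that only mimic how a `|` union widens what an action's input annotation will take.

```python
class Variant:
    """Toy stand-in for a semantic type variant (not the QIIME 2 class)."""

    def __init__(self, *names):
        self.names = frozenset(names)

    def __or__(self, other):
        # A union accepts anything either side accepts.
        return Variant(*(self.names | other.names))

    def accepts(self, other):
        # An input annotation accepts an artifact whose variant(s) it lists.
        return other.names <= self.names


RelativeFrequency = Variant("RelativeFrequency")
PercentileAbundance = Variant("PercentileAbundance")

# Before the update: the action only understands relative frequencies.
old_input = RelativeFrequency
# After the update: one extra "| PercentileAbundance" in the annotation.
new_input = RelativeFrequency | PercentileAbundance

print(old_input.accepts(PercentileAbundance))  # False: guarded
print(new_input.accepts(PercentileAbundance))  # True: explicitly allowed
print(new_input.accepts(RelativeFrequency))    # True: old inputs still work
```

The point of the sketch is that the change lives entirely in the annotation; the body of the action does not have to change as long as the underlying data format is compatible.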
You would lose the ability to discriminate between them. Whether that matters depends on your use case. There is also another trick we can use depending on how PercentileAbundance relates to RelativeFrequency.
If every instance of a PercentileAbundance can be considered a RelativeFrequency, but not every table that is RelativeFrequency can be used as PercentileAbundance (i.e. the set of tables that are PercentileAbundance is a strict subset of tables that are RelativeFrequency), then you can use this notation:
# any string/name is fine
FeatureTable[RelativeFrequency] % Properties("percentile_normalized")
This would allow every method that accepts FeatureTable[RelativeFrequency] to use your output, while still making it possible to tell percentile-normalized tables apart from those that were not.
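As a rough illustration of the subset relationship, here is another toy model in plain Python. It is not how QIIME 2 implements `Properties`; the `TableType` class and its matching rule are assumptions that only reproduce the behavior described above: an input accepts an artifact if the variant matches and the artifact carries every property the input asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableType:
    """Toy stand-in for FeatureTable[<variant>] % Properties(...)."""
    variant: str
    properties: frozenset = frozenset()

    def accepts(self, artifact):
        # Same variant, and the artifact has at least the required properties.
        return (self.variant == artifact.variant
                and self.properties <= artifact.properties)


plain = TableType("RelativeFrequency")
percentile = TableType("RelativeFrequency",
                       frozenset({"percentile_normalized"}))

# A method written for plain relative frequencies takes the refined table.
print(plain.accepts(percentile))  # True
# A method that needs percentile-normalized input rejects a plain table.
print(percentile.accepts(plain))  # False
```

Under this model no downstream annotation has to change, at the cost of letting percentile-normalized tables into every method that takes relative frequencies.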
IMPORTANT: This feature is relatively unused at the moment, and the syntax is definitely subject to change. We haven’t identified a lot of situations where this is necessary, but I imagine it’s useful when you need to “tighten up” the ontology. Any feedback or discussion on this would be awesome!
The following is also possible, but it sounds like it’s really something that applies to the table as a whole rather than the observations themselves, so the above makes more sense than this:
After talking with my collaborator, I think we’ll go ahead and make a new type because only a few downstream applications would, in fact, be appropriate for percentile-normalized data.
That said, it seems like users who make new output data types will need to go through the code for all the available core functions and plugins and add “| PercentileNormalized” to each relevant function’s input, right? For now, this seems feasible - there aren’t that many functionalities or plugins (and like I said, in our particular case there are actually only a very small number of downstream things you should do with percentile-normalized data). As qiime2 grows, do you foresee this becoming an unreasonable burden?
I’m not a software engineer so maybe this is a simpler problem than it seems to me, but I was wondering what your plans are for these kinds of edits downstream.
That definitely remains to be seen, but updating an annotation is certainly simpler than updating functional code. Like you mentioned, it wouldn’t make sense for users to run many of the other downstream methods on this data. If QIIME 2 didn’t require someone to change something, then users would have to keep track of all of this themselves (like in QIIME 1).
We’re hoping developers are in the best position to know what makes sense as input, and we hope that getting happy users, provenance tracking, and lots of different interfaces are compelling enough reasons to justify the cost of dealing with these “ontology shifts/extensions” once in a while. ¯\_(ツ)_/¯
This seems to work - when I install my plugin (python setup.py install), I don’t get any errors.
However, when I try to run the plugin on some test data, I get the following error:
claire:~/github/q2-perc-norm/test_data$ qiime perc-norm percentile-normalize --i-table test_otu_table.qza --m-metadata-file test_metadata.txt --m-metadata-column DiseaseState --o-perc-norm-table test_out.percentile_qiime.percnorm_format.qza
Plugin error from perc-norm:
Name 'PercentileNormalized' is not a defined QIIME type, a plugin may be needed to define it.
What else do I need to do in order to define this new type? Is there anything in base qiime2 that needs to be edited?
Are there other user-developed plugins that define new types that I could use as an example to go off of?
1. Figure out what functionality should work with the PercentileNormalized variant.
2. Move the registration/declaration of PercentileNormalized to q2-types so that every plugin in step 1 can import the type without introducing any inter-plugin dependency chains or cycles. (This is pretty much the only reason q2-types exists; it’s just a shared vocabulary between the plugins that use it.)
3. Add FeatureTable[<whatever was there> | PercentileNormalized] to everything in step 1.
Finally coming back to this, hopefully will be able to make these changes before the next release!
Just to check that I do things in a way that doesn’t make your life more difficult: I should just go through each of the relevant q2 repos, fork them, make the type declaration edits, and do a PR? Is there a different/easier way than this?