Should we convert Frequency FeatureTable to RelativeFrequency within a plugin?

cduvallet · January 3, 2019, 8:12pm

The percentile normalization function in the q2-perc-norm plugin uses relative abundances, and currently only accepts FeatureTable[RelativeFrequency] as an input. @cdiener pointed out that it would be nice to also allow FeatureTable[Frequency] as an input, so that it can use the output of denoising without having to convert to FeatureTable[RelativeFrequency] first.

I have one philosophical and one logistical question:

Does this make sense to incorporate into my plugin, or is this sort of redundancy not recommended? What I'd end up doing is checking whether the input is RelativeFrequency or Frequency and, if it's Frequency, first converting it to relative abundance before proceeding (e.g. with this function in the feature table plugin). This adds redundancy to QIIME 2, because the user could instead just use the q2-feature-table plugin to do the conversion, so it may not be ideal. On the other hand, it's much more convenient for the user to have this wrapped into the code without requiring an extra step. What do y'all think?

Logistically, how do I check whether the input is RelativeFrequency or Frequency and include an extra step before the percentile normalization if it's Frequency? One hacky way I can think of to do this is check whether each sample sums to 1, but it would be cleaner to directly interrogate the input type.

thermokarst · January 7, 2019, 9:18pm

Hey there @cduvallet!

I think this would make a lot of sense to incorporate, but at the same time, isn't strictly required.

Both points make sense to me --- I guess I see it as 6 of one and a half-dozen of the other.

I would take advantage of Pipeline: you can type check the input table, and optionally run feature-table relative-frequency if the supplied input type is FeatureTable[Frequency]. The vsearch cluster-features-open-reference pipeline does a similar bit of branching, based on intermediate results. The thing about Pipeline is that the function arguments are QIIME 2 Artifacts, rather than view types (e.g. biom.Table here), so you can interrogate the type by calling my_data.type on it to learn more about it! A brief example:

[screenshot of a short code example; image not preserved]
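A minimal sketch of the type check described above, not the code from the original screenshot. The helper name `ensure_relative_frequency` is hypothetical, and comparing `str(table.type)` against a string is an assumption made so the snippet runs without importing q2-types; inside a real plugin you would more likely compare against `FeatureTable[Frequency]` from `q2_types.feature_table`.

```python
def ensure_relative_frequency(ctx, table):
    """Return a relative-frequency table Artifact, converting if needed.

    `ctx` is the context object QIIME 2 passes as the first argument of a
    Pipeline function; `table` is a QIIME 2 Artifact (not a biom.Table).
    The function name is hypothetical, for illustration only.
    """
    # Assumption: the semantic type of an Artifact renders as
    # 'FeatureTable[Frequency]' when converted to a string.
    if str(table.type) == 'FeatureTable[Frequency]':
        # Look up the existing action from q2-feature-table and run it.
        relative_frequency = ctx.get_action('feature_table',
                                            'relative_frequency')
        # Actions return a Results tuple; this one has a single output.
        table, = relative_frequency(table=table)
    return table
```

Because the conversion runs as a real action through `ctx`, it is recorded in provenance only when it is actually executed.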

There are some advantages to making this a pipeline: it's a one-stop shop for your users, and provenance will capture the implicit relative-frequency command (if run), which is cool. A disadvantage is that this would introduce a new dependency for q2-perc-norm: q2-feature-table. Not really a problem, but, well, you never know...

Also worth thinking about: you could refactor percentile_normalize as a Pipeline, or you could add a new Pipeline that calls percentile_normalize as part of its operation. The only "big" advantage I can think of for the second option is that you wouldn't need to refactor percentile_normalize at all, it would just work. Otherwise (in the case of refactoring to a Pipeline), you would need to view the table as a biom table, table = table.view(biom.Table), after ensuring that the table is relative freq.
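For the second option, a rough sketch of a separate wrapper Pipeline that leaves percentile_normalize untouched. The function name `percentile_normalize_any`, the plugin id `'perc_norm'` passed to `ctx.get_action`, and the string-based type comparison are all assumptions for illustration; the remaining arguments are passed through unchanged rather than guessed at.

```python
def percentile_normalize_any(ctx, table, **kwargs):
    """Hypothetical wrapper Pipeline accepting either table type.

    `table` is a QIIME 2 Artifact; `kwargs` are forwarded as-is to the
    existing percentile_normalize action.
    """
    # Assumption: str(table.type) renders as 'FeatureTable[Frequency]'.
    if str(table.type) == 'FeatureTable[Frequency]':
        to_relative = ctx.get_action('feature_table', 'relative_frequency')
        table, = to_relative(table=table)

    # Assumption: the plugin is registered under the id 'perc_norm'.
    percentile_normalize = ctx.get_action('perc_norm',
                                          'percentile_normalize')
    # The existing method receives an Artifact, so no .view() call is
    # needed here; the framework handles the view transformation.
    normalized, = percentile_normalize(table=table, **kwargs)
    return normalized
```

With this layout the existing method stays as it is, and the extra conversion step shows up in provenance as its own action.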

Let us know if you need a hand with anything! :qiime2:

cduvallet · January 8, 2019, 1:33pm

Ah, thanks so much! As always, you have answered my question and then some!

I will likely refactor percentile_normalize into a pipeline, since it already kind of mish-mashes pipeline-y things together. This will be an opportunity to do it properly!