Additional FeatureTable data types for q2-feature-table functions

On a similar note to my post about expanding possible FeatureTable input types to q2-sample-classifier, I wonder if it’s appropriate to do something similar for some of the functions in q2-feature-table.

Disclaimer: I am really only interested in editing the merge function, so that people can concatenate tables that they have individually percentile-normalized and then do downstream meta-analyses. But if we can improve the rest of the plugin while we’re at it, why not!

Specifically, here are the functions where I think we can have more types of allowable inputs:

subsample: this seems appropriate, since the function subsamples features or samples (rather than counts within the table). The unit in the feature table should not matter here. I propose allowing Frequency, RelativeFrequency, PresenceAbsence, and PercentileNormalized types.

group: is there a reason this currently only allows for Frequency tables? Grouping by summing probably isn’t appropriate for some data types, but I do think that grouping should be allowed on more than just Frequency feature tables.

I propose allowing Frequency, RelativeFrequency, PresenceAbsence, and PercentileNormalized types here, with some modified aggregator functions. Is there a way to restrict the aggregation based on the input type? For example, if the input table is RelativeFrequency, then you should only use a sum, mean, or median aggregator (rather than one that takes the ceiling of the result). If the input table is PresenceAbsence, then you should probably only use a presence/absence aggregator (or sum, I guess, if you want to know how many features per group are present in your table, though the output would then need to be a different FeatureTable data type, which is annoying). For PercentileNormalized data, I think only a median or mean aggregator would make sense.
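To make the restriction concrete, here is a rough sketch in plain Python of a lookup from table type to sensible aggregators. The table type names are the ones listed above; the aggregator names, the `ALLOWED_AGGREGATORS` mapping, and `check_aggregator` are all hypothetical and not part of q2-feature-table.

```python
# Table type names come from the post above; the aggregator names and the
# mapping itself are illustrative assumptions, not the q2-feature-table API.
ALLOWED_AGGREGATORS = {
    "Frequency": {"sum", "mean-ceiling", "median-ceiling"},
    "RelativeFrequency": {"sum", "mean", "median"},
    "PresenceAbsence": {"presence-absence"},
    "PercentileNormalized": {"mean", "median"},
}


def check_aggregator(table_type, mode):
    """Raise ValueError if `mode` is not a sensible aggregator for `table_type`."""
    try:
        allowed = ALLOWED_AGGREGATORS[table_type]
    except KeyError:
        raise ValueError(f"Unknown table type: {table_type!r}") from None
    if mode not in allowed:
        raise ValueError(
            f"Aggregator {mode!r} is not allowed for {table_type} tables; "
            f"choose one of {sorted(allowed)}"
        )


# A median aggregator on a RelativeFrequency table passes silently.
check_aggregator("RelativeFrequency", "median")
```

In a real plugin this check would presumably live in the type signature rather than in a runtime lookup, but the table shows which combinations I have in mind.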

merge: here again, any reason for only allowing Frequency tables? I think it should make sense to allow merging other tables, though, again, summing may not be appropriate for tables other than Frequency or RelativeFrequency.

I propose allowing Frequency, RelativeFrequency, PresenceAbsence, and PercentileNormalized types here, though we should think about which overlap methods are allowed. (tbh, I’m having trouble thinking of example situations where you would expect and allow for overlap…) For PercentileNormalized data, you would always want to throw an error on sample overlap. For RelativeFrequency, summing is probably okay (?). For PresenceAbsence, overlapping values should probably just be re-converted to presence/absence, though summing might also be allowed?
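The overlap rules above could look something like this in plain Python. This is only a sketch: tables are modeled as nested dicts (sample ID to feature ID to value), and `merge_tables` is a hypothetical helper, not the q2-feature-table merge implementation.

```python
def merge_tables(a, b, table_type):
    """Merge two tables given as {sample_id: {feature_id: value}} dicts.

    Hypothetical overlap rules, following the proposal above:
      - PercentileNormalized: any shared sample ID is an error
      - Frequency / RelativeFrequency: overlapping values are summed
      - PresenceAbsence: overlapping values are summed, then re-binarized
    """
    overlap = set(a) & set(b)
    if overlap and table_type == "PercentileNormalized":
        raise ValueError(f"Overlapping sample IDs: {sorted(overlap)}")

    # Copy the first table so the inputs are not modified.
    merged = {sample: dict(features) for sample, features in a.items()}
    for sample, features in b.items():
        target = merged.setdefault(sample, {})
        for feature, value in features.items():
            target[feature] = target.get(feature, 0) + value

    if table_type == "PresenceAbsence":
        # Convert any summed values back to 0/1.
        for features in merged.values():
            for feature in features:
                features[feature] = 1 if features[feature] > 0 else 0
    return merged


pa_1 = {"s1": {"f1": 1}}
pa_2 = {"s1": {"f1": 1, "f2": 1}}
merged_pa = merge_tables(pa_1, pa_2, "PresenceAbsence")
```

Here the shared sample `s1` ends up with both features marked present, and the doubled `f1` is brought back to 1 rather than left at 2.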


OK, wait, update: I just realized I already wrote a way to handle multiple feature tables in the percentile normalization code. The user can first merge their multiple RelativeFrequency feature tables and then percentile normalize within each study directly with the q2-perc-norm plugin. But they do need to start with RelativeFrequency tables and not Frequency (since counts need to be turned into relative abundances within each study before merging tables).
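A toy version of that ordering, in plain Python: convert each study's counts to relative abundances first, then merge. The dict-based tables and the helper names `to_relative` and `merge_disjoint` are made up for illustration, and the percentile normalization step itself is left out.

```python
def to_relative(table):
    """Convert {sample_id: {feature_id: count}} to per-sample relative abundances."""
    relative = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        if total == 0:
            raise ValueError(f"Sample {sample!r} has no counts")
        relative[sample] = {feature: count / total for feature, count in counts.items()}
    return relative


def merge_disjoint(tables):
    """Concatenate tables from different studies; shared sample IDs are an error."""
    merged = {}
    for table in tables:
        overlap = set(merged) & set(table)
        if overlap:
            raise ValueError(f"Overlapping sample IDs: {sorted(overlap)}")
        merged.update(table)
    return merged


# Two toy studies with raw counts (hypothetical data).
study_a = {"a1": {"f1": 1, "f2": 3}}
study_b = {"b1": {"f1": 2, "f3": 2}}

# Relative abundances are computed within each study before merging.
merged = merge_disjoint([to_relative(study_a), to_relative(study_b)])
```

Features that are missing from a sample are simply absent from its dict here, which stands in for a zero in the merged table.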

So as far as I’m concerned, I’d be okay with just including RelativeFrequency feature tables in all these edits! That might make the aggregating questions easier to figure out…


Thanks @cduvallet! And I apologize for the very late response…

The issue with all of these is the same one we discussed re: q2-sample-classifier: we need a TypeMap to define variable output types. I think you have summarized most of the long-standing, pressing reasons to implement a TypeMap for outputs!

So hopefully this can be done in the next release or so…
