On a similar note to my post about expanding possible FeatureTable input types to
q2-sample-classifier, I wonder if it’s appropriate to do something similar for some of the functions in
Disclaimer: I’m personally only interested in editing the merge function, so that people can concatenate tables they have individually percentile-normalized and then run downstream meta-analyses. But if we can improve the rest of the plugin while we’re at it, why not!
Specifically, here are the functions where I think we can have more types of allowable inputs:
subsample: this seems appropriate, since the function subsamples features or samples (rather than counts within the table). The unit in the feature table should not matter here. I propose allowing Frequency, RelativeFrequency, PresenceAbsence, and PercentileNormalized types.
group: is there a reason this currently only allows for Frequency tables? Grouping by summing probably isn’t appropriate for some data types, but I do think that grouping should be allowed on more than just Frequency feature tables.
I propose allowing Frequency, RelativeFrequency, PresenceAbsence, and PercentileNormalized types here, with some modified aggregator functions. Is there a way to restrict the aggregation based on the input type? For example, if the input table is RelativeFrequency, you should only use a sum, mean, or median aggregator (rather than one that takes the ceiling). If the input table is PresenceAbsence, you should probably only use a presence/absence aggregator (or sum, I guess, if you want to know how many features per group are present in your table, though the output would then need to be a different FeatureTable type, boo). For PercentileNormalized data, I think only a median or mean aggregator makes sense.
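To make the question concrete, here is a minimal sketch of what "restrict the aggregation based on the input type" could look like. It is plain Python, not the plugin's actual API: `ALLOWED_AGGREGATORS`, `check_aggregator`, and the aggregator names are all hypothetical, and the Frequency row is only my assumption about the existing ceiling'ed modes.

```python
# Hypothetical lookup: which aggregators make sense for which table type.
# Type names follow the proposal above; aggregator names are placeholders.
ALLOWED_AGGREGATORS = {
    "Frequency": {"sum", "mean-ceiling", "median-ceiling"},  # assumed current modes
    "RelativeFrequency": {"sum", "mean", "median"},
    "PresenceAbsence": {"presence-absence"},
    "PercentileNormalized": {"mean", "median"},
}


def check_aggregator(table_type, aggregator):
    """Raise ValueError if the aggregator is not allowed for this table type."""
    if table_type not in ALLOWED_AGGREGATORS:
        raise ValueError(f"Unknown table type: {table_type!r}")
    allowed = ALLOWED_AGGREGATORS[table_type]
    if aggregator not in allowed:
        raise ValueError(
            f"{aggregator!r} is not allowed for {table_type}; "
            f"choose one of {sorted(allowed)}"
        )
    return aggregator
```

With something like this, `check_aggregator("RelativeFrequency", "median")` would pass, while `check_aggregator("PresenceAbsence", "mean")` would fail before any grouping happens. Whether the check belongs in the type signature or inside the function body is the part I don't know.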
merge: here again, any reason for only allowing Frequency tables? I think it should make sense to allow merging other tables, though, again, summing may not be appropriate for tables other than Frequency or RelativeFrequency.
I propose allowing Frequency, RelativeFrequency, PresenceAbsence, and PercentileNormalized types here, though we should think about which overlap methods are allowed for each. (tbh, I’m having trouble thinking of example situations where you would expect and allow for overlap…) For PercentileNormalized data, you would always want to throw an error on sample overlap. For RelativeFrequency, summing is probably okay (?). For PresenceAbsence, overlapping values should probably just be re-converted into presence/absence, though summing might also be allowed?
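Here is a rough sketch of the per-type overlap rules described above, using plain nested dicts (`{sample: {feature: value}}`) instead of real feature tables. Everything in it is hypothetical: `ALLOWED_OVERLAP`, `merge_tables`, and the method names `"error"`, `"sum"`, and `"presence"` are made up for illustration, and I left summing out of PresenceAbsence since that is still an open question.

```python
# Hypothetical overlap policies per table type (not the plugin's real options).
ALLOWED_OVERLAP = {
    "Frequency": {"error", "sum"},
    "RelativeFrequency": {"error", "sum"},
    "PresenceAbsence": {"error", "presence"},
    "PercentileNormalized": {"error"},  # sample overlap is always an error
}


def merge_tables(a, b, table_type, overlap="error"):
    """Merge two {sample: {feature: value}} tables under a type-aware policy."""
    if overlap not in ALLOWED_OVERLAP[table_type]:
        raise ValueError(f"{overlap!r} is not allowed for {table_type}")
    # Copy the first table so neither input is modified.
    merged = {sample: dict(feats) for sample, feats in a.items()}
    for sample, feats in b.items():
        if sample not in merged:
            merged[sample] = dict(feats)
            continue
        if overlap == "error":
            raise ValueError(f"Sample {sample!r} appears in both tables")
        for feat, value in feats.items():
            total = merged[sample].get(feat, 0) + value
            if overlap == "presence":
                # Re-convert the combined value back to presence/absence.
                total = 1 if total > 0 else 0
            merged[sample][feat] = total
    return merged
```

For the percentile-normalized use case, the only thing that ever runs is the non-overlapping branch, which is just concatenation of samples.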