I am trying to wrap my head around the feature volatility plot. For example, how is the importance value calculated? Is it a representation of the feature's relative abundance? When ranking the important features, does it mean that at each time point the most important feature has the largest change in relative abundance? And is the global mean the relative abundance at each time point?
Feature importance in this context identifies the features with the strongest temporal signal. This could mean that they change gradually with time or that they are strongly predictive of a single timepoint. Importance is not linked to relative abundance or count, though an important feature could have a higher or lower abundance at a particular timepoint.
Yes, the global mean is the mean relative abundance of that feature across all timepoints in all samples.
Not at all. The supervised regressors used here are sensitive to non-linear relationships between time and feature abundance. You can have a very important feature (i.e., one that predicts which time point a sample came from) whose abundance has no linear correlation with time, and the volatility plots produced by this action should give abundant examples of that scenario.
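To make that scenario concrete, here is a minimal sketch on synthetic data. It uses scikit-learn's `RandomForestRegressor` directly rather than the QIIME 2 action, and the feature names, abundance values, and sample counts are all made up for illustration. The `pattern` feature has a distinct abundance at each timepoint but no linear trend, so its correlation with time is close to zero even though the regressor relies on it almost entirely.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical study: 5 timepoints, 40 samples per timepoint.
time = np.repeat(np.arange(5), 40)

# Made-up mean abundance of "pattern" at timepoints 0..4. The values are
# distinct per timepoint but chosen so their covariance with time is zero.
level_by_time = np.array([0.05, 0.30, 0.20, 0.10, 0.15])
pattern = level_by_time[time] + rng.normal(0, 0.005, time.size)

# "noise" is a feature whose abundance is unrelated to time.
noise = rng.uniform(0.0, 0.3, time.size)

X = np.column_stack([pattern, noise])
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, time.astype(float))

importances = dict(zip(["pattern", "noise"], model.feature_importances_))
corr_pattern = np.corrcoef(pattern, time)[0, 1]

print("importances:", importances)
print("Pearson r between 'pattern' and time:", corr_pattern)
```

In this toy case a correlation-based ranking would overlook `pattern`, while the random forest importance picks it up, which is the kind of feature the volatility plot helps you spot.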
For more info, check out the tutorial I linked to above and the Wikipedia page on random forests (which is the default supervised regression method used).
Net change and global mean are useful metrics for selecting the features you care about, but neither is going to be useful on its own. They are provided to contextualize the feature importance scores. For example, a feature may be very important but have a very low mean relative abundance, so you might not be as interested in that feature if you only care about abundant features.
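As a rough illustration of using these metrics together, the sketch below ranks features by importance and then drops the ones with a low global mean. Everything here is hypothetical: the abundances and importance scores are invented, the 0.01 cutoff is arbitrary, and net change is assumed to be the mean abundance at the last timepoint minus the mean at the first, which may not match exactly how the action computes it.

```python
from statistics import mean

# Synthetic relative abundances: feature -> {timepoint: [values across samples]}
abundances = {
    "feature_a": {0: [0.20, 0.22], 1: [0.25, 0.27], 2: [0.30, 0.32]},
    "feature_b": {0: [0.001, 0.002], 1: [0.004, 0.003], 2: [0.001, 0.001]},
}
# Invented importance scores; in practice these come from the regressor.
importance = {"feature_a": 0.35, "feature_b": 0.40}


def summarize(by_time):
    """Return the global mean and an assumed net change for one feature."""
    all_values = [v for values in by_time.values() for v in values]
    first, last = min(by_time), max(by_time)
    return {
        # Mean over all timepoints in all samples.
        "global_mean": mean(all_values),
        # Assumption: last-timepoint mean minus first-timepoint mean.
        "net_change": mean(by_time[last]) - mean(by_time[first]),
    }


summary = {name: summarize(by_time) for name, by_time in abundances.items()}

# Rank by importance, then keep only features above an arbitrary abundance cutoff.
min_mean = 0.01
ranked = sorted(importance, key=importance.get, reverse=True)
selected = [name for name in ranked if summary[name]["global_mean"] >= min_mean]

print(summary)
print("selected:", selected)
```

Here `feature_b` has the higher importance but is filtered out because its mean relative abundance is far below the cutoff.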