understanding feature volatility

Xio_Lee · November 23, 2019, 1:34am

I am trying to wrap my head around the plot produced by feature-volatility. For example, how is the importance value calculated? Is it a representation of the feature's relative abundance? When ranking the important features, does it mean that at each time point, the most important feature would have the largest change in relative abundance? And is the global mean the relative abundance at each time point?

Thanks,
Xiao

Nicholas_Bokulich · November 25, 2019, 5:22pm

Hi @Xio_Lee,
Please check out the q2-longitudinal and q2-sample-classifier tutorials, which cover these details. See this section:
https://docs.qiime2.org/2019.10/tutorials/longitudinal/#feature-volatility-analysis

The feature importances in this context identify the features with the strongest temporal signal. This could mean that they change gradually over time or that they are strongly predictive of a single timepoint. Importance is not linked to relative abundance or count, though a strong signal could reflect a higher or lower abundance at a particular timepoint.

Yes, the global mean is the mean relative abundance of that feature across all timepoints in all samples.
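As a rough illustration of the global mean being pooled over all timepoints rather than computed per timepoint, here is a small sketch. The dict-of-dicts table layout, the sample IDs, and the helper names are hypothetical, and the sketch assumes the global mean is taken over per-sample relative abundances as the answer above states; it is not the q2-longitudinal implementation.

```python
from statistics import mean


def to_relative(counts):
    """Convert one sample's {feature: count} dict to relative abundances."""
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items()}


def global_means(table):
    """Mean relative abundance of each feature across every sample.

    `table` maps sample ID -> {feature ID: count} (a hypothetical layout).
    Samples from all timepoints are pooled, and a feature missing from a
    sample counts as 0 in that sample.
    """
    rel = [to_relative(counts) for counts in table.values()]
    features = sorted({f for sample in rel for f in sample})
    return {f: mean(sample.get(f, 0.0) for sample in rel) for f in features}


# Hypothetical table: two subjects, each sampled at two timepoints.
table = {
    "subj1.t0": {"ASV1": 30, "ASV2": 70},
    "subj1.t1": {"ASV1": 50, "ASV2": 50},
    "subj2.t0": {"ASV1": 10, "ASV2": 90},
    "subj2.t1": {"ASV1": 90, "ASV3": 10},
}
print(global_means(table))
```

Note that a single number per feature comes out, no matter how many timepoints there are; any per-timepoint view would require grouping samples by timepoint first.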

Good luck!

Xio_Lee · November 25, 2019, 9:30pm

Thank you for the reply.

So, in my understanding, feature importance is another way of saying correlation with time? The more important the feature, the stronger its correlation with time?

If we are interested in the effect of treatment over time on the microbial community, would net average change or global mean be a better metric to look at?

Thank you

Nicholas_Bokulich · November 25, 2019, 10:25pm

Not at all. The supervised regressors used here are much more sensitive to non-linear relationships between time and feature abundance. You can have a very important feature (i.e., one that predicts which timepoint a sample came from) whose abundance has no correlation with time, and the volatility plots produced by this action should give abundant examples of that scenario.
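To make the correlation-versus-importance distinction concrete, here is a minimal pure-Python sketch on hypothetical data (the feature pattern, subject count, and helper names are all made up for illustration). It compares Pearson's r with the correlation ratio η², which measures how much of the variation in time is explained by a flexible, non-linear mapping from abundance to time. η² is a stand-in, not the random forest importance that q2-longitudinal reports; it only shows that a feature can carry temporal information that a correlation coefficient misses entirely.

```python
from collections import defaultdict
from statistics import mean
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_ratio(xs, ys):
    """Share of the variance in ys explained by the mean of ys within each
    distinct x value (in-sample R^2 of a piecewise-constant fit).

    This is a simple non-linear association measure, not a random forest.
    """
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    my = mean(ys)
    ss_total = sum((y - my) ** 2 for y in ys)
    ss_within = sum(sum((y - mean(g)) ** 2 for y in g) for g in groups.values())
    return 1 - ss_within / ss_total


# Hypothetical data: 3 subjects sampled at timepoints 0-4. The feature is
# absent except for a bloom at timepoint 1 and a smaller one at timepoint 4.
times = [t for t in range(5) for _ in range(3)]
pattern = {0: 0.0, 1: 0.2, 2: 0.0, 3: 0.0, 4: 0.1}
abundance = [pattern[t] for t in times]

r = pearson_r(abundance, times)
eta_sq = correlation_ratio(abundance, times)
print(f"|r| = {abs(r):.3f}, eta^2 = {eta_sq:.3f}")  # → |r| = 0.000, eta^2 = 0.533
```

The bloom pattern was chosen so that r is exactly zero, yet knowing the abundance pins down timepoints 1 and 4 and explains about half of the variance in time. A tree-based regressor can split on abundance to exploit this kind of step-like, non-monotonic signal; a correlation coefficient cannot.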

For more info, check out the tutorial I linked to above and the Wikipedia page on random forests (the default supervised regression method used):

Net change and global mean are important metrics for selecting features that you think are important, but neither is going to be useful on its own. They are provided to contextualize the feature importance scores. For example, a feature may be very important but have a very low mean relative abundance, so you might be less interested in that feature if you only care about abundant features.
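The kind of filtering described in the last paragraph can be sketched as follows: rank features by importance, but drop those whose global mean falls below an abundance cutoff. The function name, thresholds, and scores are hypothetical; in practice you would take the importance and global mean values from the feature-volatility results and choose cutoffs that suit your study.

```python
def shortlist(importance, global_mean, min_importance, min_abundance):
    """Return features that are both important and reasonably abundant,
    sorted by decreasing importance (a hypothetical helper)."""
    keep = [
        f
        for f in importance
        if importance[f] >= min_importance
        and global_mean.get(f, 0.0) >= min_abundance
    ]
    return sorted(keep, key=lambda f: importance[f], reverse=True)


# Hypothetical scores: ASV3 is the most important feature but is rare.
importance = {"ASV1": 0.20, "ASV2": 0.05, "ASV3": 0.35}
global_mean = {"ASV1": 0.12, "ASV2": 0.30, "ASV3": 0.001}

print(shortlist(importance, global_mean, min_importance=0.1, min_abundance=0.01))
# → ['ASV1']
```

Setting `min_abundance=0.0` would put ASV3 first, which is exactly the case where a highly important but rare feature may not be what you are after.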

system · December 27, 2019, 4:25am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.