2 Questions about the output of the classify samples scripts
#1. The feature_importance.qza artifact produced by running classify samples has two columns: the ASV (feature) ID and “importance”.
Could someone point me to an explanation of what this description from the docs means?
Importance of each input feature to model accuracy.
My sense is that this indicates which ASVs are most important for discriminating between the group(s) when the model is built, but I’d like to understand how the value is actually derived.
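Assuming the default estimator is used (q2-sample-classifier builds on scikit-learn, and its default is a random forest), the values are most likely scikit-learn’s impurity-based `feature_importances_`. For each feature, every split on that feature contributes its decrease in Gini impurity, weighted by the fraction of samples reaching that split. These contributions are summed per tree, averaged over all trees, and normalized so they sum to 1. The sketch below uses a made-up feature table; the data and ASV names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy feature table: 40 samples x 5 ASVs (read counts), two groups.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.poisson(lam=5, size=(40, 5)).astype(float)
X[y == 1, 0] += 20  # make ASV_0 clearly more abundant in group 1

feature_ids = [f"ASV_{i}" for i in range(X.shape[1])]

# Assumption: this mirrors the default random forest estimator.
clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X, y)

# Mean decrease in impurity per feature, averaged over trees and normalized.
importances = clf.feature_importances_
for fid, imp in sorted(zip(feature_ids, importances), key=lambda t: -t[1]):
    print(f"{fid}\t{imp:.3f}")

print(round(float(importances.sum()), 6))  # normalized: the importances sum to 1
```

scikit-learn’s own documentation notes that impurity-based importances can be biased toward features with many unique values, and offers `sklearn.inspection.permutation_importance` as an alternative measure.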
#2. Does anyone have a sense of how important abundance information is? I’ve been comparing rarefied and non-rarefied data, and it seems like:
a. Rarefied data have more ASVs with large per-ASV Importance values, and as a result…
b. Fewer ASVs are needed to account for, say, 50% of the total Importance.
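One way to put a number on (b) is to count how many top-ranked ASVs it takes to reach a given share of the total importance. A small sketch; the function name is hypothetical, and the input would be the importance column of each exported feature_importance table.

```python
import numpy as np

def n_features_for_fraction(importances, fraction=0.5):
    """Smallest number of top-ranked features whose importances reach `fraction` of the total."""
    imp = np.sort(np.asarray(importances, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(imp) / imp.sum()                     # running share of total
    # First position where the running share reaches the target; +1 turns it into a count.
    idx = int(np.searchsorted(cumulative, fraction))
    return min(idx + 1, imp.size)

concentrated = [3, 1, 1, 1, 1, 1]   # one dominant feature
flat = [1] * 10                     # importance spread evenly
print(n_features_for_fraction(concentrated))  # 2
print(n_features_for_fraction(flat))          # 5
```

Running this on the rarefied and unrarefied importance vectors for each factor gives a single comparable count per model.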
In the example plots below, there are 4 different groups that I was investigating (the horizontal facets); the same data were analyzed either rarefied or unrarefied. Same samples, same ASVs. What’s curious to me is how many more ASVs contribute to the outcome with the unrarefied data. I’m struggling to understand why so few ASVs provide a high level of discrimination in the rarefied data but not in the unrarefied data… except, apparently, for the last factor (“batch”).
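For reference when comparing the two inputs: rarefying subsamples each sample’s reads without replacement down to a common depth, so low-count ASVs can drop to zero in some samples. This is a minimal numpy sketch of that idea, not QIIME 2’s own implementation; the function name, counts, and depth are hypothetical.

```python
import numpy as np

def rarefy(counts, depth, seed=0):
    """Subsample one sample's ASV counts to `depth` reads without replacement (hypothetical helper)."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    rng = np.random.default_rng(seed)
    # One entry per read, labelled by the index of the ASV it belongs to.
    reads = np.repeat(np.arange(counts.size), counts)
    picked = rng.choice(reads, size=depth, replace=False)
    # Count how many drawn reads fall in each ASV (keeps ASVs that got zero reads).
    return np.bincount(picked, minlength=counts.size)

sample = [500, 300, 3, 2, 1]   # hypothetical counts for 5 ASVs
print(rarefy(sample, depth=100))
```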
Thanks for the tips!