As a follow-up to an earlier post, I've been trying to determine how best to visualize the results from a few `classify-samples-ncv` outputs. This experiment contains samples that were obtained from:
- 2 locations ("EN" and "HB")
- 3 dates (June, July, and September)
I want to show the readers a few things:
- How well does a given model perform? For example, is the classifier better at assigning samples to a collection date versus a location?
- Which features are particularly important when generating each model? Are these features the same across the different classifiers? For example, are the most important features in the Date classifier also the most important in the Location classifier?
I'm pretty clear on how to address the first objective. Where I'm stuck, and where I'd like some guidance/critique, is how best to answer the second one. More on this below...
I haven't yet used any of the downstream tools available for the outputs `classify-samples-ncv` produces, but it would be great to get pointers on which tools apply to the three output files (`feature_importance.qza`, `predictions.qza`, and `probabilities.qza`).
Instead, I've manually exported the output files and fiddled around with them in R to produce the plot shown below. To achieve my two goals stated above:
- Panels D-F show model performance: I took the `predictions.qza` file and generated a heatmap showing how often the prediction matched the actual group, with the values inside each box giving the number of samples. (A minimal sketch of this step appears just after this list.) These classifiers seem to work really well at identifying when a sample was obtained (D), but they aren't always perfect when it comes to classifying where a sample was obtained (E). When you consider both when and where (F), the classifier actually does better. I think/hope that is clear.
- Panels A-C are more subjective in my mind. I started by exporting the `feature_importance.qza` file, then ordering and filtering by relative importance: I gathered the features (OTUs in this case) whose importances summed to 50%. In other words, each of these panels shows the OTUs that account for half of the model's 'importance' (however we define that). I wonder what users think: does it make sense to use a common percentage approach like this, rather than taking a fixed number of OTUs and reporting what proportion of the importance they account for? And does 50% make any sense at all, versus something stricter like 20% or more inclusive like 80%? What values do others use? Perhaps there are other techniques that help explain which features matter most to a classifier.
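In case it helps frame the question, here's a minimal sketch of the kind of thing I did for Panels D-F, assuming `predictions.qza` and the sample metadata have been exported to TSVs. The file names and column names below (`predictions.tsv`, `metadata.tsv`, `SampleID`, `Date`, `prediction`) are hypothetical stand-ins for my actual ones:

```r
library(ggplot2)

# Hypothetical file names: predictions.tsv is the exported predictions.qza,
# metadata.tsv is the sample metadata holding the true groupings.
pred <- read.table("predictions.tsv", header = TRUE, sep = "\t",
                   comment.char = "", stringsAsFactors = FALSE)
meta <- read.table("metadata.tsv", header = TRUE, sep = "\t",
                   comment.char = "", stringsAsFactors = FALSE)

# Join predicted and true labels by sample ID (column names are assumptions)
df <- merge(pred, meta, by = "SampleID")

# Count how often each (truth, prediction) pair occurs
conf <- as.data.frame(table(truth = df$Date, prediction = df$prediction))

# Heatmap with the per-cell sample counts printed inside each box
ggplot(conf, aes(x = prediction, y = truth, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = Freq)) +
  labs(x = "Predicted group", y = "True group")
```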
Panels A-C are also an attempt to illustrate whether the same OTUs are important across classifiers. You can see that one group, the Trichoptera (teal), is important in Panel B but not Panels A or C, while the Psocodea (orange) are important in Panels A and C but not B. This is where I think my current percentage-filtering approach suffers: it's possible that "important for X, but not for Y or Z" is entirely an artifact of filtering at the 50% threshold. If I change that to 60%, the story can change; if I change it to 20%, of course it changes again. That's why I wonder if there is another way to think about this kind of data.
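To make the threshold question concrete, here's roughly what that filtering step looks like for one classifier, with the cutoff pulled out as a parameter so it's easy to see how the retained OTU set changes at 20%, 50%, or 80%. Again, the file name and column names (`id`, `importance`) are hypothetical assumptions about the export:

```r
# Hypothetical file name: feature_importance.tsv is the exported
# feature_importance.qza; column names are assumptions.
imp <- read.table("feature_importance.tsv", header = TRUE, sep = "\t",
                  comment.char = "", stringsAsFactors = FALSE)

# Sort by importance and compute each feature's running share of the total
imp <- imp[order(imp$importance, decreasing = TRUE), ]
imp$cum_share <- cumsum(imp$importance) / sum(imp$importance)

# Smallest set of OTUs whose importances sum to at least `cutoff`
top_features <- function(imp, cutoff) {
  imp$id[seq_len(which(imp$cum_share >= cutoff)[1])]
}

# How many OTUs survive at a strict, middling, and inclusive cutoff?
sapply(c(0.2, 0.5, 0.8), function(p) length(top_features(imp, p)))
```

Comparing those three retained sets (not just their sizes) across the Date, Location, and Date+Location classifiers is exactly where my "important for X but not Y" claims start to feel fragile.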
Greatly appreciate your feedback and thoughts!