Generating feature loading plots in Qurro using DEICODE biplot features

MarcelK · December 13, 2020, 2:45am

Hi there
Please, I need your help with using Qurro. I would ultimately like to generate feature loading plots similar to the one in Fig. 4B of this paper coauthored by @mortonjt. First, I would like to obtain the top and bottom five percent of features/taxa by abundance using the Qurro visualization. I am following this tutorial, which seems to imply that the direction of the arrows for the numerator and denominator matters and indicates the order in which the features or taxa appear. The tutorial just says to sort the arrows along Axis 1 or Axis 2. Should both arrows point in the same direction or in opposite directions? Is one comparing the first or second taxon in the top five percent of the numerator to the first or second taxon in the top five percent of the denominator, or something else? Second, I would like to generate the feature loading plots. I am using the 8 features in the DEICODE biplot but am not sure which of them should be pasted into the numerator and which into the denominator. I have searched for the 8 features among the top and bottom five percent of taxa: in some cases a feature is absent, and in other cases 7 of the 8 are in the numerator. When I paste them in according to their position, the feature loading plot doesn't look like the one in the paper. It just seems that fewer features were selected than expected. Thanks for helping.

mortonjt · December 18, 2020, 4:51am

Hi @MarcelK, it is a bit difficult to see what you are having trouble with. Would you like to paste the deicode plots and the qurro results? That would put us in a better position to answer your question.

MarcelK · December 18, 2020, 6:15am

Hi @mortonjt. Here is the biplot:

[DEICODE biplot image]

I would like to use the 8 features displayed to obtain a plot that looks like the one in Figure 4B of the Baker et al. (2020) paper, which you coauthored. The question is: how do I tell which of the features go in the numerator slot and which in the denominator slot? How did you obtain Figure 4B? Thanks.

mortonjt · December 18, 2020, 7:34pm

Hi @MarcelK, we're heading in the right direction, but we'll need a little bit more information.
What question are you trying to answer? Are you trying to get the top explanatory microbes?

You are typically looking for microbes that point in opposite directions (perpendicular to the decision boundary you are trying to carve out). It looks like there are too few arrows in this plot, so I'd definitely add more and choose the arrows pointing to the left and the ones pointing to the right.
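As a rough sketch of the "left vs. right" idea: assuming you have the biplot's arrow tip coordinates in a Python dict (the name `arrows`, its layout, and the `split_by_direction` helper are hypothetical, not part of the DEICODE or Qurro API), the features can be split by the sign of their coordinate on the axis that separates your samples:

```python
def split_by_direction(arrows, axis=0, min_length=0.0):
    """Partition biplot arrows by the sign of their coordinate on one axis.

    arrows: hypothetical dict of feature ID -> (axis1, axis2) arrow tip coordinates.
    Returns (right, left): features pointing toward the positive and negative
    side of the axis, each sorted from the longest projection to the shortest.
    Arrows whose projection is within min_length of zero are ignored.
    """
    right = [f for f, xy in arrows.items() if xy[axis] > min_length]
    left = [f for f, xy in arrows.items() if xy[axis] < -min_length]
    right.sort(key=lambda f: arrows[f][axis], reverse=True)  # most positive first
    left.sort(key=lambda f: arrows[f][axis])  # most negative first
    return right, left


# Toy coordinates, made up for illustration only.
arrows = {
    "taxon_a": (0.6, 0.1),
    "taxon_b": (-0.5, 0.2),
    "taxon_c": (0.05, -0.4),
    "taxon_d": (-0.7, -0.1),
}
right, left = split_by_direction(arrows, axis=0, min_length=0.1)
# right == ["taxon_a"], left == ["taxon_d", "taxon_b"]; taxon_c barely moves along Axis 1
```

Putting the right-pointing features in the numerator and the left-pointing ones in the denominator (or the reverse) gives the same log-ratio up to a sign flip, since log(N/D) = -log(D/N).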

I'm attaching a clearer example of how to interpret these biplots, published in https://www.pnas.org/content/113/22/E3130.short (Figure S4).

In panel H, there are two key groups of microbes: the yellow arrows related to Helicobacter, and microbes such as Mucispirillum. These two microbes alone differentiate the samples before and after treatment.

It is also worthwhile mentioning that qurro was designed to help identify important microbes, and enable post-hoc hypothesis testing.

If you have trouble understanding how to interpret the biplots, check out this book
https://www.fbbva.es/microsite/multivariate-statistics/maed.html

MarcelK · December 19, 2020, 4:42am

Hi @mortonjt, this is great, and I appreciate you pointing me to these useful resources. They will partially help me solve the issues I explained earlier. To answer your question: I am comparing two groups of samples between two different states, and the explanations you provided are really useful for identifying the taxa of interest between the samples in those two states. One remaining issue is how to generate the feature loading plot for these same groups of samples. My understanding from the various resources is that one can use the features from the biplot to generate feature loading plots. I tried copying and pasting the features from the biplot into the Qurro plot's numerator and denominator. I obtained a plot but was not convinced I had it right. My question is: how do you decide which of the features from the biplot go into the numerator and which into the denominator?

mortonjt · December 22, 2020, 12:37am

Got it, you are trying to figure out what combination of features to put into the log ratio?

Truth is, there isn't really a right answer to that. There are on the order of 2^d possible reference frames for d microbes, so you cannot enumerate the possibilities and find the "right combination". When @fedarko was designing Qurro, the objective was to ease exploratory analysis of these ratios and help users choose meaningful ratios (by meaningful, I mean ratios that can clearly differentiate the taxa).
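To make the scale concrete, here is a small Python sketch that counts candidate log-ratios under a simplifying assumption: a log-ratio is any ordered pair of disjoint, non-empty numerator and denominator sets drawn from d features (each feature is either in the numerator, in the denominator, or unused). Under that definition the count is 3^d - 2·2^d + 1 by inclusion-exclusion, so it grows exponentially in d:

```python
from itertools import product


def count_log_ratios_bruteforce(d):
    """Label each of d features as numerator, denominator, or unused, and count
    the labelings in which both the numerator and the denominator are non-empty."""
    count = 0
    for labels in product(("num", "den", "unused"), repeat=d):
        if "num" in labels and "den" in labels:
            count += 1
    return count


def count_log_ratios(d):
    """Same count in closed form: all 3^d labelings, minus those with an empty
    numerator (2^d) or an empty denominator (2^d), plus the all-unused one."""
    return 3**d - 2 * 2**d + 1


# Brute force is only feasible for tiny d; the closed form shows the explosion.
for d in (2, 3, 8):
    assert count_log_ratios_bruteforce(d) == count_log_ratios(d)
print(count_log_ratios(20))  # roughly 3.5 billion candidates for just 20 features
```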

If you need a quantifiable metric, you can download the Qurro sample data and apply your favorite statistical method. If you are determined to get the "best" answer, it's worthwhile to check out greedy algorithms such as selbal ("Balances: a New Perspective for Microbiome Analysis").
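A minimal sketch of that "export, then test" step in plain Python. It assumes the exported sample data is a tab-separated file with a column holding each sample's current natural log-ratio (called `Current_Natural_Log_Ratio` here) and a metadata column with the group labels (called `state` here); these column names are assumptions, so check the header of your own export. As the "favorite statistical method" it uses an exact two-sided permutation test on the difference in mean log-ratio between two groups, which is one reasonable choice among many:

```python
import csv
import io
import math
from itertools import combinations
from statistics import mean


def load_log_ratios(tsv_text, group_col, ratio_col="Current_Natural_Log_Ratio"):
    """Group per-sample log-ratios from an exported TSV by a metadata column.

    Column names are assumptions; rows with a missing or non-finite log-ratio
    are skipped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = {group_col, ratio_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    groups = {}
    for row in reader:
        try:
            value = float(row[ratio_col])
        except (TypeError, ValueError):
            continue  # empty cell or non-numeric placeholder
        if math.isfinite(value):
            groups.setdefault(row[group_col], []).append(value)
    return groups


def exact_permutation_test(a, b):
    """Two-sided exact permutation test on the difference in group means.

    Enumerates every relabeling of the pooled samples into groups of the
    original sizes, so it is only practical for small groups.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        ga = [x for i, x in enumerate(pooled) if i in chosen]
        gb = [x for i, x in enumerate(pooled) if i not in chosen]
        total += 1
        if abs(mean(ga) - mean(gb)) >= observed - 1e-12:  # tolerance for float error
            extreme += 1
    return observed, extreme / total


# Made-up export for illustration; s7 has no valid log-ratio and is skipped.
export = (
    "Sample ID\tCurrent_Natural_Log_Ratio\tstate\n"
    "s1\t2.0\tred\ns2\t2.5\tred\ns3\t3.0\tred\n"
    "s4\t-1.0\tblue\ns5\t-0.5\tblue\ns6\t0.0\tblue\ns7\tnan\tblue\n"
)
groups = load_log_ratios(export, group_col="state")
diff, p = exact_permutation_test(groups["red"], groups["blue"])
print(diff, p)  # 3.0 0.1
```

With only three samples per group there are just C(6, 3) = 20 relabelings, so even perfectly separated groups give a two-sided p-value of 2/20 = 0.1 here; very small groups limit what any exact test can show.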

fedarko · December 22, 2020, 2:04am

@mortonjt summed it up pretty well -- I don't know of an always-perfect way to select a log-ratio for these sorts of analyses, although there are plenty of solid solutions like selbal, amalgam, etc. that won't require you to mess around with Qurro, biplots, etc.

...This does bring about the worrying realization that there isn't necessarily a single "correct" way to analyze a dataset. In theory, someone could go through hundreds of different differential abundance tools, tool parameters, ways of selecting log-ratios in Qurro, etc. until they find the "one" setup that gives them the results they want to see. This is clearly a bad idea: if they go that far, the results they see are likely to be noise (...plus it violates the scientific method pretty hard, I'd think). An antidote to this, I think, is being very explicit about what was done and why, and being willing to accept that there may not be anything interesting going on in the dataset (...which probably doesn't happen often enough in microbiome research, but that's a paragraph for another day).

In general, looking at features pointing in different directions in the biplot is useful, like @mortonjt mentioned above. "Autoselection" in Qurro can be useful for this if the samples you are attempting to separate are separated along a specific axis in the biplot (for example, gut samples vs. other samples in this tutorial), but this isn't always the case.
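The idea behind autoselection can be sketched roughly as follows: rank the features by their loading on one biplot axis, take the highest-ranked ones as the numerator and the lowest-ranked ones as the denominator, then compute each sample's natural log of the summed numerator counts over the summed denominator counts. This is a simplified illustration, not Qurro's code; the dict-based inputs, the function names, and the rounding rule for the percentage are assumptions:

```python
import math


def autoselect(loadings, axis, percent):
    """Pick the top and bottom `percent`% of features by loading on one axis.

    loadings: hypothetical dict of feature ID -> tuple of per-axis loadings.
    Returns (numerator, denominator); the numerator holds the highest loadings.
    Assumption: the count is rounded up, and capped so the two sets never overlap.
    """
    if len(loadings) < 2:
        raise ValueError("need at least two features")
    ranked = sorted(loadings, key=lambda f: loadings[f][axis])  # lowest loading first
    k = min(max(1, math.ceil(len(ranked) * percent / 100)), len(ranked) // 2)
    return ranked[::-1][:k], ranked[:k]


def log_ratio(counts, numerator, denominator):
    """ln(sum of numerator counts / sum of denominator counts) for one sample.

    Returns None when either sum is zero, since the log-ratio is undefined there.
    """
    top = sum(counts.get(f, 0) for f in numerator)
    bottom = sum(counts.get(f, 0) for f in denominator)
    if top == 0 or bottom == 0:
        return None
    return math.log(top / bottom)
```

Calling `autoselect(loadings, axis=0, percent=5)` would mirror picking the top and bottom 5% along Axis 1 (axes are zero-indexed in this sketch), after which `log_ratio` is applied to each sample's counts.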

In the biplot you posted, are you trying to find a log-ratio that distinguishes the red-colored samples from the blue-colored samples? You could try autoselection along Axis 1 to do this, since these samples seem to be kind of separated along Axis 1, although I would hesitate to assign "significance" to the results because it looks like there are only three red-colored samples. (Also in general I'm not a huge fan of using p-values for the results of exploratory analyses, but that's a whole other holy war...)

If you're looking for further reading, I wrote a pretty in-depth comment about some of the details on selecting features from DEICODE in Qurro a few months back: it's a bit long, but the section starting with "To be sure, I would like to ask you" might be a good place to start.

Hope this helps clarify things!

MarcelK · December 22, 2020, 7:35pm

Thanks to both of you for taking the time to explain all of this. Following your comments, my next stop is the autoselection approach, and I am going to take a look at it. Thanks again.

system · January 23, 2021, 1:35am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.