DEICODE - Qurro: Feature loading sOTUs

fedarko · January 15, 2020, 4:44am

Thanks for the kind words!

I would like to build figures like Figure 5 in the DEICODE paper. Is there a way to do it using Qurro? (following a previous post)

Fig. 5 was generated using code in a separate GitHub repository -- I think the code that actually generates these figures is this one, but I didn't write this code (or the DEICODE paper) so I can't say for sure how easy it'd be to adapt this to other datasets.

I'll get to how you can replicate parts of Fig. 5 in Qurro further below.

Also, could you please help me understand how I could build Figures 5C and 5D (from the same paper) manually, using ordination.qza? I am having difficulties understanding the relationship between the rank plot, Axis (1, 2, and 3), and sample plot.

Sure! Let's take it from the top.

Axes

These refer to the axes (aka PCs) you see in a DEICODE biplot, or in any sort of PCA visualization. In a biplot, each feature and each sample has a loading for each axis. Qurro mostly cares about feature loadings -- the rank plots it shows for DEICODE output are just the feature loadings for a given axis.
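To make "each feature and each sample has a loading for each axis" a bit more concrete, here is a toy sketch in plain Python. It is *not* what DEICODE does: DEICODE uses a robust clr variant plus matrix completion to deal with the many zeros in real data, whereas this sketch just runs ordinary PCA (clr transform, column-centering, then power iteration for the first axis) on made-up, zero-free counts. The function names (`clr`, `center_columns`, `first_axis`) are hypothetical helpers, not part of DEICODE or Qurro.

```python
import math

def clr(row):
    # Centered log-ratio: log of each count minus the sample's mean log count.
    # Assumes strictly positive counts; real data with zeros needs more care.
    logs = [math.log(x) for x in row]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]

def center_columns(matrix):
    # Subtract each feature's mean across samples, as ordinary PCA does.
    n = len(matrix)
    means = [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]
    return [[x - m for x, m in zip(row, means)] for row in matrix]

def first_axis(matrix, iters=500):
    # Power iteration on M^T M converges to the top right singular vector,
    # i.e. the Axis 1 feature loadings (up to an arbitrary sign).
    n_features = len(matrix[0])
    # Non-constant start vector: every clr row sums to zero, so an all-ones
    # start vector would be mapped straight to zero.
    v = [float(j + 1) for j in range(n_features)]
    for _ in range(iters):
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        v = [sum(matrix[i][j] * u[i] for i in range(len(matrix)))
             for j in range(n_features)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Each sample's position on Axis 1: its projection onto that direction.
    samples = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    return samples, v

# Made-up counts: 4 samples (rows) x 4 features (columns).
counts = [
    [10, 20, 30, 40],
    [40, 30, 20, 10],
    [12, 18, 33, 37],
    [38, 32, 19, 11],
]
data = center_columns([clr(row) for row in counts])
sample_loadings, feature_loadings = first_axis(data)
print("sample loadings:", [round(x, 3) for x in sample_loadings])
print("feature loadings:", [round(x, 3) for x in feature_loadings])
```

Samples 1 and 3 end up on one side of Axis 1 and samples 2 and 4 on the other, while features 1 and 4 get loadings of opposite sign: the features pushing samples in opposite directions along the axis.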

To get a sense of what axes "mean", I highly recommend looking at a visualization of a DEICODE biplot (e.g. in Emperor) while looking at Qurro visualizations of DEICODE feature loadings. This'll show you how samples/features are positioned relative to these axes.

Rank plots

The top plots shown in Figs. 5C and 5D (the bar plots) show the Axis 1 loadings of features in the DEICODE biplot, sorted in descending order.¹,² These bar plots are also referred to as rank plots. Each³ "feature" (ASV, sOTU, etc.) in a dataset is represented as a bar in this plot, with its Axis 1 loading shown on the y-axis.
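Since a rank plot is just the feature loadings for one axis sorted by value, the ordering step fits in a few lines. This is a minimal sketch assuming the loadings are already available as a dict of feature ID to per-axis values; the feature IDs and numbers below are made up for illustration, and `rank_features` is a hypothetical helper.

```python
def rank_features(loadings, axis=0, descending=True):
    """Return (feature_id, loading) pairs sorted by their loading on one axis.

    loadings maps feature IDs to lists of per-axis loadings
    (axis=0 is Axis 1, axis=1 is Axis 2, and so on).
    """
    return sorted(
        ((fid, vals[axis]) for fid, vals in loadings.items()),
        key=lambda pair: pair[1],
        reverse=descending,
    )

# Hypothetical feature loadings for three axes (made-up numbers).
loadings = {
    "sOTU_A": [0.52, -0.10, 0.03],
    "sOTU_B": [-0.61, 0.05, 0.20],
    "sOTU_C": [0.08, 0.44, -0.31],
    "sOTU_D": [-0.02, -0.57, 0.12],
}

for rank, (fid, value) in enumerate(rank_features(loadings), start=1):
    # Each entry is one bar in the rank plot: x = rank, bar height = loading.
    print(rank, fid, value)
```

Passing `axis=1` ranks the same features by their Axis 2 loadings instead, and `descending=False` gives the ascending order Qurro uses; either way you are still ranking the same set of loadings.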

These feature loading values are really just numbers -- you can create these rank plots from them, like Qurro does and like Fig. 5 does, to provide a quick indication of how features in a dataset correspond to some sort of variation. @cmartino would be able to explain the math details better than me, but long story short, you can think of the feature loading values as indications of how much each feature "contributes" to variation along a given axis in the biplot. (For details about interpreting these loadings, and compositional biplots in general, I'd recommend checking out section 4 of Aitchison and Greenacre (2002).)

One more note about rank plots -- although Figs. 5C and 5D's rank plots only show the Axis 1 feature loadings for those biplots, you could totally switch these to Axis 2, Axis 3, etc. to show features' loadings for those axes instead. This is doable in Qurro using the "Feature Loading" selection box below the rank plot. (Trying this could be useful if, for example, a sample group you're interested in was separated along Axis 2 instead of Axis 1 in the biplot -- see Fig. 2 of this preprint for an example in practice.)
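If you're not sure which axis separates a sample group of interest, one rough way to check (outside of Qurro) is to compare the group means of the sample loadings on each axis. This is only a heuristic sketch, not a statistical test and not something Qurro or DEICODE provides; the helper name, sample IDs, group labels, and loadings are all hypothetical.

```python
from statistics import mean

def best_separating_axis(sample_loadings, groups, group_a, group_b):
    """Return the 0-based index of the axis whose mean sample loading differs
    most between two metadata groups (a rough heuristic only)."""
    n_axes = len(next(iter(sample_loadings.values())))
    gaps = []
    for axis in range(n_axes):
        a = [v[axis] for sid, v in sample_loadings.items() if groups[sid] == group_a]
        b = [v[axis] for sid, v in sample_loadings.items() if groups[sid] == group_b]
        gaps.append(abs(mean(a) - mean(b)))
    return max(range(n_axes), key=lambda i: gaps[i])

# Made-up sample loadings on Axes 1-3, plus a made-up metadata column.
sample_loadings = {
    "S1": [0.10, 0.60, -0.05],
    "S2": [-0.05, 0.55, 0.10],
    "S3": [0.08, -0.50, 0.02],
    "S4": [-0.12, -0.58, -0.07],
}
groups = {"S1": "gut", "S2": "gut", "S3": "soil", "S4": "soil"}

axis = best_separating_axis(sample_loadings, groups, "gut", "soil")
print(f"Axis {axis + 1}")  # index 1, i.e. Axis 2, for this toy data
```

In a case like this one, it would make sense to look at the Axis 2 feature loadings in the rank plot rather than Axis 1.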

Sample plots

This is a term I made up for Qurro (sorry to introduce more words into the literature...). Basically, a sample plot is just a scatterplot/boxplot of the samples in a dataset: one axis is each sample's log-ratio of selected features, and the other axis is some other variable for each⁴ sample. The reason this is shown alongside the rank plot is that the rank plot should hopefully serve as a guide to which features are worth looking at log-ratios of, and the sample plot should then show how these log-ratios vary across samples.
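The per-sample log-ratio values behind a sample plot can be sketched like this. The sketch assumes the log-ratio is taken over the summed counts of the numerator features and the summed counts of the denominator features, and it drops samples where either sum is zero (see footnote 4). The sample and feature IDs are made up, and `log_ratios` is a hypothetical helper, not Qurro's code.

```python
import math

def log_ratios(counts, numerator, denominator):
    """Compute log(sum of numerator counts / sum of denominator counts) per sample.

    counts maps sample ID -> {feature ID: count}. Samples where either side
    sums to zero are dropped, since the log-ratio is undefined there.
    """
    result = {}
    for sample, row in counts.items():
        num = sum(row.get(f, 0) for f in numerator)
        den = sum(row.get(f, 0) for f in denominator)
        if num > 0 and den > 0:
            result[sample] = math.log(num / den)
    return result

counts = {
    "S1": {"sOTU_A": 30, "sOTU_B": 10, "sOTU_C": 5},
    "S2": {"sOTU_A": 5, "sOTU_B": 20, "sOTU_C": 15},
    "S3": {"sOTU_A": 0, "sOTU_B": 8, "sOTU_C": 0},  # numerator is 0: dropped
}
ratios = log_ratios(counts, numerator=["sOTU_A"], denominator=["sOTU_B", "sOTU_C"])
print(ratios)
```

Each remaining value becomes one point's position on the log-ratio axis of the sample plot, and the other axis comes from a sample metadata field.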

In Qurro's sample plots, the y-axis corresponds to some selected log-ratio, and the x-axis can be set to any sample metadata field of interest. Fig. 5C/5D's plots are a bit different from how Qurro organizes sample plots -- here, the x-axis (not the y-axis!) shows the log-ratio values, and the y-axis shows the sample loadings for Axis 1 of the DEICODE biplot.

Replicating parts of Fig. 5 in Qurro

Replicating the rank plots

You should be able to create figures like the rank plots in Figs. 5C and 5D using Qurro, but of course they will look slightly different (e.g. features will be ranked in ascending instead of descending order by their loadings; colors will be different). When you select features for a log-ratio in Qurro they'll be highlighted on the rank plot, analogous to how Synechococcus, Cereibacter, etc. are highlighted in Figs. 5C/5D (although you'd need to add in the fancy arrow / text box saying Synechococcus (g) yourself, at least for now).

I should note that support for selecting multiple "groups" of features at once, as shown in the 5C/5D rank plots, is still kind of messy in Qurro. You can try filtering on features that contain the separated text fragments and then entering something like Synechococcus, Cereibacter to replicate the blue features highlighted in the left rank plot in Fig. 5C. However, this will include all features containing these text fragments, not just the highest- or lowest-ranked features with this text.
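If you want only the highest-ranked matching features (closer to what Fig. 5C highlights), one option is to do the selection in a script instead. This sketch assumes comma-separated fragments matched case-insensitively as substrings of the feature IDs; Qurro's own text search may split or match text differently. The feature IDs and both helper names are hypothetical.

```python
def matching_features(feature_ids, fragments):
    """Feature IDs containing any of the comma-separated text fragments
    (case-insensitive substring match), in their original order."""
    frags = [f.strip().lower() for f in fragments.split(",") if f.strip()]
    return [fid for fid in feature_ids if any(f in fid.lower() for f in frags)]

def top_ranked(ranked_ids, selected, k):
    """Keep only the k highest-ranked features among the selected ones."""
    chosen = set(selected)
    return [fid for fid in ranked_ids if fid in chosen][:k]

# Made-up feature IDs, already sorted by Axis 1 loading (highest first).
ranked = [
    "g__Synechococcus;sOTU_1",
    "g__Cereibacter;sOTU_2",
    "g__Bacteroides;sOTU_3",
    "g__Synechococcus;sOTU_4",
    "g__Prevotella;sOTU_5",
    "g__Cereibacter;sOTU_6",
]
hits = matching_features(ranked, "Synechococcus, Cereibacter")
print(hits)                       # every matching feature, like the text filter
print(top_ranked(ranked, hits, 2))  # only the two highest-ranked matches
```

Taking the last `k` of the matches instead would give the lowest-ranked ones.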

Replicating the "sample plots"

Creating figures like the plots shown below the rank plots (which show how the Axis 1 sample loadings correlate with certain log-ratios) isn't currently doable in Qurro, since Qurro doesn't do anything with biplots' sample loadings yet. I have an open issue to add support for generating these kinds of figures in the sample plot eventually, but due to other obligations I probably won't be able to get to that for some time. In the meantime, you may be able to use the code in the aforementioned DEICODE repo to do this.
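The core of those lower panels is pairing each sample's log-ratio (x-axis) with its Axis 1 sample loading (y-axis). The sketch below does that pairing and computes a Pearson correlation with the standard library. In practice the sample loadings would come from the samples table of the ordination (you should be able to export `ordination.qza` with `qiime tools export` and read the result with scikit-bio's `OrdinationResults`), but here both inputs are made-up dicts, and the helper names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def pair_for_plot(log_ratios, axis1_loadings):
    """Pair each sample's log-ratio (x) with its Axis 1 sample loading (y),
    skipping samples missing from either (e.g. dropped for zero counts)."""
    shared = sorted(set(log_ratios) & set(axis1_loadings))
    xs = [log_ratios[s] for s in shared]
    ys = [axis1_loadings[s] for s in shared]
    return shared, xs, ys

# Made-up values; S3 has no log-ratio (say, a zero count on one side).
log_ratios = {"S1": 0.69, "S2": -1.95, "S4": 1.20, "S5": -0.40}
axis1 = {"S1": 0.21, "S2": -0.48, "S3": 0.05, "S4": 0.33, "S5": -0.12}

samples, xs, ys = pair_for_plot(log_ratios, axis1)
print(samples)
print(round(pearson(xs, ys), 3))
```

The `xs` and `ys` lists could then be handed to any plotting library as a scatterplot, mirroring the orientation used in Figs. 5C/5D.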

Hope this all helps make things clearer! Please let us know if you have any other questions, and thanks for trying out these tools.

Footnotes

¹ It's probably worth noting that earlier versions of DEICODE had some bugs relating to which axis was labelled which -- see here for details. This shouldn't impact interpretation/results much, but I would suggest making sure you're using the latest version of DEICODE.

² In this paper and in Qurro, we instead sort these plots in ascending order, but this choice doesn't make much of a difference aside from flipping the plot horizontally. You're still "ranking" the features.

³ In practice, some features might get filtered out for some reason.

⁴ In practice, some samples often end up being filtered out of these kinds of plots, due to e.g. having a count of 0 for one side of a log-ratio (log of 0 is undefined, as is log(x/0)). Qurro tries to be explicit about this happening.