Qurro axis values always negative?

Hi,

I have recently tried qurro and would like to ask guidance about interpreting its output.

Basically I was trying out qurro to choose biologically relevant features from my data. Since I work on longitudinal data, I followed the gemelli CTF workflow to generate the ordination plots, as elaborated here (gemelli/ipynb/tutorials/IBD-Tutorial-QIIME2-CLI.md at master · biocore/gemelli · GitHub).

Following that, I fed 'subject_biplot.qza' as the ranks file to the qiime qurro loading-plot method, which, if I understand correctly, visualizes the features ranked by their position in the ordination space (e.g., the features' Axis 1, 2, 3 coordinates on the biplot).

From the examples I've seen, one should ideally select features from the extreme positive and negative ends to build log-ratios. However, with my data the features always have negative axis values, so I don't see a clear positive/negative division (shown in picture). Does it mean something that my data look like this in the qurro loading plot, e.g., does it mean it is not variable enough? Can I arbitrarily set a cut-off anyway to get log ratios of (as I did in the picture) or is this not an appropriate approach?

My other question is: can I only do differential ranking based on PCA loadings? Would it be less appropriate if I instead did the differential ranking from, e.g., log2 fold-change values?

Pretty new to the whole qurro interface; please feel free to correct me if anything I said was wrong!

Thank you so much!


Hi @ange,

My apologies for the delayed response. This is a strange issue: I'm not very familiar with Gemelli, but I think there are some positive (or, at least, negative-but-very-close-to-zero) values in your dataset, based on the fact that the x-axis of this plot goes up to at least 993 features (given this screenshot). However, I'd guess that the magnitudes of these positive values just happen to be relatively small, which makes them difficult to see in this plot. You should be able to zoom in on this plot in Qurro by scrolling up using the mouse, which might make the situation clearer. (Increasing the "bar width" slider may also help; if all else fails, I think you could also run qiime metadata tabulate on the Gemelli ordination and look at the top/bottom values for this axis to see what's up.)
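As a rough sketch of that last check, assuming the loadings for one axis have already been exported into a pandas Series indexed by feature ID (the feature names and values below are made up, and `summarize_axis` is a hypothetical helper, not part of Qurro or Gemelli), something like this would show whether any positive values are hiding near zero:

```python
import pandas as pd


def summarize_axis(loadings: pd.Series) -> dict:
    """Count negative/positive loadings and report the extremes for one axis."""
    ranked = loadings.sort_values()
    return {
        "n_negative": int((ranked < 0).sum()),
        "n_positive": int((ranked > 0).sum()),
        "min": float(ranked.iloc[0]),
        "max": float(ranked.iloc[-1]),
    }


# Hypothetical Axis 1 loadings: mostly negative, with two tiny positive values.
axis1 = pd.Series(
    {"featA": -0.42, "featB": -0.10, "featC": -0.003, "featD": 0.0008, "featE": 0.002}
)
summary = summarize_axis(axis1)
print(summary)
```

If `n_positive` is nonzero but `max` is tiny compared with the magnitude of `min`, that would be consistent with the positive bars simply being too short to see in the plot.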

> Does it mean something that my data look like this in the qurro loading plot, e.g., does it mean it is not variable enough?

I think this sort of conclusion depends only on the tool that generated the loadings, so I'm shamelessly tagging @cmartino to see if he knows what's going on :) I'd be willing to bet that this property relates to how the Gemelli biplot looks: likely there are a lot of arrows pointing in the "negative" direction along your PC 1 axis. However, I will defer to you on the question of whether this has some biological relevance to your dataset!

> Can I arbitrarily set a cut-off anyway to get log ratios of (as I did in the picture) or is this not an appropriate approach?

This will depend on your goals somewhat. I assume the main goal at this point is constructing a log-ratio to separate samples along this axis in the Gemelli ordination.

If so: even if all of your features have negative loadings on this axis (which, as discussed above, I'm not sure about), I suspect the denominator could be improved. If the numerator of your log-ratio is a small group of features on the leftmost side of the plot, as shown in your screenshot, then a smaller group of features on the rightmost side of the plot would likely be more useful as a denominator than the one currently used, which seems to cover all of the features to the right of the numerator group. The rationale is that these sorts of ordinations can often be explained by log-ratios of a small subset of features (Martino et al. 2019 write, in the context of RPCA, that "These feature loadings can be largely explained by a few features").

So, long story short, the current approach you show in the screenshot is justifiable -- but I suspect it could be improved, giving you a sparser log-ratio.
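To make the "sparser log-ratio" idea concrete, here is a minimal sketch, assuming a samples-by-features count table and one axis of loadings are available as pandas objects. The helper name `extreme_log_ratio`, the data, and the choice of `k` are all hypothetical; the use of the natural log and the dropping of samples with a zero sum on either side are assumptions of this sketch, so compare against the values Qurro exports before relying on it.

```python
import numpy as np
import pandas as pd


def extreme_log_ratio(counts: pd.DataFrame, loadings: pd.Series, k: int) -> pd.Series:
    """Log-ratio of the k lowest-ranked features over the k highest-ranked ones.

    counts: samples (rows) x features (columns) table of raw counts.
    loadings: one ordination axis, indexed by feature ID.
    """
    ranked = loadings.sort_values()
    numerator = ranked.index[:k]     # leftmost features in the rank plot
    denominator = ranked.index[-k:]  # rightmost features in the rank plot
    num_sum = counts[numerator].sum(axis=1)
    den_sum = counts[denominator].sum(axis=1)
    # A log-ratio is undefined when either side sums to zero, so drop those samples.
    valid = (num_sum > 0) & (den_sum > 0)
    return np.log(num_sum[valid] / den_sum[valid])


# Hypothetical data: three samples, five features.
counts = pd.DataFrame(
    {
        "featA": [10, 0, 5],
        "featB": [2, 3, 1],
        "featC": [4, 4, 4],
        "featD": [1, 6, 0],
        "featE": [5, 12, 5],
    },
    index=["s1", "s2", "s3"],
)
axis1 = pd.Series(
    {"featA": -0.42, "featB": -0.10, "featC": -0.003, "featD": 0.0008, "featE": 0.002}
)
log_ratio = extreme_log_ratio(counts, axis1, k=1)
print(log_ratio)
```

Trying a few values of `k` and checking how well the resulting log-ratio separates samples along the axis is one way to judge whether a smaller denominator group helps.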

> My other question is can I only do differential ranking based on PCA loadings? Would it be less appropriate if I instead do the differential ranking from eg log2 FoldChange values ?

I think this should be OK; Qurro has mainly been used in the context of 1) loadings and 2) "differentials," which are essentially the same as log fold changes. (For reference: the paper introducing the differential ranking methodology (Morton/Marotz et al. 2019) defines "differential" as "the logarithm of the fold change in abundance of a taxa between two conditions.")
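Purely as an illustration of what a per-feature log fold change ranking looks like, here is a deliberately naive sketch. The function `naive_log2_fold_change`, the pseudocount of 1, and the data are all hypothetical, and this is not the method from the paper above or from any particular differential abundance tool:

```python
import numpy as np
import pandas as pd


def naive_log2_fold_change(
    counts: pd.DataFrame,
    groups: pd.Series,
    case: str,
    control: str,
    pseudocount: float = 1.0,
) -> pd.Series:
    """Per-feature log2 ratio of mean abundance (case over control), sorted ascending.

    counts: samples (rows) x features (columns); groups: condition label per sample.
    The pseudocount avoids division by zero and is an arbitrary choice.
    """
    case_mean = counts[groups == case].mean(axis=0) + pseudocount
    control_mean = counts[groups == control].mean(axis=0) + pseudocount
    return np.log2(case_mean / control_mean).sort_values()


# Hypothetical data: two case samples and two control samples.
counts = pd.DataFrame(
    {"featA": [7, 7, 1, 1], "featB": [3, 3, 3, 3], "featC": [0, 0, 3, 3]},
    index=["s1", "s2", "s3", "s4"],
)
groups = pd.Series(["case", "case", "control", "control"], index=counts.index)
ranks = naive_log2_fold_change(counts, groups, case="case", control="control")
print(ranks)
```

The sorted Series plays the same role as a column of loadings: features at the two ends are the candidates for a log-ratio's numerator and denominator.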

Of course, some people will probably say that certain ways of computing fold changes make suboptimal assumptions, are inconsistent with other tools, and so on, so I would advise being wary about that.

