I recently read your paper "Uncovering the Horseshoe Effect in Microbial Analyses" with great interest and think it can help me understand my data better. I observe a horseshoe effect in my data set, but could not identify the gradient driving it. I will briefly describe the data and how I did the analysis.
Data: 16S amplicon data from human stools. There is a treatment group and a control group.
Analysis:
Applied PCoA to the relative abundance table and saw a strong horseshoe effect.
Then I applied a Hellinger transformation followed by PCoA; the pattern did not improve much.
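For readers following along, the two steps above can be sketched roughly as below. This is a minimal NumPy-only illustration, not the exact commands used in this post: the distance metric is not stated here, so Euclidean distance on the Hellinger-transformed table (i.e. the Hellinger distance) is an assumption, and the toy table with taxa responding to a single gradient is made up for the example.

```python
import numpy as np

def hellinger(table):
    """Square root of per-sample relative abundances (samples in rows)."""
    table = np.asarray(table, dtype=float)
    return np.sqrt(table / table.sum(axis=1, keepdims=True))

def euclidean_distances(X):
    """Pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def pcoa(D, n_axes=2):
    """Classical PCoA: eigendecomposition of the double-centered squared distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_axes]     # largest eigenvalues first
    # Clip tiny negative eigenvalues caused by floating-point error
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0, None))

# Toy table (an assumption, not real data): 30 samples along one gradient,
# 12 taxa, each with a bell-shaped response around its own optimum.
gradient = np.linspace(0, 1, 30)
optima = np.linspace(0, 1, 12)
table = 1000 * np.exp(-(gradient[:, None] - optima[None, :]) ** 2 / (2 * 0.1 ** 2))

transformed = hellinger(table)
coords = pcoa(euclidean_distances(transformed))  # one row of (PC1, PC2) per sample
```

Plotting `coords[:, 0]` against `coords[:, 1]` for a table like this would be expected to show the samples bending into an arch, since each taxon is only present over a narrow stretch of the gradient.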
Question:
How can I identify the gradient creating this horseshoe pattern?
In your paper, the 88-soil and the postmortem mouse examples are very straightforward: when you ordered the OTU table by pH and by days of decomposition, the band pattern appears nicely! However, in my case, when I sorted the table by treatment or by other potential gradients (those I think may have an impact, e.g. different hospitals), there is no diagonal band pattern. Is there a way to sort the table to see if there is a diagonal band?
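To make the sorting step concrete, here is a small sketch of what "sorting the table by a gradient" could look like. It is only an illustration under assumptions: samples are ordered by the gradient value, and taxa are ordered by their abundance-weighted mean gradient value (one simple choice, not necessarily what the paper did). The function name `sort_by_gradient` and the toy data are hypothetical.

```python
import numpy as np

def sort_by_gradient(table, gradient):
    """Order samples (rows) by the gradient and taxa (columns) by their
    abundance-weighted mean gradient value.

    Assumes every taxon is observed at least once (no all-zero columns).
    """
    table = np.asarray(table, dtype=float)
    gradient = np.asarray(gradient, dtype=float)
    sample_order = np.argsort(gradient)
    # Where along the gradient does each taxon sit on average?
    taxon_means = (table * gradient[:, None]).sum(axis=0) / table.sum(axis=0)
    taxon_order = np.argsort(taxon_means)
    return table[np.ix_(sample_order, taxon_order)], sample_order, taxon_order

# Toy data (made up): shuffled samples and taxa that hide a single gradient
rng = np.random.default_rng(0)
gradient = rng.permutation(np.linspace(0, 1, 20))
optima = rng.permutation(np.linspace(0.05, 0.95, 8))
table = np.exp(-(gradient[:, None] - optima[None, :]) ** 2 / (2 * 0.08 ** 2))

sorted_table, sample_order, taxon_order = sort_by_gradient(table, gradient)
```

If the chosen variable really is the driving gradient, a heatmap of `sorted_table` should show the diagonal band; if the band does not appear, that variable is probably not the one shaping the ordination.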
When I see this sort of pattern, the first thing that comes to mind is to sort the table to reveal band / block patterns in the data.
In gneiss there are two approaches, namely gradient-clustering and correlation-clustering. In your case, it may be possible to encode your treatment types as numerical values and apply gradient-clustering (although it is a bit hacky and will need careful validation). A more appropriate solution would be to apply correlation-clustering, which performs hierarchical clustering and essentially groups/sorts the features. From there, dendrogram-heatmap can help visualize the overall block pattern in the table.
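If you want to try the general idea outside of gneiss first, something like the sketch below reorders a table by hierarchical clustering. To be clear, this is not the gneiss implementation: it uses SciPy with average linkage on correlation distance, which is an assumption made for illustration, and the toy two-block table is made up. It also assumes no row or column is constant, since the correlation distance is undefined in that case.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

def cluster_order(X):
    """Leaf order from average-linkage clustering of the rows of X,
    using correlation distance (1 - Pearson correlation)."""
    distances = pdist(X, metric="correlation")
    return leaves_list(linkage(distances, method="average"))

def block_sort(table):
    """Reorder samples (rows) and taxa (columns) so similar ones sit together."""
    table = np.asarray(table, dtype=float)
    sample_order = cluster_order(table)
    taxon_order = cluster_order(table.T)
    return table[np.ix_(sample_order, taxon_order)], sample_order, taxon_order

# Toy data (made up): two hidden groups of samples, each dominated by its
# own group of taxa, with rows and columns shuffled.
rng = np.random.default_rng(0)
sample_group = rng.permutation(np.repeat([0, 1], 5))
taxon_group = rng.permutation(np.repeat([0, 1], 3))
table = 10.0 * (sample_group[:, None] == taxon_group[None, :]) + rng.random((10, 6))

sorted_table, sample_order, taxon_order = block_sort(table)
```

Drawing `sorted_table` as a heatmap should then make any block structure visible without having to know the gradient in advance.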
Out of curiosity, did you rarefy your tables or try coloring your samples by sequencing depth? I’m not sure, but it’s possible that you have a nice sequencing depth gradient that you are seeing in your plot …
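A quick way to check both things numerically, as a rough sketch: correlate each sample's total read count with its position on the first ordination axis, and rerun the ordination on a rarefied table. The helper names `rarefy` and `depth_association` are hypothetical, the toy counts are simulated, and `axis_scores` stands in for whatever PC1 values your own ordination produced.

```python
import numpy as np
from scipy.stats import spearmanr

def rarefy(counts, depth, seed=0):
    """Subsample every sample (row) without replacement to `depth` reads."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    if depth > counts.sum(axis=1).min():
        raise ValueError("depth exceeds the shallowest sample; drop that sample first")
    return np.vstack([rng.multivariate_hypergeometric(row, depth) for row in counts])

def depth_association(counts, axis_scores):
    """Spearman correlation between sequencing depth and an ordination axis."""
    depth = np.asarray(counts).sum(axis=1)
    rho, pvalue = spearmanr(depth, axis_scores)
    return rho, pvalue

# Toy data (simulated): 25 samples with very different depths
rng = np.random.default_rng(1)
depths = rng.integers(1000, 20000, size=25)
counts = np.vstack([rng.multinomial(d, np.full(10, 0.1)) for d in depths])

# Hypothetical PC1 that tracks depth exactly, to show the worst case
axis_scores = np.log(depths)
rho, pvalue = depth_association(counts, axis_scores)

rarefied = rarefy(counts, depth=1000)  # every sample now has 1000 reads
```

A `rho` close to 1 or -1 would point to sequencing depth as the gradient; a value near 0, together with an unchanged pattern on the rarefied table, would suggest looking elsewhere.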
Thank you for the idea. I checked sequencing depth, and it is not the cause of the horseshoe. The plots were colored by treatment group. When I ran the analysis on the rarefied table, I got the same pattern. When I looked closely at the data, I saw that the horseshoe is caused by a gradient in the dominant taxa.
Thank you @mortonjt! My data is extremely sparse: there are only 51 genera in the table, and many samples are dominated by a few genera or even by a single genus, so many samples ended up as 100% a single genus. For this dataset, I don’t think PCA/PCoA is a proper method. In this kind of situation, is there a better method for describing the data? What do you think about singular value decomposition?
Singular value decomposition (SVD) is effectively the same thing as PCA: PCA is usually computed by running SVD on the mean-centered data (I know - the jargon is confusing).
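If it helps, the equivalence is easy to check numerically. The sketch below uses random toy data (made up for the example) and compares PCA done through the covariance matrix with SVD of the mean-centered table; the scores agree up to the sign of each axis.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 5))          # toy data: 20 samples, 5 features
n = X.shape[0]
Xc = X - X.mean(axis=0)          # mean-center each feature

# PCA via eigendecomposition of the covariance matrix
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]            # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores_pca = Xc @ eigvecs

# SVD of the same centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance_svd = s ** 2 / (n - 1)    # matches the PCA eigenvalues
scores_svd = U * s                           # matches the PCA scores up to sign
```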
Seeing horseshoes is not necessarily a bad thing. If your horseshoe is being caused by a treatment effect, it suggests that the effect size is so large that it warps your ordination.
But if you really really want to get rid of the horseshoe (i.e. look at smaller treatment effects), it might be worthwhile to look into manifold learning methods such as t-SNE. Note that we don’t support t-SNE in qiime2 at the moment, so investigate with caution.
I agree with @mortonjt here. I’m not sure it is worth trying to destroy the horseshoe effect rather than trying to interpret it; you might have a really nice trend, as you mention, due to some taxa and perhaps how long the treatment has been going on. This might be a good read.