Gradient-clustering or not using gneiss

Hi! I am just starting to work with ANCOM and gneiss to get some differential abundance data. I have a complicated dataset containing 16S rRNA gene sequences from both the V1-V3 region and the V4 region. Within each region I have DNA and RNA. I want to compare everything to see which taxa are over- or under-represented in V1-V3 compared to V4, and also between DNA and RNA. My question is: what exactly should I do to find the taxa that differ? I have tried gneiss with gradient clustering by region (I made the data numerical for it). I am not sure if this is appropriate, though, or if I should stick to correlation clustering.

I am also a little confused about what I should use for the OLS regression formula. So far I've used --p-formula "Treatment+Sample_Day+DNA_or_RNA"

I don't think the sample day has any impact, but I put it in just in case.
Sorry if this post is too generic, I am just learning all of this and trying to teach myself about balances etc.
Thank you all for your continued help!

Hi @bmb22,

I don't think gradient clustering makes sense for this example. You essentially have one categorical variable with 4 groups (or, equivalently, two categorical variables with 2 groups each): V1-3 DNA, V1-3 RNA, V4 DNA, V4 RNA. There isn't any intrinsic ordering to the categories that I can see that would justify converting them to numbers and doing gradient clustering. So, I would stick to correlation clustering.
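To make the ordering point concrete, here is a rough Python sketch. It is a simplified illustration, not gneiss's actual code: it assumes gradient clustering arranges taxa by something like their abundance-weighted mean position along the numeric variable. The counts, the taxon names, and both number codings are made up for illustration.

```python
# Toy counts (hypothetical): one row per taxon, one column per sample group,
# in the order V1-3 DNA, V1-3 RNA, V4 DNA, V4 RNA.
counts = {
    "taxon_A": [90, 5, 5, 0],
    "taxon_B": [0, 10, 80, 10],
    "taxon_C": [5, 5, 10, 80],
}


def weighted_mean_position(row, codes):
    """Abundance-weighted mean of the numeric codes for one taxon."""
    total = sum(row)
    return sum(code * count for code, count in zip(codes, row)) / total


def order_taxa(codes):
    """Sort taxa by where they sit along the (made-up) numeric 'gradient'."""
    return sorted(counts, key=lambda taxon: weighted_mean_position(counts[taxon], codes))


# Two equally arbitrary ways of turning the four groups into numbers.
coding_1 = [1, 2, 3, 4]
coding_2 = [3, 1, 4, 2]

order_1 = order_taxa(coding_1)
order_2 = order_taxa(coding_2)

print("coding 1:", order_1)
print("coding 2:", order_2)
```

With these toy numbers the two codings put the taxa in different orders, even though neither coding is more justified than the other. Any tree built from such an order would reflect the choice of numbers rather than the biology.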

I'd also recommend checking a PCoA, if you haven't already, to see how your samples group by region and nucleic acid. This can also help you see whether, for instance, your treatment effect is bigger than the region effect, or whether sample day plays a big role. It can also reveal a weird subject-specific issue that you might need to address somewhere.

Not generic at all! It's totally a fair question.

Best,
Justine
