Gneiss - not creating partitions

Dear all, I am trying to run Gneiss following the available tutorial. It works just fine for a subset of my data: 18 samples containing 162 different features after some filtering (I removed features with fewer than 5 sequences and those found in only 1 sample). After creating partitions with Gneiss I get 15 balances, y0-y14. It all looks great, and I want to do the same for my main dataset.
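The filtering described above (drop features with fewer than 5 sequences in total, and features present in only one sample) can be sketched in plain Python on a toy table. This is only an illustration with made-up data; in QIIME 2 the same thresholds would presumably be applied with `qiime feature-table filter-features --p-min-frequency 5 --p-min-samples 2`. The table layout and the `filter_features` helper below are hypothetical.

```python
# Hypothetical toy feature table: {feature_id: {sample_id: count}}.

def filter_features(table, min_frequency=5, min_samples=2):
    """Keep features whose total count is >= min_frequency and that are
    present (count > 0) in at least min_samples samples."""
    kept = {}
    for feature, counts in table.items():
        total = sum(counts.values())
        n_present = sum(1 for c in counts.values() if c > 0)
        if total >= min_frequency and n_present >= min_samples:
            kept[feature] = counts
    return kept


toy = {
    "ASV1": {"S1": 10, "S2": 3, "S3": 0},  # 13 reads, 2 samples -> kept
    "ASV2": {"S1": 2, "S2": 1, "S3": 1},   # 4 reads in total -> dropped
    "ASV3": {"S1": 0, "S2": 50, "S3": 0},  # only 1 sample -> dropped
}
print(sorted(filter_features(toy)))  # ['ASV1']
```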

However, when I run the exact same commands on a larger dataset (72 samples, 903 features), I end up with 903 y's. I am wondering if anyone can help me understand what is happening and what is not happening :slight_smile: I have also run the same samples without filtering features post-DADA2, and I see the same problem.

Solveig

Hi @stangedal. I'm having a little bit of trouble parsing your question. Could you post your code with some screenshots of your results? Right now the clustering is a bit brute force: if you have D features, gneiss will calculate D-1 balances given a tree.
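The D-1 rule follows from the tree itself: each balance belongs to one internal node of a bifurcating tree, and joining D tips pairwise into a single root always takes exactly D-1 merges. The sketch below uses a naive merge order (always join the first two clusters) instead of gneiss's correlation-based clustering; the merge order changes which features end up together, not how many internal nodes there are. The `build_binary_tree` helper is hypothetical.

```python
def build_binary_tree(tips):
    """Naive agglomeration: repeatedly merge the first two clusters.
    Returns the root and the list of internal nodes (one per merge)."""
    clusters = list(tips)
    internal_nodes = []
    while len(clusters) > 1:
        left, right = clusters.pop(0), clusters.pop(0)
        node = (left, right)
        internal_nodes.append(node)
        clusters.append(node)
    return clusters[0], internal_nodes


for d in (2, 10, 100):
    _, nodes = build_binary_tree(f"feature{i}" for i in range(d))
    # One balance per internal node, so d features -> d - 1 balances.
    print(f"{d} features -> {len(nodes)} balances")
```

So the number of y's in the regression summary grows with the number of features in the table, not with the number of samples.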

Indeed :slight_smile:

My feature table (frequency) contains 72 samples with 1030 features and 3,090,961 sequences.

qiime gneiss add-pseudocount --i-table Utv3_ASV_table.qza --p-pseudocount 1 --o-composition-table Utv3_GneissComp.qza

qiime gneiss correlation-clustering --i-table Utv3_GneissComp.qza --o-clustering Utv3_hierarchy.qza

qiime gneiss ilr-transform --i-table Utv3_GneissComp.qza --i-tree Utv3_hierarchy.qza --o-balances Utv3_balances.qza

And it runs just fine.
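For one sample, the three commands above roughly amount to: add a pseudocount so no count is zero, build a bifurcating tree over the features, and compute one isometric log-ratio balance per internal node, b = sqrt(r*s/(r+s)) * ln(g(left)/g(right)), where r and s are the numbers of tips on each side and g is the geometric mean. This is a minimal sketch of that formula, not gneiss's implementation; the tree is hypothetical (in the real pipeline it comes from correlation-clustering), and the left-over-right sign convention is an assumption.

```python
import math


def add_pseudocount(counts, pseudocount=1):
    """Shift every count so that logarithms are defined."""
    return [c + pseudocount for c in counts]


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def tips(node):
    """Leaf indices under a node of a nested-tuple tree (leaves are ints)."""
    if isinstance(node, int):
        return [node]
    left, right = node
    return tips(left) + tips(right)


def balances(counts, tree):
    """One ILR balance per internal node, collected in pre-order."""
    out = []

    def visit(node):
        if isinstance(node, int):
            return
        left, right = node
        left_tips, right_tips = tips(left), tips(right)
        num = geometric_mean([counts[i] for i in left_tips])
        den = geometric_mean([counts[i] for i in right_tips])
        r, s = len(left_tips), len(right_tips)
        out.append(math.sqrt(r * s / (r + s)) * math.log(num / den))
        visit(left)
        visit(right)

    visit(tree)
    return out


sample = add_pseudocount([0, 3, 7, 15])  # 4 features in one sample
tree = ((0, 1), (2, 3))                  # hypothetical hierarchy
print(len(balances(sample, tree)))       # 3, i.e. 4 features - 1
```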

Then I run the regression analyses:

qiime gneiss ols-regression --p-formula "disease_state+sex_0_male+smokingstatus_0_ex" --i-table Utv3_balances.qza --i-tree Utv3_hierarchy.qza --m-metadata-file Utv3mapping_only3.tsv --o-visualization Utv3_regressionsumGneiss.qzv

qiime tools view Utv3_regressionsumGneiss.qzv
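The regression step fits one ordinary least squares model per balance, which is why the coefficient summary has one row per balance (y0, y1, ...) and becomes very large for a table with hundreds of features. The sketch below shows that idea with a single hypothetical binary covariate and toy balance values; the real formula above has three covariates, and the `ols_fit` helper is not part of gneiss.

```python
def ols_fit(x, y):
    """Closed-form simple linear regression y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


disease_state = [0, 0, 1, 1]   # toy covariate for 4 samples
balance_table = {              # toy balance values per sample
    "y0": [0.1, 0.3, 1.1, 1.3],
    "y1": [-0.5, -0.4, -0.6, -0.5],
    "y2": [2.0, 2.2, 0.9, 1.1],
}

# One independent fit per balance -> one row per balance in the summary.
coefficients = {name: ols_fit(disease_state, values)
                for name, values in balance_table.items()}
for name, (intercept, slope) in coefficients.items():
    print(name, round(slope, 2))
```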

And I get to see this:

Zooming in on the end, it looks like this:

Now I need some help understanding what is happening here, as I cannot make much sense of this... Thank you for your quick reply :slight_smile:
Solveig

@stangedal, I'm still confused about your question. What specifically are you confused about? Did the tutorials make sense?

In my other attempt at running these commands I used a dataset of only 18 samples, 162 features and 766,977 sequences. I ran the exact same commands, just with another .qza file as input. The regression summary file gives the following:

What surprised me was that there are so many y's in the larger dataset, which makes the regression coefficients summary impossible to read, while my smaller dataset leaves me with y0-y14. When I saw the 903 y's in my larger dataset, I thought something had gone wrong in the process. I guess my question is: does the regression coefficients summary in the first example look right to you?

I typically don't rely on looking at the heatmap for sanity checking. First I'd recommend taking a look at the R^2 values and the MSE cross-validation values to sanity check how good the fit is, and if there is any overfitting.
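The two checks suggested above can be illustrated on a single toy balance: R^2 measures how much of the balance's variance the model explains, and comparing the training MSE with a cross-validated MSE hints at overfitting (a CV error much larger than the training error is a warning sign). This sketch uses leave-one-out cross-validation as a simple stand-in; gneiss's own cross-validation scheme may differ, and all data and helper names are hypothetical.

```python
def ols_fit(x, y):
    """Closed-form simple linear regression y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mse(x, y, model):
    b0, b1 = model
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)


def r_squared(x, y):
    ss_res = mse(x, y, ols_fit(x, y)) * len(x)
    my = sum(y) / len(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def loo_cv_mse(x, y):
    """Refit without sample i, then score the prediction for sample i."""
    errors = []
    for i in range(len(x)):
        model = ols_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append(mse([x[i]], [y[i]], model))
    return sum(errors) / len(errors)


disease_state = [0, 0, 0, 1, 1, 1]            # toy covariate
balance_y0 = [0.2, 0.4, 0.3, 1.0, 1.3, 1.1]   # toy balance values
train_mse = mse(disease_state, balance_y0, ols_fit(disease_state, balance_y0))
print("R^2:", round(r_squared(disease_state, balance_y0), 3))
print("training MSE:", round(train_mse, 4))
print("LOO CV MSE:", round(loo_cv_mse(disease_state, balance_y0), 4))
```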

Once you have established that you are getting reasonable fits, you can start looking at the heatmap in the regression summary to tease out which balances could be interesting.
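With hundreds of balances, one simple way to narrow down the heatmap is to rank balances by the absolute size of their coefficient for the covariate of interest. This is only a hypothetical starting point with made-up coefficients: a large coefficient is not a significance test, and it is only meaningful once the fit checks above look reasonable.

```python
# Hypothetical coefficients for one covariate (e.g. disease_state),
# keyed by balance name as in the regression summary.
coefficients = {"y0": 0.05, "y1": -1.8, "y2": 0.4, "y3": 2.3, "y4": -0.1}


def top_balances(coefs, k=2):
    """Return the k balance names with the largest absolute coefficient."""
    return sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)[:k]


print(top_balances(coefficients))  # ['y3', 'y1']
```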

The outline of these steps can be found here. Does that help guide how to start looking at these plots?

OK, that is actually helpful, but I will need some time to get familiar with the analyses! Thank you so much again for your help!
