I have gone through the Qiime2 tutorials, and while I am still learning the ins and outs of Qiime2, I wanted to share my experience with gneiss and ask whether I am using it properly. I have two metadata variables that I know to be correlated with microbiome composition (i.e., their abundances correlate positively and negatively with several groups of OTUs/genera).
To be clear, since I haven't yet used the Qiime2 pipeline to preprocess my raw data, I am making my own .qza files from a biom table, taxa data, and metadata (see below). Then I am reproducing the gneiss tutorial on the 88soils samples but with this data. The outputs I get do not look quite right, and so my question is more about the process/application of this technique for non-Qiime2 generated data.
The data in the hdf5.biom file were pre-normalized with DESeq("poscounts") to account for discrepancies in library size, and I did this so that all of the counts are > 0 (values range from 0.6663828 to 14.6254360).
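For anyone following along, here is a minimal sketch of the sanity check I mean by "all of the counts are > 0". It assumes the normalized table has already been loaded into a NumPy array (features x samples). The function name `summarize_table` and the toy matrix are hypothetical and only for illustration; the toy matrix is not my real data, it just reuses the two extreme values quoted above.

```python
import numpy as np

def summarize_table(values):
    """Basic checks on a features x samples matrix before any log-ratio step."""
    arr = np.asarray(values, dtype=float)
    return {
        "min": float(arr.min()),
        "max": float(arr.max()),
        "n_zero": int((arr == 0).sum()),          # zeros would break a log-ratio
        "all_positive": bool((arr > 0).all()),
    }

# Hypothetical toy matrix (NOT the real table), two features x two samples.
toy = np.array([[0.6663828, 2.5],
                [14.6254360, 1.0]])
summary = summarize_table(toy)
print(summary)
```

If `all_positive` comes back False, the log-ratio steps later in the workflow would need a pseudocount or some other zero-handling first.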
I then ran the following Qiime2/gneiss commands to reproduce the Qiime portion of the results:
qiime feature-table filter-features --i-table hdf5.biom.qza --o-filtered-table filt.biom.qza --p-min-frequency 3

qiime gneiss gradient-clustering --i-table filt.biom.qza --m-gradient-file sample.metadata.txt --m-gradient-category variable_1 --o-clustering tree.nwk.qza --p-weighted

qiime gneiss dendrogram-heatmap --i-table filt.biom.qza --i-tree tree.nwk.qza --m-metadata-file sample.metadata.txt --m-metadata-category "variable_1" --o-visualization "heatmap" --p-ndim 10 --verbose

qiime gneiss ilr-transform --i-table filt.pseudo.biom.qza --i-tree tree.nwk.qza --o-balances balances.qza

qiime gneiss lme-regression --p-formula "variable_1" --i-table lme_balances.qza --i-tree tree.nwk.qza --m-metadata-file sample.AGE.txt --o-visualization model --p-groups "sample"

qiime gneiss ols-regression --p-formula "variable_1 + variable_2" --i-table ols_balances.qza --i-tree tree.nwk.qza --m-metadata-file sample.metadata.txt --o-visualization regression_summary.qzv

qiime gneiss balance-taxonomy --i-table filt.biom.qza --i-tree tree.nwk.qza --i-taxonomy taxa.qza --p-taxa-level 2 --p-balance-name 'y0' --m-metadata-file sample.metadata.txt --m-metadata-category variable_1 --o-visualization y0_taxa_summary.qzv
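To make clear what I understand the balances to be, here is a rough sketch of a single ILR balance for one node of the tree: sqrt(r*s/(r+s)) times the log of the ratio of the geometric means of the two groups of features, where r and s are the group sizes. This is only my own illustration of the standard balance formula, not gneiss's actual code; the function name `balance` and the example values are made up.

```python
import numpy as np

def balance(numerator, denominator):
    """Single ILR balance between two groups of strictly positive abundances."""
    num = np.asarray(numerator, dtype=float)
    den = np.asarray(denominator, dtype=float)
    if (num <= 0).any() or (den <= 0).any():
        raise ValueError("balances need strictly positive values")
    r, s = num.size, den.size
    # Mean of logs = log of the geometric mean, so this is ln(g(num) / g(den)).
    log_ratio = np.log(num).mean() - np.log(den).mean()
    return float(np.sqrt(r * s / (r + s)) * log_ratio)

# Made-up values for one sample: two features on each side of a node.
example = balance([2.0, 8.0], [1.0, 4.0])
print(example)
```

Because the formula takes logs, any zero in either group makes the balance undefined, which is why the values have to be strictly positive before the ilr-transform step.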
So I am able to generate balances and heatmaps and run the regressions without errors, but the heatmap (and the gradient prediction that I get) always has a large gap in it. The final file (after running the Python portion of the tutorial) is here:
My questions are: am I filtering out the OTUs improperly? Is the clustering method that I am using incorrect? And finally, is there a way to account for more than one metadata variable in a method like this (i.e., when generating the predictions)? Please let me know if there is anything I can provide that I may have forgotten to include. I think these tutorials are helpful and have given me some good ideas; I am just concerned that I am not using the method properly! Thanks so much.