Hello, I'm running a QIIME2 differential abundance analysis on data from a 16S metagenomics study, and I tried to follow this tutorial as closely as possible.
However, I need some clarification about how it works exactly, because the results I got after running gneiss correlation-clustering confused me.
As suggested in the tutorial, I ran the gneiss actions in the order listed below.
I used the hierarchical approach and fitted a regression on three variables from my metadata file (Replicate, Soil and Site).
Finally, I built a heatmap grouped by Site, showing the balances (log ratios) y0 to y9.
qiime gneiss correlation-clustering --i-table table-no-clo-mit.qza --o-clustering gneiss/hierarchy.qza
qiime gneiss ilr-hierarchical --i-table table-no-clo-mit.qza --i-tree gneiss/hierarchy.qza --o-balances gneiss/balances.qza
qiime gneiss ols-regression --p-formula "Replicate+Site+Soil" --i-table gneiss/balances.qza --i-tree gneiss/hierarchy.qza --m-metadata-file 16S_metadata.tsv --o-visualization gneiss/regression_summary.qzv
qiime gneiss dendrogram-heatmap --i-table table-no-clo-mit.qza --i-tree gneiss/hierarchy.qza --m-metadata-file 16S_metadata.tsv --m-metadata-column Site --p-color-map seismic --o-visualization gneiss/site_heatmap.qzv
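For my own understanding of what the balances y0 to y9 are: as far as I understand, each balance is an isometric log ratio between the features on the two sides of one node of the hierarchy. The sketch below computes one such balance for a single sample with the standard ILR balance formula, sqrt(r*s/(r+s)) * ln(g(num)/g(den)), where g is the geometric mean and r, s are the numbers of features on each side. The function name ilr_balance is hypothetical (not part of gneiss), the 0.5 pseudocount is my assumption about the ilr-hierarchical default, and I have not checked which side gneiss treats as the numerator.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of QIIME2/gneiss: ILR balance of one node
# for one sample.
# Usage: ilr_balance "NUM_COUNTS" "DEN_COUNTS" [PSEUDOCOUNT]
#   NUM_COUNTS / DEN_COUNTS: space-separated counts of the features on the
#   numerator and denominator side of the node.
#   PSEUDOCOUNT: added to every count so zeros can be logged
#   (0.5 here is an assumed default, not verified against gneiss).
ilr_balance() {
  local num="$1" den="$2" pc="${3:-0.5}"
  awk -v num="$num" -v den="$den" -v pc="$pc" 'BEGIN {
    r = split(num, a, " ")          # number of numerator features
    s = split(den, b, " ")          # number of denominator features
    for (i = 1; i <= r; i++) ln_num += log(a[i] + pc)
    for (i = 1; i <= s; i++) ln_den += log(b[i] + pc)
    # ln(g(num) / g(den)) = mean log on one side minus mean log on the other
    log_ratio = ln_num / r - ln_den / s
    printf "%.4f\n", sqrt(r * s / (r + s)) * log_ratio
  }'
}

ilr_balance "5 5" "5"     # equal abundances on both sides: 0.0000
ilr_balance "120 80" "3"  # numerator side dominates: positive balance
```

Swapping the two sides only flips the sign, which is why a balance is read as "which group of features dominates" rather than as the abundance of a single taxon.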
This is what the heatmap looks like:
Based on the heatmap, I decided to use balance 'y0' to summarize my results, because it apparently covers most of my diversity.
Here is where my issues start. I wanted to make comparisons for the variable 'Site', so I used --p-taxa-level 5, which should be the family level.
As I understand it, 1 stands for kingdom, 2 for phylum, 3 for class, 4 for order, 5 for family, 6 for genus and 7 for species, right?
qiime gneiss balance-taxonomy --i-table table-no-clo-mit.qza --i-tree gneiss/hierarchy.qza --i-taxonomy taxonomy26.qza --p-taxa-level 5 --p-balance-name 'y0' --m-metadata-file 16S_metadata.tsv --m-metadata-column Site --o-visualization gneiss/y0_family_site_summary_test_forforum.qzv
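To check my reading of the levels, here is a small sketch with a hypothetical helper, taxon_at_level (not part of gneiss), that splits a Greengenes-style taxonomy string on ';' and picks one rank, counting either from 1 (my numbering above) or from 0. If, as I suspect but have not confirmed in the gneiss source, --p-taxa-level counts from 0, then 5 would select the genus and 4 the family, which would match what I see.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the rank at position LEVEL of a taxonomy string.
# Usage: taxon_at_level "TAXONOMY" LEVEL [BASE]
#   BASE = 1: level 1 is the first rank (kingdom), as in my numbering.
#   BASE = 0: level 0 is the first rank (default); assumed, not verified,
#   to be how gneiss counts --p-taxa-level.
taxon_at_level() {
  local taxon="$1" level="$2" base="${3:-0}"
  awk -v t="$taxon" -v lvl="$level" -v base="$base" 'BEGIN {
    n = split(t, ranks, ";")
    i = lvl - base + 1                      # split() fills ranks[1..n]
    if (i < 1 || i > n) { print "NA"; exit }
    gsub(/^[ \t]+|[ \t]+$/, "", ranks[i])   # drop spaces after each ";"
    print ranks[i]
  }'
}

tax="k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rhizobiales; f__Bradyrhizobiaceae; g__Bradyrhizobium; s__"
taxon_at_level "$tax" 5 1   # counting from 1: f__Bradyrhizobiaceae
taxon_at_level "$tax" 5 0   # counting from 0: g__Bradyrhizobium
```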
Here is what the two plots look like:
- First of all, since I specified level 5, why are genera shown here (with the exception of Bradyrhizobiaceae)? This is the result I expected for level 6! I can easily work around it by choosing level 4 to get families, but I'd like to understand what's behind this.
- Why are some taxa repeated in the proportion plot? Just look at how many times g__DA101 appears, in both the under- and overexpressed parts of the plot. What does that mean? Did I make any mistake in my previous analysis? The balance taxonomy plot, on the other hand, looks fine and shows no repeated taxa.
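My guess for the repeated names, which I have not verified in the plotting code, is that the proportion plot draws one bar per feature (ASV/OTU) and labels each bar with that feature's taxonomy at the chosen level, so several distinct features assigned to g__DA101 would produce several identical labels, possibly on both sides of the balance. The toy sketch below uses a hypothetical helper, label_counts, with made-up feature IDs to count how many features collapse onto each label at a 0-based level.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many features share each taxonomy label.
# Reads "feature_id<TAB>taxonomy" lines on stdin; LEVEL counts from 0.
# Assumes (unverified) that the proportion plot shows one bar per feature.
label_counts() {
  local level="$1"
  awk -F'\t' -v lvl="$level" '{
    n = split($2, ranks, ";")
    label = (lvl + 1 <= n) ? ranks[lvl + 1] : "NA"
    gsub(/^[ \t]+|[ \t]+$/, "", label)
    count[label]++                    # one increment per feature
  }
  END { for (label in count) printf "%s\t%d\n", label, count[label] }' | sort
}

# Toy input: three made-up features, two of them assigned to g__DA101.
printf '%s\t%s\n' \
  feat1 "k__Bacteria; p__Verrucomicrobia; c__[Spartobacteria]; o__[Chthoniobacterales]; f__[Chthoniobacteraceae]; g__DA101; s__" \
  feat2 "k__Bacteria; p__Verrucomicrobia; c__[Spartobacteria]; o__[Chthoniobacterales]; f__[Chthoniobacteraceae]; g__DA101; s__" \
  feat3 "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rhizobiales; f__Bradyrhizobiaceae; g__Bradyrhizobium; s__" \
  | label_counts 5
```

With this toy input, g__DA101 is counted twice because two different features carry the same genus label, even though nothing is duplicated in the table itself.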
Thanks in advance for any clarification; maybe I missed something about the logic of this approach!