Background:
I am following the gneiss tutorial, using the data and code attached below. The tutorial runs without issues on the tutorial data, so the problem seems to be specific to my data.
The Problem:
My regression summary doesn’t show all of my patients. Specifically, Patient AH is missing. The No. Observations value is correct (12), and the sample IDs for AH appear in the Predicted Balances and Residuals .csv files, but AH is absent from the Coefficients and Coefficient values .csv files. AH does appear in the dendrogram-heatmap visualization.
What I have done:
I have checked my metadata file and found no errors. The sample IDs in the metadata file match those in each of the feature tables. I have tried different values in the Patient column of the metadata file, and I have tried different columns, with no success. Creating new metadata files did not help either. I went all the way back to the original fastq files and confirmed the sample IDs were correct. When I run QIIME1’s biom summarize-table on the exported composition and balances artifacts, the sample IDs for AH are present in both. In addition, when I specified --p-formula "Group", the summary only showed Group B. I also tried --p-formula "PatientGroup"; both Patient AH and Group A were missing. Grasping at straws, I changed the 'A's to other letters, but that wasn't the issue. I have done a number of other things, but none have worked.
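One possibility I have not been able to rule out is that this is simply how a categorical term gets expanded. Below is a minimal sketch of that idea, assuming ols-regression builds its design matrix with patsy-style treatment (dummy) coding, which I have not verified for this plugin. Under that assumption the alphabetically first level becomes the reference, is absorbed into the Intercept, and gets no coefficient of its own. The function name and every patient label other than AH are made up for illustration; they are not my real data.

```python
def treatment_coded_columns(name, values):
    """Mimic patsy-style treatment coding for one categorical variable.

    Returns the reference level and the design-matrix column names.
    This is an illustration only, not the code gneiss actually runs.
    """
    levels = sorted(set(values))   # levels are ordered alphabetically
    reference = levels[0]          # first level is folded into the Intercept
    columns = ["Intercept"] + [f"{name}[T.{lvl}]" for lvl in levels[1:]]
    return reference, columns


# Hypothetical patient labels (only "AH" comes from my metadata).
patients = ["AH", "AH", "AH", "BK", "BK", "BK",
            "CM", "CM", "CM", "DR", "DR", "DR"]

reference, columns = treatment_coded_columns("Patient", patients)
print("reference level:", reference)
print("columns:", columns)
```

If this is what is happening, it would also fit what I saw with "Group", where Group A was the level that disappeared, and the other coefficients would then be differences relative to AH rather than AH being dropped from the model.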
Files:
Metadata file: metadata.tsv (1.1 KB)
Metadata visual: metadata.qzv (1.1 MB)
Unfiltered feature table artifact: table_unflt.qza (57.1 KB)
Unfiltered feature table visual: table_unflt.qzv (356.1 KB)
Filtered feature table artifact: table.qza (52.4 KB)
Filtered feature table visual: table.qzv (349.2 KB)
Composition feature table: composition.qza (56.3 KB)
Hierarchy: hierarchy.qza (51.0 KB)
Balances: balances.qza (111.5 KB)
Regression summary: regression_summary.qzv (355.2 KB)
Dendro-Heatmap: heatmap.qzv (122.9 KB)
Composition biom summarize-table output: comp.txt (616 Bytes)
Balances biom summarize-table output: bals.txt (765 Bytes)
Code:
# remove features present in only 1 sample
qiime feature-table filter-features \
--i-table table_unflt.qza \
--p-min-samples 2 \
--o-filtered-table table.qza
# remove features with frequency less than 10
qiime feature-table filter-features \
--i-table table.qza \
--p-min-frequency 10 \
--o-filtered-table table.qza
# add pseudocounts
qiime gneiss add-pseudocount \
--i-table table.qza \
--p-pseudocount 1 \
--o-composition-table composition.qza
# correlation-clustering
qiime gneiss correlation-clustering \
--i-table composition.qza \
--o-clustering hierarchy.qza
# ilr transform
qiime gneiss ilr-transform \
--i-table composition.qza \
--i-tree hierarchy.qza \
--o-balances balances.qza
# regression
qiime gneiss ols-regression \
--p-formula "Patient" \
--i-table balances.qza \
--i-tree hierarchy.qza \
--m-metadata-file metadata.tsv \
--o-visualization regression_summary.qzv
# dendrogram-heatmap visual
qiime gneiss dendrogram-heatmap \
--i-table composition.qza \
--i-tree hierarchy.qza \
--m-metadata-file metadata.tsv \
--m-metadata-category Patient \
--p-color-map seismic \
--o-visualization heatmap.qzv
I imagine at this point I am just overlooking something small and 'easy', since that seems to be the way it always goes. Any help is welcome.
-Kristopher