Hi @Sarah_McGrath! To answer your questions about what the different parameter options of the merge and group methods mean, I think some examples might help illustrate the concepts:
merge
from biom import Table
import numpy as np
from qiime2 import Artifact
from qiime2.plugins import feature_table
t1 = Artifact.import_data('FeatureTable[Frequency]',
                          Table(np.array([[0, 1, 3], [1, 1, 2]]),
                                ['O1', 'O2'], ['S1', 'S2', 'S3']))
t2 = Artifact.import_data('FeatureTable[Frequency]',
                          Table(np.array([[0, 2, 6], [2, 2, 4]]),
                                ['O1', 'O3'], ['S1', 'S5', 'S6']))
Those two tables look like this:
# Constructed from biom file
#OTU ID S1 S2 S3
O1 0.0 1.0 3.0
O2 1.0 1.0 2.0
# Constructed from biom file
#OTU ID S1 S5 S6
O1 0.0 2.0 6.0
O3 2.0 2.0 4.0
Above, we create two FeatureTable artifacts. Note that both tables contain an S1 sample and an O1 feature.
feature_table.methods.merge([t1, t2], overlap_method='error_on_overlapping_sample')
...
ValueError: Same samples are present in some of the provided tables: S1
error_on_overlapping_sample is complaining because the sample S1 is present in both tables.
feature_table.methods.merge([t1, t2], overlap_method='error_on_overlapping_feature')
...
ValueError: Same features are present in some of the provided tables: O1
error_on_overlapping_feature is complaining because the feature O1 is present in both tables.
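If it helps to see what "overlapping" means here, this is a minimal sketch using plain Python sets. It is not how q2-feature-table performs the check internally; the variable names are just illustrative stand-ins for the sample and feature IDs of t1 and t2.

```python
# Sample and feature IDs copied by hand from the two tables above.
t1_samples, t1_features = {"S1", "S2", "S3"}, {"O1", "O2"}
t2_samples, t2_features = {"S1", "S5", "S6"}, {"O1", "O3"}

# An ID "overlaps" when it appears in more than one table.
shared_samples = t1_samples & t2_samples
shared_features = t1_features & t2_features

print(sorted(shared_samples))   # ['S1']
print(sorted(shared_features))  # ['O1']
```

Either overlap method raises an error as soon as its corresponding set is non-empty, which is the case for both S1 and O1 in this example.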
t3, = feature_table.methods.merge([t1, t2], overlap_method='sum')
print(t3.view(Table))
# Constructed from biom file
#OTU ID S1 S2 S3 S5 S6
O1 0.0 1.0 3.0 2.0 6.0
O2 1.0 1.0 2.0 0.0 0.0
O3 2.0 0.0 0.0 2.0 4.0
With sum, the merge doesn’t complain about the overlapping sample or feature from above. Instead, it adds the values together wherever the tables overlap, and fills in 0 wherever a feature was not observed in a sample.
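To make the summing concrete, here is a rough pure-Python sketch of the same idea. The nested dicts and the merge_sum helper are hypothetical, made up for this illustration, and are not part of QIIME 2 or biom.

```python
# Toy stand-ins for the two tables above: {feature: {sample: count}}.
t1 = {"O1": {"S1": 0, "S2": 1, "S3": 3}, "O2": {"S1": 1, "S2": 1, "S3": 2}}
t2 = {"O1": {"S1": 0, "S5": 2, "S6": 6}, "O3": {"S1": 2, "S5": 2, "S6": 4}}


def merge_sum(*tables):
    """Hypothetical helper: add counts cell by cell, treating missing cells as 0."""
    samples = sorted({s for t in tables for row in t.values() for s in row})
    features = sorted({f for t in tables for f in t})
    # Start every (feature, sample) cell at 0, then accumulate each table into it.
    merged = {f: {s: 0 for s in samples} for f in features}
    for t in tables:
        for f, row in t.items():
            for s, count in row.items():
                merged[f][s] += count
    return merged


merged = merge_sum(t1, t2)
for feature, row in merged.items():
    print(feature, row)
```

The (O1, S1) cell is the only one present in both tables, so it is the only place where two values are actually added together (0 + 0). Every other cell comes from a single table or is filled with 0.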
group
For grouping, taking the ceiling of a value means rounding it up. When you group on a metadata value and select something like median or mean, you might wind up with a non-whole number, which doesn’t really make sense for an observation matrix of counts. The ceiling means that after those values (median or mean) are computed, we round the result up to the nearest whole number. We don’t need to worry about rounding when performing an operation like sum, because summing whole-number counts always results in a whole number.
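As a quick sanity check on the rounding, here is a small sketch using only the Python standard library. The two counts are made-up values for one feature across two samples that share a metadata value; this only illustrates the arithmetic, not the plugin's implementation.

```python
import math
from statistics import mean, median

# Hypothetical counts for one feature in two samples from the same group.
group_counts = [2, 3]

print(median(group_counts))             # 2.5
print(math.ceil(median(group_counts)))  # 3
print(math.ceil(mean(group_counts)))    # 3
print(sum(group_counts))                # 5
```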
import qiime2
import pandas as pd
import biom
sample_mc = qiime2.CategoricalMetadataColumn(
    pd.Series(['x', 'y', 'y'], name='foo',
              index=pd.Index(['a', 'b', 'c'], name='sampleid')))
table = qiime2.Artifact.import_data(
    'FeatureTable[Frequency]',
    biom.Table(np.array([[1, 2, 3], [30, 20, 10]]),
               sample_ids=sample_mc.to_series().index,
               observation_ids=['O1', 'O2']))
# Constructed from biom file
#OTU ID a b c
O1 1.0 2.0 3.0
O2 30.0 20.0 10.0
t_sum, = feature_table.methods.group(table=table, axis='sample', metadata=sample_mc, mode='sum')
print(t_sum.view(biom.Table))
# Constructed from biom file
#OTU ID x y
O1 1.0 5.0
O2 30.0 30.0
t_median, = feature_table.methods.group(table=table, axis='sample', metadata=sample_mc, mode='median-ceiling')
print(t_median.view(biom.Table))
# Constructed from biom file
#OTU ID x y
O1 1.0 3.0
O2 30.0 15.0
t_mean, = feature_table.methods.group(table=table, axis='sample', metadata=sample_mc, mode='mean-ceiling')
print(t_mean.view(biom.Table))
# Constructed from biom file
#OTU ID x y
O1 1.0 3.0
O2 30.0 15.0
It looks like you asked some additional questions while I was writing this post - can you take a look at this, and follow up with any remaining questions, restated? You can just copy-and-paste. Thanks!