Q2 Semantic types

Hello everyone,

I was wondering whether a detailed description of the qiime2 feature-table semantic types exists. I found this repo, which lacks any explanation.

More specifically, I would like to read about the difference between
FeatureTable[Composition] and FeatureTable[Frequency].

I assume that the first one represents relative abundances, while the second one represents actual counts. So, if we import raw microbiome data, it is FeatureTable[Composition] because of the compositional nature of microbiome data itself, and if we apply any sort of transformation, e.g., CLR, it becomes FeatureTable[Frequency]. That's how I understand it, but I would like to verify :)

Thank you!


The developer documentation has some pointers...


Hi @Oleg ,

You have the right idea, but the definitions are switched. FeatureTable[Frequency] consists of (usually raw) counts. FeatureTable[Composition] is rather vaguely named, but it was written to store compositionally aware (e.g., CLR-transformed) counts.

See for example this function in the gemelli plugin, which accepts a FeatureTable[Frequency] table and outputs a CLR-transformed FeatureTable[Composition]:

A (partial) list of semantic type definitions exists here, but it could clearly use a better description of the FeatureTable[Composition] type:


Dear @Nicholas_Bokulich,

May I ask you to elaborate on FeatureTable[Composition] again, please?

I'm looking at the following description of this type, which says:
`FeatureTable[Composition]`: A feature table (e.g., samples by OTUs) where each value indicates the frequency of an OTU in the corresponding sample, and all frequencies **are greater than zero**.

Imagine we have a count table as follows:

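```python
import pandas as pd

# Raw counts: rows are OTUs (features), columns are samples
counts = pd.DataFrame([[1, 1, 7, 0],
                       [2, 0, 2, 0],
                       [5, 5, 0, 3],
                       [0, 2, 8, 1]],
                      index=['otu1', 'otu2', 'otu3', 'otu4'],
                      columns=['s1', 's2', 's3', 's4'])
```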

then the CLR-transformed result does contain negative values, because each log value is centered on the log of the sample's geometric mean (example):

|      | s1        | s2        | s3        | s4        |
|------|-----------|-----------|-----------|-----------|
| otu1 | -0.130469 | -0.147152 | 0.082192  | 0.021289  |
| otu2 | -0.050426 | 0.140530  | -0.122602 | 0.090282  |
| otu3 | 0.157213  | 0.089236  | -0.078150 | 0.021289  |
| otu4 | 0.023682  | -0.082614 | 0.118560  | -0.132861 |
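For reference, here is the standard CLR definition for a single sample x = (x_1, ..., x_D) with D strictly positive feature values. Because the transformed values of one sample always sum to zero, any feature below the sample's geometric mean necessarily gets a negative value, so a CLR-transformed table cannot satisfy the "greater than zero" description. It also shows why zeros have to be replaced first (e.g., with a pseudocount): ln 0 is undefined. Consistent with this, each sample column in the table above sums to approximately zero.

$$
\operatorname{clr}(\mathbf{x})_i = \ln\frac{x_i}{g(\mathbf{x})}, \qquad g(\mathbf{x}) = \Bigl(\prod_{j=1}^{D} x_j\Bigr)^{1/D}, \qquad \sum_{i=1}^{D} \operatorname{clr}(\mathbf{x})_i = \sum_{i=1}^{D} \ln x_i - D \ln g(\mathbf{x}) = 0
$$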

I ran into this during beta-diversity analysis with qiime2, since the standard tests work with FeatureTable[Frequency] or FeatureTable[RelativeFrequency]. However, the latter does not represent compositional data, whereas FeatureTable[Composition] does, as I learned at the beginning of this discussion.

How do we account for compositionality in diversity analysis then?
Thank you!

Hi @Oleg ,

Good question... Most of the beta-diversity analyses and metrics exposed in q2-diversity are not compositionally aware, i.e., they operate on relative frequencies or some other normalized frequencies but do not perform any compositional transformation (the exception is metric=aitchison).
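As a rough illustration of what makes the Aitchison metric compositionally aware: it is the Euclidean distance between CLR-transformed samples. The sketch below is a minimal NumPy version under stated assumptions (a pseudocount of 1 to handle zeros, one sample per 1-D array); it is not q2-diversity's implementation, whose defaults may differ.

```python
import numpy as np

def clr(x, pseudocount=1.0):
    """CLR of one sample's counts after adding a pseudocount (assumed value: 1)."""
    log_x = np.log(np.asarray(x, dtype=float) + pseudocount)
    # Subtracting the mean log is the same as dividing by the geometric mean
    return log_x - log_x.mean()

def aitchison(x, y, pseudocount=1.0):
    """Aitchison distance: Euclidean distance between the two CLR vectors."""
    return float(np.linalg.norm(clr(x, pseudocount) - clr(y, pseudocount)))

# Samples s1 and s2 from the count table earlier in the thread
print(aitchison([1, 2, 5, 0], [1, 0, 5, 2]))

# Scale invariance on strictly positive data (no pseudocount needed):
# a sample and a 10x deeper copy of it are the same composition.
print(aitchison([1, 2, 4], [10, 20, 40], pseudocount=0.0))  # ~0
```

The second call shows the property that matters for compositional data: sequencing depth alone does not change the distance, which is not true of metrics applied to raw counts.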

Hence the motivation for plugins like q2-gemelli (shown above) and deicode.

Technically, it would be easy to modify q2-diversity's beta action to also accept FeatureTable[Composition] as input for some appropriate metrics, but we would need to check carefully which metrics are valid on transformed data.
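To illustrate why each metric needs checking, here is a minimal NumPy sketch using the classic Bray-Curtis dissimilarity, sum|u - v| / sum(u + v), which assumes non-negative abundances. Because every CLR vector sums to zero, the denominator vanishes for any two CLR-transformed samples, so the metric is undefined on such input. The sample vectors are hypothetical: s1 and s2 from Oleg's count table with a pseudocount of 1 already added.

```python
import numpy as np

def clr(x):
    """CLR of one sample; assumes strictly positive values (pseudocount already added)."""
    log_x = np.log(np.asarray(x, dtype=float))
    return log_x - log_x.mean()

def bray_curtis(u, v):
    """Classic Bray-Curtis dissimilarity, intended for non-negative abundances."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.abs(u - v).sum() / (u + v).sum())

# Hypothetical samples: s1 and s2 from the count table, each count + 1
s1 = [2, 3, 6, 1]
s2 = [2, 1, 6, 3]

print(bray_curtis(s1, s2))  # well defined on non-negative counts

u, v = clr(s1), clr(s2)
# The Bray-Curtis denominator for CLR input is ~0 (each CLR vector sums to 0)
print((u + v).sum())
```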

I did not create this type, so I cannot explain it, but perhaps @cmartino and @mortonjt have some idea why negative values are not allowed in FeatureTable[Composition]?

This was a legacy type: FeatureTable[Composition] was originally created to represent tables after imputation for compositional methods (i.e., tables with strictly positive values).

After we realized that imputation is method-specific, we stopped using this type. TBH, I'm ok with axing it.


thank you, @mortonjt!

May I ask what imputation for compositional methods is?
Are you talking about, e.g., adding pseudocounts (imputation) before the CLR transformation?
Sorry for being persistent :sweat_smile:, but I currently use this feature table type in my plugin to store feature tables after transforming them from 'compositional' (raw counts) to 'non-compositional' (e.g., after CLR). The problem I now face is that many established q2 methods do not accept FeatureTable[Composition], so which type should I use to store CLR-transformed data?

Based on the descriptions of the other FeatureTable types, I could not find a suitable one:

FeatureTable[Frequency]: A feature table (e.g., samples by OTUs) where each value indicates the frequency of an OTU in the corresponding sample expressed as raw counts. (nope)

FeatureTable[RelativeFrequency]: A feature table (e.g., samples by OTUs) where each value indicates the relative abundance of an OTU in the corresponding sample such that the values for each sample will sum to 1.0. (nope)

FeatureTable[PresenceAbsence]: A feature table (e.g., samples by OTUs) where each value indicates whether an OTU is present or absent in the corresponding sample. (nope)
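To make the mismatch concrete, here is a hypothetical helper (not part of the QIIME 2 API) that checks a samples-by-features table against the descriptions quoted above. The descriptions overlap (a table of 0/1 counts is both PresenceAbsence and Frequency, and imputed integer counts are both Frequency and Composition), so the order of the checks is an arbitrary design choice. The point is the last branch: CLR values, with their negative entries, match none of the descriptions.

```python
import numpy as np
import pandas as pd

def guess_feature_table_type(table: pd.DataFrame, tol: float = 1e-6) -> str:
    """Hypothetical helper: match a samples-by-features table (rows are
    samples) against the FeatureTable descriptions quoted in this thread."""
    values = table.to_numpy(dtype=float)
    if np.isin(values, [0, 1]).all():
        return "PresenceAbsence"
    if (values >= 0).all() and np.allclose(values, np.round(values)):
        return "Frequency"
    if (values >= 0).all() and np.allclose(values.sum(axis=1), 1.0, atol=tol):
        return "RelativeFrequency"
    if (values > 0).all():
        return "Composition"
    return "none of the above (e.g., CLR-transformed, with negative values)"

# A small CLR-like table: each row (sample) sums to zero and has negatives
clr_like = pd.DataFrame([[-0.5, 0.5], [0.2, -0.2]])
print(guess_feature_table_type(clr_like))
```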

Yes, after adding pseudocounts. There were some ambitions to add other imputation methods from zcompositions at one point, but we've found alternative approaches that don't need imputation at all (like RPCA, CTF, Birdman, MMvec, ...).
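For readers following along, here is a minimal pandas sketch of that pseudocount-then-CLR step applied to Oleg's count table from earlier in the thread (rows are OTUs, columns are samples, so the transform is applied per column). The pseudocount of 1 is an assumed, commonly used choice, not a recommendation from this thread; other imputation methods would give different values.

```python
import numpy as np
import pandas as pd

def clr(counts: pd.DataFrame, pseudocount: float = 1.0) -> pd.DataFrame:
    """CLR-transform each sample (column) after adding a pseudocount.

    Rows are features (OTUs), columns are samples.
    """
    # Imputation step: zeros make log() undefined, so shift every count
    log_x = np.log(counts + pseudocount)
    # Subtracting each column's mean log equals dividing by its geometric mean
    return log_x - log_x.mean(axis=0)

counts = pd.DataFrame([[1, 1, 7, 0],
                       [2, 0, 2, 0],
                       [5, 5, 0, 3],
                       [0, 2, 8, 1]],
                      index=['otu1', 'otu2', 'otu3', 'otu4'],
                      columns=['s1', 's2', 's3', 's4'])

clr_table = clr(counts)
print(clr_table.round(3))                # contains negative values
print(clr_table.sum(axis=0).round(10))   # each sample sums to ~0
```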

What are you trying to do with clr transformed values?
If I’m using clr, it’s often a customized workflow (outside of q2).

> What are you trying to do with clr transformed values?

I would like to perform alpha- and beta-diversity analyses and differential abundance (DA) tests available in qiime2, e.g., from this post.

Why not just run a tool that performs CLR internally, such as aldex2, Songbird, or Birdman for DA, the Aitchison distance in q2-diversity, or RPCA/CTF in gemelli?

I have not heard of anyone using CLR for alpha diversity; that would most certainly be a novel method that would not fit within the scope of qiime2.

Thank you for the links!
TBH, I was not aware of these plugins, and I will check them out.
I don't use CLR for alpha diversity; I mentioned it by mistake, so no novel method from my side, sorry :/
