Is songbird appropriate for relative frequency (as opposed to frequency) data?

I have species-level taxonomy assignments from the metaphlan/humann2 software, which are represented in QIIME2 as a FeatureTable[RelativeFrequency] artifact. Here, the values are fractions, and they sum to one for each sample.

Given that my data are not the sort of relative abundances discussed in the songbird paper (or anticipated by the q2-songbird plugin, which takes a FeatureTable[Frequency] as its input), is it appropriate or meaningful to use the songbird software on these data? If NOT, is it possible to transform them into some other form that could be used with the sort of differential ranking approach songbird employs to identify potentially interesting taxa? Or are there other methods better suited to this sort of data for identifying taxa that differ considerably between conditions?

Any guidance would be greatly appreciated!


Hi @Amanda_Birmingham - right, Songbird is designed to handle counts instead of fractions.
But we did use Songbird to process metaphlan results in the MMvec paper; it's a bit of a hack, but it gave ballpark results.

Yes, there are other approaches that can handle fractions (most of the compositional methods from geology do so). Examples include the alr, clr, or ilr transforms (all of which are available in skbio). Handling zeros is an additional challenge: zeros are easier to handle in counts than in fractions, but there are a number of imputation approaches that may be handy here (see multiplicative_replacement in skbio, or the zCompositions R package). I don't know of CLI tools that can do this, so some scripting may be necessary.
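To make the scripting a bit more concrete, here is a rough pure-Python sketch of multiplicative zero replacement followed by the clr transform for a single sample of fractions. The function names and the example sample are made up for illustration, and the default `delta` of 1/N² is an assumption on my part; in practice you would more likely call the skbio functions directly on the whole table.

```python
import math


def replace_zeros(fractions, delta=None):
    """Multiplicative replacement: set zeros to a small delta and shrink
    the non-zero parts so the sample still sums to one."""
    total = sum(fractions)
    closed = [x / total for x in fractions]  # re-close in case of rounding
    if delta is None:
        # Assumed default: 1 / N^2, where N is the number of features.
        delta = (1.0 / len(closed)) ** 2
    n_zeros = sum(1 for x in closed if x == 0)
    shrink = 1.0 - n_zeros * delta
    return [delta if x == 0 else x * shrink for x in closed]


def clr(fractions):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(x) for x in fractions]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical sample: four species, one of them unobserved.
sample = [0.5, 0.3, 0.2, 0.0]
filled = replace_zeros(sample)
clr_values = clr(filled)
```

The clr values sum to zero within each sample, and the choice of `delta` can influence downstream results for low-abundance taxa, so it is worth checking how sensitive your findings are to it.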
