Correlations and the compositionality issue

Hi everyone,

I was wondering whether any of you statistics-savvy people have experience analysing 16S amplicon sequencing data together with metabolite data, in particular while taking into account the compositional nature of the amplicon sequencing data.

As far as I understand it, the fact that amplicon sequencing data are compositional rules out a simple Pearson or Spearman correlation, even though metabolite data expressed as concentrations are not compositional. Is this issue properly overcome if the amplicon sequencing data are scaled to absolute abundances using a measure of total bacterial load, such as qPCR with universal primers or flow cytometry? Are there other statistical methods that allow us to calculate some kind of correlation between amplicon sequencing (compositional) and metabolite (non-compositional) data? Is this a solved problem, or even a solvable one?

And what about cases where some samples have 0 reads for a certain ASV/genus: should these be treated as 0 or as NA in the statistical calculations? After all, one could always argue that we just haven't sequenced deeply enough in those samples…
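To make the compositionality problem concrete, here is a small pure-Python sketch with made-up numbers (not real data). Taxon A's true abundance just fluctuates, taxon B blooms and drives a metabolite, and because read counts only carry proportions, A's relative abundance falls as B grows and shows a spurious negative correlation with the metabolite. The sketch then shows two common workarounds: scaling proportions by a total-load estimate (assuming that estimate is accurate, and ignoring 16S copy-number and primer biases), and a centred log-ratio (CLR) transform where zeros are kept as 0 counts and a pseudocount is added. The helper names (`closure`, `scale_to_load`, `clr`) and the pseudocount of 0.5 are my own illustrative choices, not from QIIME 2 or any specific library.

```python
import math
from statistics import mean


def closure(counts):
    """Turn read counts for one sample into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def scale_to_load(counts, total_load):
    """Proportions times a measured total bacterial load for that sample
    (e.g. a qPCR or flow-cytometry estimate, assumed accurate here)."""
    return [p * total_load for p in closure(counts)]


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform. Zeros are kept as 0 counts (not NA) and a
    pseudocount keeps the log finite; its value is an arbitrary choice that
    mostly affects low-abundance taxa."""
    logs = [math.log(c + pseudocount) for c in counts]
    g = mean(logs)  # log of the sample's geometric mean
    return [x - g for x in logs]


def rank(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Made-up toy data: 4 samples x 3 taxa [A, B, C].
# True cell numbers: A fluctuates (100, 120, 90, 110), B blooms (100..800),
# C is constant (50). Read counts are those numbers times an arbitrary depth.
reads = [[400, 400, 200], [240, 400, 100], [270, 1200, 150], [110, 800, 50]]
total_load = [250, 370, 540, 960]   # pretend the load estimate is exact
metabolite = [1.0, 2.0, 4.0, 8.0]   # concentration driven by taxon B

rel_A = [closure(r)[0] for r in reads]
abs_A = [scale_to_load(r, t)[0] for r, t in zip(reads, total_load)]
clr_rows = [clr(r) for r in reads]  # log-ratios; each row sums to ~0

print(spearman(rel_A, metabolite))  # -1.0: spurious, A's share shrinks as B grows
print(spearman(abs_A, metabolite))  # 0.0: no association after load scaling
```

In this toy case, load scaling removes the spurious correlation only because the pretend load values are exact; real qPCR or flow-cytometry estimates carry their own error. The CLR route needs no load measurement, but its values are ratios to each sample's geometric mean, so a correlation with a metabolite is a correlation with that ratio rather than with absolute abundance, and the results can shift with the pseudocount used for zeros.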

Any insight into this corner of statistics would be highly appreciated!

Best regards,


Hi, @mverce

Great question! This is an active area of research, so we're still figuring out how best to combine 'omics data types. One avenue you could look at is mmvec, which was designed for identifying microbe-metabolite interactions. mmvec also has a QIIME 2 plugin, so you can find related topics on this forum and ask for help with it here :slight_smile:


Thank you for the tip @gibsramen! I’ll have a look at the mmvec plugin and see how I could apply it to our data :slight_smile:

All the best,