Correlations and the compositionality issue

mverce · April 19, 2021, 3:14pm

Hi everyone,

I was wondering if any of you statistics-savvy people have experience analysing 16S amplicon sequencing data together with metabolite data, particularly while taking into account the compositionality of (16S) amplicon sequencing data.

As far as I understand it, the fact that amplicon sequencing data are compositional rules out a simple Pearson or Spearman correlation, even though metabolite data expressed as concentrations are not compositional. Is this issue properly overcome if the amplicon sequencing data are scaled using qPCR quantification of total bacteria with universal primers, flow cytometry, ...? Are there other statistical methods that allow us to calculate some kind of correlation between amplicon sequencing (compositional) and metabolite (noncompositional) data? Is this a solved problem? Is it solvable at all? And what about cases where some samples have 0 reads for a certain ASV/genus: are they treated as 0 or as NA for statistical calculations? After all, one could always argue we just haven't sequenced deeply enough in those samples ...
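For concreteness, here is a minimal sketch of two commonly discussed ways to make the 16S side less compositional before correlating it with metabolites. It is not a resolution of the open questions above. `scale_by_load` turns relative abundances into estimated absolute abundances using a per-sample total load (e.g. a qPCR or flow cytometry count, as mentioned above). `clr` is the centred log-ratio transform, a standard compositional data technique that is not mentioned in this thread; it handles zeros with a pseudocount, which is just one convention (the value 0.5 is an assumption) and sidesteps rather than answers the "true zero vs. not sequenced deeply enough" question. All function names and the toy data are hypothetical.

```python
import numpy as np


def scale_by_load(counts, total_load):
    """Relative abundances (rows = samples) times a per-sample total load,
    e.g. qPCR 16S copies; gives *estimated* absolute abundances."""
    counts = np.asarray(counts, dtype=float)
    rel = counts / counts.sum(axis=1, keepdims=True)
    return rel * np.asarray(total_load, dtype=float)[:, None]


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of a samples x taxa count matrix.
    Zeros are handled by adding a pseudocount (an assumption, not the only option)."""
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # subtract each sample's mean log, i.e. divide by its geometric mean
    return logx - logx.mean(axis=1, keepdims=True)


def rankdata(v):
    """Ranks starting at 1; tied values get the average of their ranks."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(len(v))
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(rankdata(a), rankdata(b))[0, 1])


if __name__ == "__main__":
    # toy data: 4 samples x 3 taxa, plus one metabolite concentration per sample
    counts = np.array([[10, 0, 90], [20, 5, 75], [30, 10, 60], [40, 20, 40]])
    metabolite = np.array([1.0, 2.0, 3.5, 4.0])
    z = clr(counts)
    for taxon in range(counts.shape[1]):
        print(taxon, spearman(z[:, taxon], metabolite))
```

Note that CLR values are still log-ratios relative to each sample's geometric mean, so a correlation computed on them describes relative rather than absolute abundance; the load-scaled version is only as good as the qPCR or flow cytometry estimate.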

Any insight into this corner of statistics would be highly appreciated!

Best regards,
Marko

gibsramen · April 19, 2021, 3:40pm

Hi, @mverce

Great question - this is an active area of research, so we're still trying to figure out how best to combine 'omics data types. One avenue you may look at is mmvec, which was designed for determining microbe-metabolite interactions. mmvec has a QIIME 2 plugin as well, so you can find existing topics about it and ask for help on this forum.

mverce · April 20, 2021, 4:09pm

Thank you for the tip @gibsramen! I'll have a look at the mmvec plugin and see how I could apply it to our data.

All the best,
Marko