I was wondering whether any of you statistics-savvy people have experience analysing 16S amplicon sequencing data together with metabolite data, especially in a way that accounts for the compositionality of the amplicon sequencing data.
As far as I understand it, amplicon sequencing data are compositional: the number of reads per sample is set by sequencing depth, so the counts only carry information about relative abundances. This rules out a simple Pearson or Spearman correlation with metabolite data, even though metabolite data expressed as concentrations are not compositional. Is this issue properly overcome if the amplicon sequencing data are scaled to absolute abundances using qPCR quantification of total bacteria with universal primers, flow cytometry, …? Are there other statistical methods that allow us to calculate a correlation between amplicon sequencing (compositional) and metabolite (non-compositional) data? Is this a solved problem? Is it even solvable?

And what about cases where some samples have 0 reads for a certain ASV/genus: are those values treated as 0 or as NA in the statistical calculations? After all, one could always argue that we just haven't sequenced those samples deeply enough …
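To make the concern concrete, here is a toy sketch in plain Python. All numbers are made up, and the helper names (`ranks`, `pearson`, `spearman`) are hypothetical, written out by hand so nothing depends on a particular library. It shows three things under simplifying assumptions: a taxon whose absolute abundance is constant can still look strongly (negatively) correlated with a metabolite once counts are turned into proportions; multiplying proportions by a total load (e.g. from qPCR) recovers the constant absolute abundance, but only because the toy assumes the load is measured perfectly with no copy-number or primer bias; and treating zeros as 0 versus NA changes the Spearman coefficient. It does not say which zero convention is correct.

```python
import math

# Toy data (made up): 4 samples, 2 taxa, 1000 reads per sample.
# In absolute terms taxon A is constant while taxon B grows.
reads_a = [500, 250, 100, 50]
reads_b = [500, 750, 900, 950]
# Hypothetical total bacterial load per sample (e.g. universal-primer qPCR),
# assumed here to be measured without any bias.
total_load = [200, 400, 1000, 2000]
# Hypothetical metabolite concentrations that rise with taxon B.
metabolite = [1.0, 2.0, 4.0, 8.0]


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y, zeros_as_missing=False):
    """Spearman correlation = Pearson correlation of the ranks.
    If zeros_as_missing is True, samples where x == 0 are dropped (treated as NA)."""
    pairs = [(a, b) for a, b in zip(x, y) if not (zeros_as_missing and a == 0)]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson(ranks(xs), ranks(ys))


# Relative abundance of A falls from 0.5 to 0.05 purely because B grows.
rel_a = [a / (a + b) for a, b in zip(reads_a, reads_b)]
# Scaling by the (assumed perfect) total load recovers A's constant abundance.
abs_a = [r * load for r, load in zip(rel_a, total_load)]

print(rel_a)                        # [0.5, 0.25, 0.1, 0.05]
print(abs_a)                        # all approximately 100
print(spearman(rel_a, metabolite))  # -1.0: a spurious negative association

# Zero handling: the same data give different answers depending on the convention.
x = [0, 5, 3, 8]          # reads for some taxon; 0 in the first sample
y = [2.0, 1.0, 3.0, 4.0]  # metabolite concentrations
print(spearman(x, y))                         # 0.4 (zeros kept as 0)
print(spearman(x, y, zeros_as_missing=True))  # 0.5 (zeros treated as NA)
```

In real data the total-load measurement has its own error and biases, so the scaled values are estimates rather than exact absolute abundances.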
Any insight into this corner of statistics would be highly appreciated!