strange mmvec emperor biplot

Hello guys,

When I used mmvec in the QIIME 2 environment to draw a biplot, I got a very weird result. The entire plot is perpendicular to the first coordinate axis, and the output log file does not contain any error messages. I don't know what caused this. Below are the commands and runtime files I used. Thanks for your help.

command line:
qiime tools import \
  --input-path 240328_microbiome.biom \
  --output-path 240328_new_microbe.qza \
  --type FeatureTable[Frequency]

qiime tools import \
  --input-path 240328_metabolome.biom \
  --output-path 240328_new_metabolite.qza \
  --type FeatureTable[Frequency]

qiime mmvec paired-omics \
  --i-microbes ~/Desktop/Ndatadraw_20220821/Mixomics240328_new_microbe.qza \
  --i-metabolites 240328_new_metabolite.qza \
  --m-metadata-file Mmetadata.tsv \
  --p-min-feature-count 0 \
  --p-num-testing-examples 10 \
  --p-learning-rate 1e-4 \
  --p-batch-size 60 \
  --p-latent-dim 3 \
  --p-epochs 10000 \
  --o-conditionals conditionals.qza \
  --o-conditional-biplot conditional_biplot.qza \
  --o-model-stats model_stats.qza

These log conditional probabilities can also be viewed directly with qiime metadata tabulate. The visualization can be created as follows:

qiime metadata tabulate \
  --m-input-file conditionals.qza \
  --o-visualization conditionals-viz.qzv


qiime emperor biplot \
  --i-biplot conditional_biplot.qza \
  --m-sample-metadata-file 240322_metabolites-metadata.txt \
  --m-feature-metadata-file 240322_microbe-metadata.tsv \
  --o-visualization emperor4.qzv


qiime mmvec heatmap \
  --i-ranks conditionals.qza

240328_metabolome.biom (41.5 KB)
240328_new_microbe.qza (14.4 KB)
240328_new_metabolite.qza (16.7 KB)
240328_microbiome.biom (39.6 KB)
model_stats.qza (417.8 KB)
model-summary.qzv (264.1 KB)

Mmetadata.tsv (526 Bytes)
conditionals-viz.qzv (1.2 MB)
conditionals.qza (22.8 KB)
emperor4.qzv (792.1 KB)
heatmap.qzv (222.7 KB)
metadata.tsv (13.0 KB)
conditional_biplot.qza (17.8 KB)
240322_metabolites-metadata.txt (1.1 KB)
240322_microbe-metadata.tsv (625 Bytes)

I appreciate any information or suggestions!


Hey @Chris123,

I agree that the plot looks super strange. I don't think it's incorrect, however.

If you look at your conditional probabilities, you'll see that for any given metabolite, the conditional probability is essentially identical for every genus in your table.

It looks like there may be a single metabolite that isn't totally independent of the genus (hence the single dot to the right in your picture).

What's also interesting is that your training loss looks like it did something, but there's no data for cross-validation. That kind of looks like an artifact of the model failing to describe anything, but I'm not familiar enough with the code to say for sure.

One thing I do see is that your latent_dim was set to 3, which means each metabolite is encoded as a vector of 3 weights, and those three weights are decoded into a vector of OTU probabilities. You might want to increase the size of that and see if you get useful divergence. I have no particular insight as to what a reasonable latent_dim would be. But it's the "shared language" between the encoder and decoder parts of this model, so you can think about it that way.
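To make the latent_dim suggestion concrete, here is a minimal sketch of a sweep over a few values. The values 1, 3 and 5 and the output file names are arbitrary choices for illustration, not recommendations, and the microbe table name is assumed to be the one produced by the import step above. All other parameters are copied from the original command. By default the script only prints the commands, so you can check them before running anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

run_cmd() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Re-fit MMvec once per latent dimension, writing separate outputs per fit.
# The dimensions listed here are illustrative assumptions.
sweep_latent_dims() {
  local dim
  for dim in 1 3 5; do
    run_cmd qiime mmvec paired-omics \
      --i-microbes 240328_new_microbe.qza \
      --i-metabolites 240328_new_metabolite.qza \
      --m-metadata-file Mmetadata.tsv \
      --p-min-feature-count 0 \
      --p-num-testing-examples 10 \
      --p-learning-rate 1e-4 \
      --p-batch-size 60 \
      --p-latent-dim "$dim" \
      --p-epochs 10000 \
      --o-conditionals "conditionals_dim${dim}.qza" \
      --o-conditional-biplot "conditional_biplot_dim${dim}.qza" \
      --o-model-stats "model_stats_dim${dim}.qza"
  done
}

sweep_latent_dims
```

Each biplot can then be passed to qiime emperor biplot as in the original post, to see whether any setting pulls the points away from a single line.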

cc @mortonjt

Hi, it looks like you have 20 samples, so this will definitely require some tinkering with MMvec.
I'd try only 1 latent dimension and heavy regularization on the priors (i.e. 0.1).
I'd also look at the soils dataset in the tutorial: mmvec/examples/soils/check_soils.ipynb (biocore/mmvec on GitHub, master branch)

I would also recommend holding out only 1-3 samples for cross-validation. You may want to do multiple runs to confirm that you are seeing real signals.
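The suggestions above could be tried with something like the following sketch. It rests on a few assumptions: that "regularization on the priors" maps to the --p-input-prior and --p-output-prior options of qiime mmvec paired-omics (smaller values should regularize more strongly), that your installed mmvec version exposes those options, and that 2 held-out samples and 3 repeats are reasonable picks from the suggested ranges. The output file names are made up for illustration. The script prints the commands by default rather than running them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

run_cmd() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Repeat the same heavily regularized fit several times so the runs
# can be compared with each other afterwards.
repeat_regularized_runs() {
  local run
  for run in 1 2 3; do
    run_cmd qiime mmvec paired-omics \
      --i-microbes 240328_new_microbe.qza \
      --i-metabolites 240328_new_metabolite.qza \
      --m-metadata-file Mmetadata.tsv \
      --p-min-feature-count 0 \
      --p-num-testing-examples 2 \
      --p-latent-dim 1 \
      --p-input-prior 0.1 \
      --p-output-prior 0.1 \
      --p-learning-rate 1e-4 \
      --p-batch-size 60 \
      --p-epochs 10000 \
      --o-conditionals "conditionals_run${run}.qza" \
      --o-conditional-biplot "conditional_biplot_run${run}.qza" \
      --o-model-stats "model_stats_run${run}.qza"
  done
}

repeat_regularized_runs
```

If the same microbe-metabolite pairs rank highly in the conditionals from every run, that is some evidence the signal is real rather than an artifact of one particular fit.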