This is my first time trying MMvec, and I only have 12 samples for the integration.
I did not specify a --training-column; I ran the code as it is on GitHub.
Are my results still reliable?
Thank you very much,
Looking forward to your response,
Yeah, unfortunately it's hard to do much with 12 samples. I'd definitely specify the training column.
It also looks like your model didn't reach convergence, so I'd run it longer.
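For reference, here's roughly what that looks like with the q2 plugin (a sketch; the artifact names are placeholders for your own files, and the metadata column, here called Testing, should label each sample Train or Test):

```
qiime mmvec paired-omics \
  --i-microbes microbes.qza \
  --i-metabolites metabolites.qza \
  --m-metadata-file metadata.txt \
  --p-training-column Testing \
  --o-conditionals conditionals.qza \
  --o-conditional-biplot biplot.qza
```

The samples marked Test get held out for the cross-validation curve, so allocate enough of them for that curve to be meaningful.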
How many of my samples would you recommend using to train the model? Also, could you please tell me how to run it for longer?
What does it mean that the model didn't reach convergence?
I am sorry to bother you, but I would really appreciate some guidance on how to run the model for longer.
I have tried different settings for running the code, but I am not sure I am running it correctly, since it is still not reaching convergence. This is the code that I ran:
Hi @DanisaBescucci, not a problem -- the keyword is --p-epochs. It specifies the number of iterations the model takes to run. Right now, the number of epochs is set to 10 by default. Judging from your model, it looks like you haven't put a dent in training, so you may want to bump this up to 50 or 100.
Increasing the --p-learning-rate may also help (it's currently at 1e-3; you could try 1e-1).
There is an extended discussion of both of these parameters in the FAQs.
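For example, adding something like this to the command you already have (double-check the exact spellings against qiime mmvec paired-omics --help):

```
  --p-epochs 100 \
  --p-learning-rate 1e-1 \
```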
Your plot actually looks great -- your loss has flattened, which is what convergence should look like.
But your cross-validation plot on the top is still empty -- I'm not sure why that is. Maybe you don't have enough samples? How many testing samples did you allocate? (If you are comfortable sharing your data, that may help.)
Dealing with 12 samples is generally tricky -- we have not been able to get MMvec to work in that setting. It may be worthwhile to look for other datasets and see whether you can merge yours with one of them. If not, you can't fit 2 latent dimensions with 12 samples -- you'll be lucky if you can fit 1 latent dimension.
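(If you do go down to one dimension, that's just

```
  --p-latent-dim 1 \
```

added to the same command.)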
I will try with 1 latent dimension. This dataset contains two different treatment groups, and I could add 6 more samples from a control group. Would the model work the same way if I have three different treatment groups?
Sorry, I forgot to share that one. Here is my metadata file. model-summary.qzv (36.1 KB)
I will add the extra samples that I have!
Thank you very much,
I have tried the model after adding 12 more samples, for a total of 24, and I still can't get the first graph to show anything.
I have attached my metadata file, the OTU table, the metabolite table, and the qzv file with the summary.
The code that I ran is:
How long does it take to run? Does this run complete within 1 minute? It's possible that the run is too fast (i.e., it completes within 1 second and doesn't record anything). You could increase --p-epochs to 1000 to double-check this.
I think it'll be very important to look at the cross-validation loss to make sure that it also converges, particularly given how small the sample size is.
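As a sketch, assuming a recent q2-mmvec (verify both flags against --help):

```
  --p-epochs 1000 \
  --p-summary-interval 1 \
```

--p-summary-interval is the number of seconds between recorded points, so dropping it to 1 should force points to be written even if the run finishes very quickly.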
It always takes less than a minute to run, even with --p-epochs 1000. However, I have run it with that setting and now the cross-validation curve is there. I also changed --p-latent-dim to 0.
Does this look good to you? How would you interpret the curve going up at the end?
Sorry for all the questions,
Thank you very much for taking the time to troubleshoot this with me!
See the jump up at the very end -- that's a sign of overfitting. Your cross-validation error should be strictly decreasing. So this summary is basically telling you that you still have too few samples.
Also, if you set --p-latent-dim 0, that basically means you are only going to compute intercepts -- this is really only designed for hypothesis testing with the Q² score (also highlighted in the README).
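For the record, that baseline run is just your same command with zero latent dimensions, keeping its stats as the null model (a sketch, assuming a release that exposes the model-stats output; file names are placeholders):

```
qiime mmvec paired-omics \
  --i-microbes microbes.qza \
  --i-metabolites metabolites.qza \
  --m-metadata-file metadata.txt \
  --p-training-column Testing \
  --p-latent-dim 0 \
  --o-conditionals null-conditionals.qza \
  --o-conditional-biplot null-biplot.qza \
  --o-model-stats null-stats.qza
```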
From what I'm seeing, there are too few samples in your dataset to draw meaningful conclusions with MMvec. Maybe with much bigger guns we'll be able to answer these small-sample-size questions, but not at this exact moment.
I am still trying to run the model, since I am doing this for an independent study. I have now tried a different data set with 42 samples.
When I run the model at 50 epochs I get the following graph (canola_stats_50). However, if I increase the epoch number to 500, I start seeing overfitting (canola_stats_500).
So I was wondering: how should I interpret the first results? Could I use the conditional probabilities obtained from running the model at 50 epochs?
It looks like the cross-validation metric isn't recorded at the first few time steps.
Given how low the cross-validation metric is (and how little variability there is), this study is probably fine. But I would recommend following up with the Q² score, so that you have a hard number showing that your model is statistically significant.
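If it helps, the Q² comparison is done by summarizing your fitted model's stats against the null model's (again a sketch; double-check the action and parameter names against qiime mmvec --help, and note that model-stats.qza and null-stats.qza here stand in for your own outputs):

```
qiime mmvec summarize-paired \
  --i-model-stats model-stats.qza \
  --i-baseline-stats null-stats.qza \
  --o-visualization paired-summary.qzv
```

Roughly speaking, a Q² near 1 means the model explains much more held-out variation than the intercept-only baseline, while a Q² at or below 0 means it isn't beating the baseline at all.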