Linear mixed effect model - missing values in random variables

Rui · October 23, 2020, 3:02am

Dear qiime2 developers,

When I run a linear mixed effects model with q2-longitudinal, I noticed that the command fails if --p-random-effects contains variables with missing values in the metadata file. For example, when I run:

qiime longitudinal linear-mixed-effects \
  --m-metadata-file sample-metadata-modified.tsv \
  --m-metadata-file 2018-lme-filtered-core-metrics-results/shannon_vector.qza \
  --p-metric shannon_entropy \
  --p-random-effects Week,pH,conductivity,WaterTemp,WaterDepth,ammonia,nitrate \
  --p-group-columns TreatP \
  --p-state-column Week \
  --p-individual-id-column Site \
  --o-visualization 2018-lme-filtered-core-metrics-results/2018-lme-shannon.qzv

This fails because the variables "pH", "ammonia", and "nitrate" contain missing values in the metadata for some samples.

When I removed these variables and ran it again:

qiime longitudinal linear-mixed-effects \
  --m-metadata-file sample-metadata-modified.tsv \
  --m-metadata-file 2018-lme-filtered-core-metrics-results/shannon_vector.qza \
  --p-metric shannon_entropy \
  --p-random-effects Week,conductivity,WaterTemp,WaterDepth \
  --p-group-columns TreatP \
  --p-state-column Week \
  --p-individual-id-column Site \
  --o-visualization 2018-lme-filtered-core-metrics-results/2018-lme-shannon.qzv

The command worked perfectly, and I was able to obtain the visualization.

Is there any way to ask qiime to include those variables despite the missing values, or should I simply delete the rows containing missing values? Much appreciated!

Rui

q2 version: 2020.8.0
installation: docker image on an HPC

timanix · October 23, 2020, 4:34am

Hi!
I am not an expert here, but if you don't lose a lot of data by removing the samples that have missing values for the variables you want, IMHO, it is better to do that. Otherwise, you are introducing a lot of bias into your model.
Let's wait here for expert opinions from other participants.

jwdebelius · October 23, 2020, 8:04am

Hi @Rui,

I'm also not a biostatistician (although some days I play one). I think @timanix's advice is spot-on. It would be my approach. Imputation can be hard in an already noisy system, and I'd rather have confidence in at least one side of my model (my metadata). So, I would drop the samples missing covariates too.

Best,
Justine
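
For anyone wanting to try the "drop the samples missing covariates" approach, here is a minimal sketch in plain Python (standard library only). It is not part of q2-longitudinal. The function name drop_rows_with_missing is hypothetical, the demo sample IDs and values are made up, and it assumes that missing values appear as empty cells in a tab-separated metadata file and that any row after the header starting with "#" (such as a #q2:types row) should be kept unchanged. Only the column names pH and ammonia come from the thread.

```python
import csv
import io


def drop_rows_with_missing(tsv_text, required_columns, missing_tokens=("",)):
    """Drop sample rows that have a missing value in any required column.

    Hypothetical helper: returns (filtered_tsv_text, dropped_sample_ids).
    Assumes the first row is the header and the first column is the sample ID.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header = rows[0]
    col_idx = [header.index(name) for name in required_columns]

    kept, dropped = [header], []
    for row in rows[1:]:
        if not row:
            continue  # skip completely blank lines
        if row[0].startswith("#"):
            kept.append(row)  # keep directive/comment rows untouched
            continue
        # Pad short rows so trailing empty cells count as missing.
        padded = row + [""] * (len(header) - len(row))
        if any(padded[i].strip() in missing_tokens for i in col_idx):
            dropped.append(row[0])
        else:
            kept.append(row)

    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(kept)
    return out.getvalue(), dropped


# Made-up example data; only the pH and ammonia column names match the thread.
demo = (
    "sample-id\tSite\tWeek\tpH\tammonia\n"
    "#q2:types\tcategorical\tnumeric\tnumeric\tnumeric\n"
    "S1\tA\t1\t7.1\t0.3\n"
    "S2\tA\t2\t\t0.4\n"
    "S3\tB\t1\t6.8\t\n"
    "S4\tB\t2\t6.9\t0.2\n"
)
filtered, dropped = drop_rows_with_missing(demo, ["pH", "ammonia"])
print("Dropped samples:", dropped)
print(filtered)
```

The filtered text could then be written to a new file and passed to --m-metadata-file in place of the original. Printing the dropped sample IDs first makes it easy to check how much data is lost before re-running the model.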

Rui · October 23, 2020, 9:30am

Thanks for your opinions! I'll give it a try and see the differences.
