I am trying to use the longitudinal plugin and I am getting what seems to be a common error; however, I cannot understand why I am getting it.
Plugin error from longitudinal:
Linear model will not compute due to singular matrix error. This may occur if input variables correlate closely or exhibit zero variance. Please check your input variables. Removing potential covariates may resolve this issue.
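The error message names two causes: closely correlated inputs and zero-variance inputs. Both make the columns of the model's design matrix linearly dependent, so the matrix X^T X that a least-squares fit has to invert is singular. The following generic NumPy sketch shows how each case lowers the rank of a small design matrix. It is not the plugin's own code, and `is_singular` is a hypothetical helper:

```python
import numpy as np

def is_singular(X: np.ndarray) -> bool:
    """True if the design matrix X has linearly dependent columns,
    which makes X^T X (same rank as X) non-invertible."""
    return np.linalg.matrix_rank(X) < X.shape[1]

intercept = np.ones(6)
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Case 1: a covariate with zero variance is just a multiple of the intercept.
zero_var = np.full(6, 3.0)
X1 = np.column_stack([intercept, a, zero_var])

# Case 2: two covariates that are perfectly correlated (b = 2a + 1).
b = 2 * a + 1
X2 = np.column_stack([intercept, a, b])

# Case 3: a covariate that is not a linear function of the others.
c = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])
X3 = np.column_stack([intercept, a, c])

for name, X in [("zero variance", X1), ("collinear", X2), ("independent", X3)]:
    print(name, "singular:", is_singular(X))
```

Dropping the redundant column in either of the first two cases restores full rank, which is the reasoning behind the message's suggestion to remove covariates.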
If I run the command without the Group variable, it works perfectly, and if I run it without the BCS and sex variables, it also works. However, when I include all three variables together, I get the error. I have attached the metadata I am using for this analysis. The variables "Group", "BCS" and "sex" are not confounded, so I am not quite sure what I am missing.
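One way to check that the variables are not confounded is to cross-tabulate each pair of factors on a per-subject table and look for empty cells. The sketch below assumes a hypothetical one-row-per-subject DataFrame with made-up values. Only the column names come from the thread, and `uncrossed_pairs` is an illustrative helper, not part of QIIME 2:

```python
import pandas as pd

# Hypothetical per-subject table (values are made up for illustration).
subjects = pd.DataFrame({
    "Subject": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "Group":   ["G1", "G1", "G1", "G2", "G2", "G2"],
    "BCS":     ["LEAN", "OW", "OW", "LEAN", "OW", "LEAN"],
    "sex":     ["F", "M", "F", "M", "F", "F"],
})

def uncrossed_pairs(df, factors):
    """Return factor pairs whose cross-tabulation has an empty cell,
    i.e. some combination of levels never occurs together."""
    pairs = []
    for i, x in enumerate(factors):
        for y in factors[i + 1:]:
            table = pd.crosstab(df[x], df[y])
            if (table == 0).any().any():
                pairs.append((x, y))
    return pairs

print(pd.crosstab(subjects["Group"], subjects["BCS"]))
print(uncrossed_pairs(subjects, ["Group", "BCS", "sex"]))  # → []
```

Pairwise checks are not sufficient on their own: every pair can be fully crossed while a particular three-way Group/BCS/sex combination, or that combination at a given time point, is still empty or has a single subject.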
Hi @AnnaC,
I think the issue is that you are inputting too many variables and do not have replicates for some combinations of those variables. It looks like you have one time point per subject per Group/BCS/sex category. So even though the variables are not confounded, you just don't have the biological replication needed to test all of those variables together.
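A toy illustration of the replication point, assuming three two-level factors coded as 0/1 indicators (standing in for Group, BCS and sex) and ignoring the time variable and random effects. With all interactions included, the model has one coefficient per level combination; if a combination is never observed, the three-way interaction column is all zeros and the design matrix becomes singular, while the additive design remains estimable. This only illustrates the mechanism and is not a diagnosis of the dataset in this thread; `design` is a hypothetical helper:

```python
import itertools
import numpy as np

def design(cells, interactions):
    """Treatment-coded design matrix for three binary factors a, b, c.
    Additive: [1, a, b, c]; full interaction adds ab, ac, bc, abc."""
    rows = []
    for a, b, c in cells:
        row = [1, a, b, c]
        if interactions:
            row += [a * b, a * c, b * c, a * b * c]
        rows.append(row)
    return np.array(rows, dtype=float)

# All 8 level combinations, each observed twice.
all_cells = list(itertools.product([0, 1], repeat=3)) * 2
# Same data, but the (1, 1, 1) combination is never observed.
missing_cell = [cell for cell in all_cells if cell != (1, 1, 1)]

for label, cells in [("all cells", all_cells), ("one cell missing", missing_cell)]:
    for inter in (False, True):
        X = design(cells, inter)
        rank = np.linalg.matrix_rank(X)
        print(label, "interactions" if inter else "additive",
              "columns:", X.shape[1], "rank:", rank,
              "singular:", rank < X.shape[1])
```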
The interaction terms could be part of this problem, though. As a last-ditch attempt, you could try using this line:
--p-formula 'Group+BCS+sex'
Instead of this line:
--p-group-columns Group,BCS,sex
(When you pass a list of group columns to set independent factors, all interactions are tested; it is the equivalent of Group*BCS*sex. I am merely speculating that removing these interaction terms may solve the replication issue.)
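The difference between the two options can be sketched with the usual R-style formula convention (also followed by patsy): `*` expands to all main effects plus every interaction, while `+` keeps only the main effects. The helpers below are hypothetical and only list term names; they do not call QIIME 2 or patsy, and they cover only the three columns named above:

```python
from itertools import combinations

def expand_star(factors):
    """Terms implied by 'A*B*C': every main effect plus every
    interaction of two or more factors."""
    terms = []
    for k in range(1, len(factors) + 1):
        for combo in combinations(factors, k):
            terms.append(":".join(combo))
    return terms

def expand_plus(factors):
    """Terms implied by 'A+B+C': main effects only."""
    return list(factors)

factors = ["Group", "BCS", "sex"]
print(expand_star(factors))
# ['Group', 'BCS', 'sex', 'Group:BCS', 'Group:sex', 'BCS:sex', 'Group:BCS:sex']
print(expand_plus(factors))
# ['Group', 'BCS', 'sex']
```

For two-level factors each term is one coefficient, so the full-interaction version estimates seven coefficients besides the intercept versus three for the additive one, and each interaction term needs data in the corresponding level combinations.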
Could you please explain further what you mean by "It looks like you have one time point per subject per Group/BCS/sex category"? Several subjects belong to the same group and have the same sex and BCS, so I think I am missing something. Thanks again!
Sorry, I misspoke. I did not mean that every timepoint/Group/BCS/sex category has only one subject, but it looks like some of them have only one or two subjects, which is not enough replication. Below is the number of subjects per grouping when grouping by all fixed effects, i.e., the number of replicates you would have when testing the interaction among all effects (which is why using an additive formula without interactions works). Here's some Python code to demonstrate:
>>> import qiime2 as q2, pandas as pd
>>> md = q2.Metadata.load('sample-metadata_cats_all_wo_blanks_def_AC.tsv').to_dataframe()
>>> md.groupby(['Group','BCS','sex','Time']).count()['Subject']
Group  BCS   sex  Time
G1     LEAN  F    0.0     2
                  2.0     2
                  4.0     2
             M    0.0     3
                  2.0     4
                  4.0     4
       OW    F    0.0     6
                  2.0     6
                  4.0     6
             M    0.0     1
                  4.0     1
G2     LEAN  F    0.0     2
                  2.0     2
                  4.0     2
             M    0.0     2
                  2.0     3
                  4.0     2
       OW    F    0.0     4
                  2.0     4
                  4.0     4
             M    0.0     4
                  2.0     4
                  4.0     4
The numbers indicate the number of subjects in each Time × Group × sex × BCS combination. Note, for example, that the G1/OW/M combination has a single subject and no sample at time 2.0.
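The groupby above can be extended to flag under-replicated cells automatically. This sketch uses hypothetical toy metadata with the thread's column names; `replication_table` and `sparse_cells` are illustrative helpers, and the threshold of two subjects is an arbitrary choice. It counts distinct subjects with `nunique()` rather than `count()` (which counts non-null rows), and `unstack(..., fill_value=0)` turns time points that never occur for a combination into explicit zeros instead of rows that are silently absent:

```python
import pandas as pd

# Hypothetical long-format metadata: one row per sample (subject x time point).
md = pd.DataFrame({
    "Subject": ["s1", "s1", "s1", "s2", "s2", "s2", "s3", "s3"],
    "Group":   ["G1"] * 8,
    "BCS":     ["OW"] * 6 + ["LEAN"] * 2,
    "sex":     ["F"] * 6 + ["M"] * 2,
    "Time":    [0.0, 2.0, 4.0, 0.0, 2.0, 4.0, 0.0, 4.0],
})

def replication_table(md, factors, time="Time", subject="Subject"):
    """Distinct subjects per factor combination (rows) and time point (columns);
    time points missing for a combination are filled with 0."""
    counts = md.groupby(factors + [time])[subject].nunique()
    return counts.unstack(time, fill_value=0)

def sparse_cells(table, min_subjects=2):
    """Rows where any time point has fewer than `min_subjects` subjects."""
    return table[(table < min_subjects).any(axis=1)]

table = replication_table(md, ["Group", "BCS", "sex"])
print(table)
print(sparse_cells(table))
```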