Hello,
I am working with microbiome 16S data from fecal samples collected from 6 subjects at two time points (T0 and T1). I would like to perform a differential abundance analysis to identify potential changes between T1 and T0 while accounting for the effect of each individual subject. Here, you can see an Aitchison Beta Diversity plot, where clustering appears to be more related to the subject than to the time points.
I am using ANCOM-BC2 to perform this comparison by fitting a mixed-effect model (Is there a way to test Paired-samples in ANCOMBC2? · Issue #183 · FrederickHuangLin/ANCOMBC · GitHub):
ancombc2(data = ps_paired,
tax_level = NULL,
fix_formula = "timepoint",
rand_formula = "(1 | subjectID)",
p_adj_method = "BH",
prv_cut = 0.1
...)
First, I got the next error message:
The error message from `lmerTest` is as follows:
number of levels of each grouping factor must be < number of observations (problems: paciente)
However, I reproduced the analysis for all the ASVs outside of ANCOM-BC2 using the lmerTest package, and I didn't encounter any errors. An example for ASV_1 is as follows:
lmerTest::lmer(ASV_1 ~ timepoint + (1 | subjectID), data = data, control = lme4::lmerControl())
This error was solved by increasing the prevalence filtering parameter prv_cut to 0.2. After this change, I received the following warnings, which are related to model misspecification and led me to suspect that the results might not be reliable.
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Hessian is numerically singular: parameters are not uniquely determined
2: In as_lmerModLT(model, devfun) :
Model may not have converged with 1 eigenvalue close to zero: 6.8e-11
Globally, 350 out of the initial 866 ASVs were included in the analysis after detecting structural zeros. As you can see in the image below, 106 out of 350 ASVs had no estimation for lfc, se, and W, and their p-values and adjusted p-values were equal to 1. After examining these ASVs with no parameter estimation in the feature table, I observed that they all shared the common characteristic of not being present in both time points for at least one subject. However, their p-values were still used to calculate the adjusted p-values, which resulted in all adjusted p-values from the time variable being equal to 1.
When I increased the prv_cut to 0.3, only 81 out of the initial 866 ASVs were included after structural zero detection, and only 5 out of those 81 had no estimations for the previous parameter. I did not receive any errors or warnings in this case.
What is the most suitable way to perform this comparison?
1. I could use the results obtained after applying prv_cut = 0.3. However, I am concerned that I might be missing rare but biologically important signals, as many ASVs were lost after using this filter.
2. I have read that I could use a fixed effects model (formula: subjectID + timepoint) instead of a mixed effects model. I believe this is the typical design for paired analysis in other omics tools like DESeq2. However, as far as I understand, the model would be quite complex and might not be generalizable beyond the results we are observing.
3. Perhaps this tool is not yet the most suitable for this type of analysis, and I should explore other alternatives like ALDEx2, which performs Wilcoxon test after CLR transformation.
I would be grateful if you could share your opinion and recommend more suitable alternatives, if you are aware of any.
Thanks,
Andrés