Yikes! Another post was brought up here: Gneiss Singular matrix error - #5 by ajaybabu27
This is almost surely pointing to some sort of ill-defined input. Does this throw an error on just one of the inputs? And have you tried running OLS on it as a sanity check? I know it is not "technically" correct, but if it also fails on OLS, then that is a sign of a more sinister problem.
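If it helps, here is a minimal sketch of that kind of sanity check done outside of Gneiss with plain NumPy. The data are toy values and `check_design` is a hypothetical helper, not part of Gneiss. It assumes the singular matrix error comes from a rank-deficient design matrix (for example, two covariates that carry the same information), which is one common cause but not the only one.

```python
import numpy as np

def check_design(X):
    """Return (rank, number of columns) of a design matrix."""
    X = np.asarray(X, dtype=float)
    return int(np.linalg.matrix_rank(X)), X.shape[1]

# Toy covariate (hypothetical values).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Bad design: intercept, x, and a rescaled copy of x (perfectly collinear).
X_bad = np.column_stack([np.ones_like(x), x, 2 * x])
# Good design: intercept and x only.
X_ok = np.column_stack([np.ones_like(x), x])

rank_bad, ncol_bad = check_design(X_bad)
rank_ok, ncol_ok = check_design(X_ok)

# rank < number of columns means the normal equations are singular.
print("bad design is rank deficient:", rank_bad < ncol_bad)
print("ok design is rank deficient:", rank_ok < ncol_ok)

# Plain OLS on the well-posed design; y is a noiseless toy response.
y = 1.0 + 2.0 * x
beta, _, _, _ = np.linalg.lstsq(X_ok, y, rcond=None)
print("OLS coefficients:", beta)
```

If the rank is smaller than the number of columns for your real metadata, the problem is in the covariates themselves rather than in the mixed model.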
Nice article! Right, an interpretable goodness-of-fit is not too straightforward. Another nice blog post breaks this problem down here. Gneiss does provide the residuals and fits, so it should be possible to rig your own goodness-of-fit. But this is definitely something that we really should have in Gneiss in the future.
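As a rough illustration of rigging your own, here is a sketch that computes a coefficient of determination from residuals and fitted values. It assumes you have already exported both as samples-by-balances arrays (how you export them is not shown here), and `r_squared` is a hypothetical helper rather than a Gneiss function. R² is just one possible choice, and as noted above it may not be the most interpretable one for a mixed model.

```python
import numpy as np

def r_squared(fitted, residuals):
    """Per-balance and overall R^2 from fitted values and residuals.

    Both inputs are (n_samples, n_balances) arrays. The observed values
    are reconstructed as fitted + residuals.
    """
    fitted = np.asarray(fitted, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    observed = fitted + residuals
    ss_res = (residuals ** 2).sum(axis=0)
    ss_tot = ((observed - observed.mean(axis=0)) ** 2).sum(axis=0)
    per_balance = 1.0 - ss_res / ss_tot
    overall = 1.0 - ss_res.sum() / ss_tot.sum()
    return per_balance, overall

# Toy data: the first balance is fit perfectly, the second is fit only
# by its mean (so the model explains none of its variance).
fitted = np.array([[1.0, 2.5],
                   [2.0, 2.5],
                   [3.0, 2.5],
                   [4.0, 2.5]])
residuals = np.array([[0.0, -1.5],
                      [0.0, -0.5],
                      [0.0, 0.5],
                      [0.0, 1.5]])

per_balance, overall = r_squared(fitted, residuals)
print(per_balance, overall)
```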
Right, this is tricky and unfortunately there isn't really a right answer that I'm aware of at the moment. If I had to guess where the NaNs are coming from, it is likely that there aren't enough samples within the subgroups: you need at least 3 samples for each cross-section. That means for a given milk type, and a given section for a given individual, you need at least 3 values for your particular balance to have non-zero variance. If you don't have many samples, this can actually be quite a stringent criterion.
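A quick way to check this on your own data is to group by every factor that defines a cross-section and look at the group sizes and variances. This is only a sketch with pandas: the column names (`individual`, `milk_type`, `section`, `balance_y0`) and the values are made up for illustration, so swap in your own metadata columns and balance.

```python
import pandas as pd

# Hypothetical metadata merged with one balance column.
data = pd.DataFrame({
    "individual": ["A", "A", "A", "A", "B", "B"],
    "milk_type": ["formula", "formula", "formula", "breast",
                  "formula", "formula"],
    "section": ["ileum"] * 6,
    "balance_y0": [0.1, 0.4, -0.2, 0.3, 0.5, 0.5],
})

# Count samples and compute the balance variance in each cross-section.
groups = data.groupby(["individual", "milk_type", "section"])["balance_y0"]
summary = groups.agg(["count", "var"])

# A cross-section is usable only with >= 3 samples and non-zero variance.
# Single-sample groups have NaN variance, which fails the comparison.
summary["usable"] = (summary["count"] >= 3) & (summary["var"] > 0)
too_small = summary[~summary["usable"]]

print(summary)
print("cross-sections likely to produce NaNs:", len(too_small))
```

In this toy example only one of the three cross-sections passes, which shows how quickly the requirement bites when samples are scarce.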
The rule of thumb I use when it comes to filtering species is counting your degrees of freedom. If you have 2 binary categorical variables, you have 2 degrees of freedom. If you have a bunch of microbes that are only observed in 2 samples, you can have a near perfect fit for those microbes, so whatever inference you perform on them is not useful (because your model will not have the resolution to measure them). At a bare minimum, I would recommend counting all of the variables in your formula, and using that as the baseline for filtering. Be careful, though: a categorical variable with D categories actually has D-1 degrees of freedom and therefore counts as D-1 variables.
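To make the counting concrete, here is a small sketch with pandas. `formula_dof` is a hypothetical helper, the column names and counts are toy values, and the cutoff (keep a microbe only if it is observed in strictly more samples than the degrees of freedom) is my reading of the rule of thumb, so treat it as a floor rather than a recommendation. The intercept is not counted here.

```python
import pandas as pd

def formula_dof(metadata, variables):
    """Count degrees of freedom used by the listed formula variables.

    Numeric columns count as 1; a categorical column with D categories
    counts as D - 1.
    """
    dof = 0
    for name in variables:
        column = metadata[name]
        if pd.api.types.is_numeric_dtype(column):
            dof += 1
        else:
            dof += column.nunique() - 1
    return dof

samples = ["s1", "s2", "s3", "s4", "s5", "s6"]

# Hypothetical metadata: one 3-level categorical and one numeric variable.
metadata = pd.DataFrame({
    "diet": ["a", "a", "b", "b", "c", "c"],
    "age": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
}, index=samples)

# Hypothetical count table, samples by microbes.
table = pd.DataFrame({
    "f1": [5, 3, 8, 1, 2, 9],   # observed in 6 samples
    "f2": [4, 0, 7, 0, 2, 0],   # observed in 3 samples
    "f3": [0, 0, 6, 0, 0, 1],   # observed in 2 samples
}, index=samples)

dof = formula_dof(metadata, ["diet", "age"])   # (3 - 1) + 1
prevalence = (table > 0).sum(axis=0)           # samples where each microbe is seen
filtered = table.loc[:, prevalence > dof]

print("degrees of freedom:", dof)
print("microbes kept:", list(filtered.columns))
```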