Referencing the original post for context
It's hard for me to say what is happening either; I have not seen this error before. It's especially weird that the baseline model also flatlines, which hints at something wrong with the biom table (or perhaps with the train/test splits in your metadata).
What do the output differentials look like? Are they all NaN? Did you try lowering the learning rate? How many samples do you have in your study (and how many are in train/test)?
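If it helps, here is a rough sketch of how you could check both things with pandas. It assumes the differentials have been exported to a tab-separated table (features as rows, covariates as columns) and that your metadata has a column marking each sample as train or test. The toy tables, the column name `split`, and the two helper functions are all made up for illustration, so swap in your own files and names.

```python
import io

import pandas as pd


def summarize_differentials(diff: pd.DataFrame) -> dict:
    """Count NaN entries in a differentials table (features x covariates)."""
    numeric = diff.select_dtypes(include="number")
    n_nan = int(numeric.isna().sum().sum())
    return {
        "n_features": len(numeric),
        "n_nan": n_nan,
        # True only if every numeric entry is NaN
        "all_nan": bool(numeric.isna().all().all()),
    }


def count_split(metadata: pd.DataFrame, column: str) -> dict:
    """Count samples per value of the train/test column (missing values included)."""
    return metadata[column].value_counts(dropna=False).to_dict()


# Toy stand-ins for the real files (hypothetical contents).
# With real data you would pass a file path to pd.read_csv instead.
diff_tsv = "featureid\tIntercept\ttreatment\nf1\tnan\tnan\nf2\tnan\tnan\n"
meta_tsv = "sampleid\tsplit\ns1\tTrain\ns2\tTrain\ns3\tTrain\ns4\tTest\n"

diff = pd.read_csv(io.StringIO(diff_tsv), sep="\t", index_col=0)
meta = pd.read_csv(io.StringIO(meta_tsv), sep="\t", index_col=0)

diff_summary = summarize_differentials(diff)
split_counts = count_split(meta, "split")

print(diff_summary)
print(split_counts)
```

If `all_nan` comes back true, or one of the split groups is empty or tiny, that would narrow down where to look next.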