I am analyzing cloacal samples from two bird species, and I am interested in assessing whether features differ in abundance based on two continuous variables. I have a total of 73 cloacal samples, but only 16 of them have measurements of the continuous variables. These measurements come from tracking data on those 16 birds. I am curious how Songbird handles a variable that is only recorded for a subset of the sequenced samples, and whether this is a proper use of the tool.
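To make the setup concrete, here is a minimal sketch of how I count which samples have both continuous variables recorded. The sample IDs, the column names `var_a` and `var_b`, and the values are all made up for illustration; this is not Songbird's own filtering logic, just my bookkeeping.

```python
import math

# Hypothetical metadata: sample ID -> two continuous covariates.
# None stands for a bird that was sequenced but never tracked.
metadata = {
    "bird_01": {"var_a": 12.5, "var_b": 3.1},
    "bird_02": {"var_a": None, "var_b": None},
    "bird_03": {"var_a": 8.0, "var_b": None},
    "bird_04": {"var_a": 15.2, "var_b": 4.7},
}


def has_value(x):
    """Return True if x is a usable measurement (not None and not NaN)."""
    return x is not None and not (isinstance(x, float) and math.isnan(x))


def complete_samples(metadata, columns):
    """List the sample IDs that have a usable value in every requested column."""
    return [
        sample_id
        for sample_id, row in metadata.items()
        if all(has_value(row.get(col)) for col in columns)
    ]


usable = complete_samples(metadata, ["var_a", "var_b"])
print(f"{len(usable)} of {len(metadata)} samples have both covariates: {usable}")
```

In my real data this count is 16 of 73, which is what prompts the question.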
After running the model below and comparing it to the null model, I get a Q2 score of ~0.424, and it looks like my model is not overfit based on the figures below. Side note: I do not know why the null model ran for 1 million iterations and the full model ran for 200,000. Both were set to run 100,000.
qiime songbird multinomial
I then used qurro to examine differential features and built a plot with features found across one of my continuous variables. In the plot you can see below, only samples that had a measurement for the continuous variable were plotted, which makes sense.
However, when I exported the data used to make the plot, I noticed that a log ratio was also recorded for samples that did not have any continuous data measurements. How should I interpret this? Are my 16/73 samples with continuous data enough to identify differentially abundant features across all samples, or am I reaching here? Is it safe to conclude that my samples without continuous data actually have the relationship between features indicated by the log ratio?
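My current guess at why this happens: as I understand it, a sample's log ratio depends only on its feature counts for the selected numerator and denominator features, not on the metadata, so any sample with nonzero counts on both sides gets a value. Below is a minimal sketch of that idea. The feature IDs and counts are invented, and I am assuming a natural log of summed counts; please correct me if the tool computes it differently.

```python
import math

# Hypothetical feature counts per sample (no metadata involved at all).
counts = {
    "bird_01": {"f1": 30, "f2": 10, "f3": 5, "f4": 15},
    "bird_02": {"f1": 8, "f2": 0, "f3": 4, "f4": 4},
    "bird_03": {"f1": 0, "f2": 0, "f3": 7, "f4": 3},
}

# Hypothetical feature selections, e.g. top- and bottom-ranked features.
numerator = ["f1", "f2"]
denominator = ["f3", "f4"]


def log_ratio(sample_counts, numerator, denominator):
    """Natural log of (summed numerator counts / summed denominator counts).

    Returns None when either sum is zero, since the ratio is undefined there.
    """
    num = sum(sample_counts[f] for f in numerator)
    den = sum(sample_counts[f] for f in denominator)
    if num == 0 or den == 0:
        return None
    return math.log(num / den)


ratios = {sid: log_ratio(row, numerator, denominator) for sid, row in counts.items()}
for sid, value in ratios.items():
    print(sid, value)
```

If that is right, the exported log ratios for the untracked birds describe their feature counts only, and say nothing by themselves about how those birds relate to the continuous variables.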
Thank you for your time and assistance and to the researchers for building these amazing tools!