I'm trying to understand whether songbird is an appropriate tool for measuring differential taxa abundance in my data. My experimental setup: I did in vitro incubations of the same microbiome sample in triplicate under >100 different conditions, and now, for each condition, I'd like to determine which taxa changed in abundance relative to a no-treatment control. So while there are >300 samples in total, there are only 3 samples per treatment group, and my understanding is that such a small sample size might not be appropriate for songbird? As an aside, I've done beta-diversity analysis and see significant changes for some treatments.
If I might still be able to use songbird, do you have advice on parameters for training the model? And if not, any general advice about other methods I could use?
Hi @Beth_C,
Welcome to the forum!
The question you are asking has less to do with q2-songbird as a tool than with the effect of sample size in statistics. If I understand your design correctly, you have 100 different "treatments" with an n=3 each? If so, that is an extremely small sample size and unfortunately, in my opinion, no statistical test out there would be appropriate for it. Even if some tests do give you a p-value, such as the beta-diversity tests you've run, these are by no means reliable, because with a sample size of 3 you are not going to be able to infer a true population mean/variance. Think of it this way: if you randomly drew 3 students from a classroom of 200 students and saw 2 male + 1 female students, would you really be able to say with any level of confidence that there are twice as many male students as female students in that classroom? Probably not, right?
If you are able to collapse some of these 100 groups into larger groups, then you may be able to run some more reliable tests, but as it is, I would recommend just looking at the data in an exploratory way without any tests.
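Just to make the sampling-variability point concrete, here is a tiny simulation with made-up numbers (nothing to do with your actual data) showing how much a mean estimated from n=3 bounces around even when the underlying distribution never changes:

```python
# Quick illustration (made-up numbers): how noisy is a mean estimated from n=3?
import numpy as np

rng = np.random.default_rng(42)

true_mean, true_sd = 10.0, 3.0   # hypothetical "population" for one taxon's abundance
n_replicates = 3                 # per-treatment sample size
n_simulations = 10_000

# Repeatedly draw 3 observations and record the sample mean each time
sample_means = rng.normal(true_mean, true_sd,
                          size=(n_simulations, n_replicates)).mean(axis=1)

print(f"True mean: {true_mean}")
print(f"SD of the n=3 estimates: {sample_means.std():.2f}")
print(f"95% of estimates fall between {np.percentile(sample_means, 2.5):.1f} "
      f"and {np.percentile(sample_means, 97.5):.1f}")
```

With n=3 the estimate can easily be off by a couple of units in either direction, which is why I wouldn't put much weight on any single test statistic at that sample size.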
This mostly applies to frequentist statistics -- you don't have enough samples to get good estimates.
But in all seriousness, do we really need good estimates to answer these types of biological questions? If you already have a good idea what the outcome should be, you arguably don't need as many samples to validate your hypothesis. In fact, many clinical studies are done with an N=1. It turns out there is still quite a bit you can do in these sorts of scenarios, provided you have a very strong prior.
Low-sample scenarios are where Bayesian tools really shine; you can incorporate your anticipated outcome into the model and determine how much the data shift your hypothesis. So rather than performing a null hypothesis test to see if your data are randomly distributed, you perform a model-based test of properties of your model (i.e., is there a difference between my groups?), where the model is informed by your data + your prior.
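To give a feel for what that looks like, here is a minimal sketch using a conjugate normal-normal model with made-up numbers (this is just the general idea, not how Songbird, ALDEx2, or BIRDMAn are implemented):

```python
# Minimal sketch of the Bayesian idea: with only a few observations,
# a strong prior dominates the estimate; with many, the data take over.
import numpy as np

def posterior_mean(observations, prior_mean, prior_sd, obs_sd):
    """Conjugate normal-normal update for the mean of a normal likelihood."""
    n = len(observations)
    prior_precision = 1.0 / prior_sd**2
    data_precision = n / obs_sd**2
    return (prior_precision * prior_mean + data_precision * np.mean(observations)) / (
        prior_precision + data_precision
    )

# Hypothetical log-fold-change observations from 3 technical replicates
replicates = [0.8, 1.1, 0.9]

# Strong prior centered on "no change" vs. a weak (vague) prior
print(posterior_mean(replicates, prior_mean=0.0, prior_sd=0.2, obs_sd=0.5))  # pulled toward 0
print(posterior_mean(replicates, prior_mean=0.0, prior_sd=5.0, obs_sd=0.5))  # close to the data mean
```

With only 3 observations, a tight prior pulls the estimate most of the way back toward "no change", whereas a vague prior leaves you essentially with the sample mean.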
Songbird can't do this since it isn't Bayesian, but ALDEx2 with custom R code may be able to do it. BIRDMAn can definitely do this and can provide more flexible priors. Both routes require a steep learning curve in Bayesian statistics, which is expected for these types of tricky experimental designs.
Thanks for your helpful replies! Yes, I see what you mean about 3 being small for independent samples. In my case, the n=3 are technical replicates in an in vitro setting, not independent samples, which might change how I can think about comparing them and is a bit different from how these tools are normally used. I think the goal I'm trying to achieve is not a statistical test but to calculate log ratios, which I can use for exploratory analysis, as you suggest. So far I've been browsing through the data looking at relative abundances, but given the limitations of this, it seems that log ratios might be a better way to summarize the results? Of course there are lots of tools that could calculate this in different ways, and I was initially attracted to Songbird because I found it user-friendly for browsing this rather large data set. I wonder if this sounds reasonable to you, and whether you have any advice about what some good options for calculating log ratios might be?
Got it. Yes, log ratios can help circumvent the issues that you raised. It may be worthwhile to look at Qurro, which lets you interactively explore different combinations of features and see how their log ratios vary across samples. You would need to run either Songbird or q2-aldex2 first to compute the differentials, though.
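If you want to sanity-check what Qurro is showing you, the underlying quantity is simple enough to compute by hand; here is a rough pandas sketch with toy counts and placeholder taxon names (not your data):

```python
# Rough sketch of the kind of log-ratio Qurro displays, computed by hand.
# Taxon names and counts here are hypothetical placeholders.
import numpy as np
import pandas as pd

# samples x taxa count table (toy numbers)
counts = pd.DataFrame(
    {"Taxon_A": [120, 95, 30], "Taxon_B": [40, 55, 60], "Taxon_C": [5, 8, 200]},
    index=["control_1", "control_2", "treatment_1"],
)

numerator = ["Taxon_A"]      # e.g., taxa ranked high by the Songbird/ALDEx2 differentials
denominator = ["Taxon_C"]    # e.g., taxa ranked low (the reference group)

pseudocount = 0.5            # avoids log(0) for taxa absent in a sample
log_ratio = np.log(
    (counts[numerator].sum(axis=1) + pseudocount)
    / (counts[denominator].sum(axis=1) + pseudocount)
)
print(log_ratio)
```

The interesting part is choosing the numerator/denominator sets, which is where the Songbird or ALDEx2 differentials (feature rankings) come in.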
Great, yes. I've been having difficulty fitting a model with Songbird. Here's the command I used and the output. It seems to improve when I reduce the number of samples in the 16S table. Is it appropriate to split up the data into groups for analysis?
Hi @Beth_C, sorry I just saw this. Yes, it does look like there is some overfitting going on; if splitting the data into groups works for you, I'd say just run with it, so long as the cross-validation error is reasonable.
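Just for intuition on what to look for in the cross-validation error, here is a generic toy example (ordinary least squares, not Songbird's multinomial regression) of how training error can look great while held-out error stays poor when there are many predictors relative to samples:

```python
# Generic illustration of the overfitting check (toy data, not Songbird's internals):
# compare training error to held-out error; a big gap suggests overfitting.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 12, 10            # few samples, many predictors -> easy to overfit
X = rng.normal(size=(n_samples, n_features))
y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=n_samples)   # only one predictor truly matters

train, test = slice(0, 9), slice(9, 12)   # hold out 3 samples
coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)  # ordinary least squares fit

train_err = np.mean((X[train] @ coef - y[train]) ** 2)
test_err = np.mean((X[test] @ coef - y[test]) ** 2)
print(f"training MSE: {train_err:.3f}   held-out MSE: {test_err:.3f}")
```

The same logic applies to your fits: judge them by the held-out (cross-validation) error rather than the training loss.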