Hi, I am working with time-series microbiome data and am having trouble analysing the differential abundance of specific taxa in the samples.
The dataset comes from a salt stress treatment in the gut. We have a time series of salt treatment at 0 (no salt treatment), 2, 4, 8, 12, and 24 hr. My main goal is to investigate whether the microbiome as a whole, or any specific taxa, show differential abundance over time.
My current problem is that I do not have paired control data for this treatment (for example, 0 to 24 hr without salt treatment), and I am unable to obtain it because of difficulties with gut sample collection. Right now I am stuck on which method is suitable for a differential abundance analysis with a statistical test.
I have tried ANCOM with the metadata column 'time', but it reported an invalid value because the column is numeric. In this situation, I think I could only run ANCOM on pairs of time points (0 vs 2, 0 vs 4, etc.)? Are there any alternative ways to test differential abundance across time with ANCOM?
I hope someone can suggest some methods to tackle this issue. I am still new and am exploring QIIME 2 for microbiome analysis. Thank you.
Welcome to the QIIME 2 Forum!
I would recommend reviewing our longitudinal analysis tutorial. One feature in particular that you might be interested in is the feature volatility plot.
ANCOM used as you're describing won't treat samples as paired (e.g., it'll compare all time 0 samples to all time 4 samples, rather than, say, looking for a consistent change across individuals between those time points), so it will be underpowered for a longitudinal analysis. The tools you'll find in the above tutorial should work better.
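As a rough illustration of why pairing matters, here is a minimal sketch in plain Python. It is not a QIIME 2 command and not how ANCOM works internally; the subject names and abundance values are made up, and it assumes the same individuals were sampled at both time points.

```python
from statistics import mean, stdev

# Hypothetical relative abundances of one taxon in four individuals,
# sampled at 0 h and 4 h. All values are invented for illustration.
abundance = {
    "subject_A": {"0h": 0.10, "4h": 0.13},
    "subject_B": {"0h": 0.30, "4h": 0.34},
    "subject_C": {"0h": 0.55, "4h": 0.58},
    "subject_D": {"0h": 0.80, "4h": 0.84},
}

t0 = [values["0h"] for values in abundance.values()]
t4 = [values["4h"] for values in abundance.values()]

# Unpaired view: the shift in group means has to be judged against the
# spread between individuals, which is large here.
group_diff = mean(t4) - mean(t0)
between_subject_sd = stdev(t0)

# Paired view: each individual is compared with itself, so the spread
# between individuals cancels out.
paired_diffs = [after - before for before, after in zip(t0, t4)]
paired_sd = stdev(paired_diffs)

print(f"mean shift between groups: {group_diff:.3f}")
print(f"spread between individuals at 0 h: {between_subject_sd:.3f}")
print(f"spread of per-individual changes: {paired_sd:.3f}")
```

In this toy data every individual increases by a similar small amount, but the shift is much smaller than the differences between individuals, so an unpaired comparison would struggle to detect it while a paired one would not.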
Thank you for your suggestion. I am learning how to use the feature volatility plot at the moment.
I have one question regarding the accuracy_result.qzv output.
The tutorial only briefly explains accuracy_result.qzv and refers to the sample classifier tutorial for an explanation of the regression model and of how the scatterplot (predicted vs. true value) is generated.
I tested with samples from 15 time points (0, 1, 2, 3, 4, 6, 8, 12, 24, 48, 72, 96, 120, 144, 192 hours), and this is the scatterplot that I obtained:
If I am not mistaken, the x-axis represents the time point values from my metadata file, while the y-axis represents the time value predicted by the trained model.
My main question is: why does it show only 6 points of predicted vs. true value? I thought the accuracy results should show 15 points, as I have 15 time points.
I hope you can answer my question. Thank you for your help!
You only have 6 datapoints because the model has to use some of your data to train on.
It splits your data into a training set and a testing set. The training set is used to train the model, but the model can't then be tested on those same values: it has already seen them, so testing on them would give an overly optimistic accuracy estimate and hide any over-fitting to your data. So after the model is trained, it is applied to the "held back" testing data points, which are only a subset of the data.
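To make the idea concrete, here is a minimal sketch of a hold-out split in plain Python. It is a generic illustration, not the plugin's actual code: it assumes one sample per time point, and the number of held-out samples (`N_TEST = 6`) is an assumption chosen only to match the 6 points in your plot.

```python
import random

# The 15 time points (hours) from the question, assuming one sample each.
timepoints = [0, 1, 2, 3, 4, 6, 8, 12, 24, 48, 72, 96, 120, 144, 192]

# Assumed number of held-out samples, picked to mirror the 6 plotted points.
N_TEST = 6

# Shuffle a copy so the held-out samples are chosen at random.
rng = random.Random(0)  # fixed seed so the split is reproducible
shuffled = timepoints[:]
rng.shuffle(shuffled)

test_set = sorted(shuffled[:N_TEST])   # used only to evaluate the model
train_set = sorted(shuffled[N_TEST:])  # used only to fit the model

print(f"train on {len(train_set)} samples: {train_set}")
print(f"evaluate on {len(test_set)} samples: {test_set}")
```

Only the samples in `test_set` would get a predicted vs. true point on the scatterplot, which is why the plot shows fewer points than there are samples.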
I would suggest reading up on the sample classifier tutorial for more info!
Hope that helps!