I have 35 16S fecal samples from 15 individuals. These samples were taken at three time points (baseline, 1 month, and 3 months), but several samples are missing.
I’ve been exploring the data and am trying to figure out the best way to look at overall changes in taxonomy or abundance from the baseline samples to the 1-month or 3-month samples.
I showed some early results to my collaborator today, and she asked whether there is a way to condense the taxa bar plots by a metadata column (in this case, time) rather than showing one bar per sample.
I’m sorry this is kind of open-ended. Help with either the taxa bar plots or general ideas for analysis would be most welcome. Thanks in advance!
I got it working, and I have a further question or two.
How do you determine which "mode" is most useful? I made bar plots from each mode (sum, mean-ceiling, and median-ceiling). They are mostly similar, except for the median plot, which shows Clostridiales making up a substantially larger portion of each group.
Are there any statistical tests you would recommend for exploring whether these differences between time points are significant? I know there are some issues with these types of comparisons because of the compositional nature of the data.
Thanks again for your help, any other insights would be great!
If you are grouping many samples, median-ceiling is probably the most useful: it reduces the impact of outliers and, unlike sum, is not driven by samples with higher sequence counts. There have been other topics about this on the forum; I suggest searching for those for more discussion of these differences.
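To illustrate why the modes can disagree, here is a minimal sketch in plain Python of how sum, mean-ceiling, and median-ceiling collapse per-sample counts into one profile per time point. The mode names match the ones in the question (they appear to correspond to the `--p-mode` choices of `qiime feature-table group`), but this is an illustrative reimplementation under stated assumptions, not QIIME 2's code. The feature table, sample names, taxa counts, and `time` column are all made up.

```python
import math
import statistics

# Hypothetical toy feature table: sample -> {taxon: read count}.
table = {
    "S1": {"Bacteroidales": 50, "Clostridiales": 50},
    "S2": {"Bacteroidales": 40, "Clostridiales": 60},
    "S3": {"Bacteroidales": 9000, "Clostridiales": 1000},  # deeply sequenced sample
    "S4": {"Bacteroidales": 30, "Clostridiales": 70},
}
# Hypothetical metadata column: sample -> time point.
time = {"S1": "baseline", "S2": "baseline", "S3": "baseline", "S4": "month1"}


def group_table(table, groups, mode):
    """Collapse samples into one count profile per group using the given mode."""
    taxa = sorted({taxon for counts in table.values() for taxon in counts})
    members = {}
    for sample, group in groups.items():
        members.setdefault(group, []).append(sample)
    result = {}
    for group, samples in members.items():
        profile = {}
        for taxon in taxa:
            values = [table[s].get(taxon, 0) for s in samples]
            if mode == "sum":
                profile[taxon] = sum(values)
            elif mode == "mean-ceiling":
                profile[taxon] = math.ceil(statistics.mean(values))
            elif mode == "median-ceiling":
                profile[taxon] = math.ceil(statistics.median(values))
            else:
                raise ValueError(f"unknown mode: {mode}")
        result[group] = profile
    return result


def relative(profile):
    """Convert a count profile to relative abundances (what a bar plot shows)."""
    total = sum(profile.values())
    return {taxon: count / total for taxon, count in profile.items()}


for mode in ("sum", "mean-ceiling", "median-ceiling"):
    grouped = group_table(table, time, mode)
    print(mode, {g: round(relative(p)["Clostridiales"], 2) for g, p in grouped.items()})
```

In this made-up table, S3 is sequenced far more deeply than the other baseline samples, so sum and mean-ceiling give Clostridiales about 11% of the baseline group while median-ceiling gives it about 55%. A similar depth or outlier effect could be one reason a median-based plot looks different from the others.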
For compositional data like this, use ANCOM (in q2-composition) or q2-aldex2. Since you have only three time points, comparing time 1 vs. 2, 1 vs. 3, and 2 vs. 3 (or a subset of these if you are not interested in all of them) is probably the most easily interpretable approach if you care about individual time increments.
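As a rough illustration of the two ideas above, here is a plain-Python sketch of (a) the centered log-ratio (CLR) transform, a log-ratio approach to compositional data that ALDEx2 builds on, and (b) setting up the three pairwise time-point comparisons when some samples are missing. This is not ANCOM or ALDEx2, which add their own statistical machinery; the pseudocount of 0.5, the sample IDs, and the `sample_time` metadata are hypothetical.

```python
import itertools
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an arbitrary assumption here) lets zero counts be logged.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


# Hypothetical sample metadata: sample ID -> time point (some samples missing).
sample_time = {
    "A_base": "baseline", "A_1m": "1 month", "A_3m": "3 months",
    "B_base": "baseline", "B_3m": "3 months",
    "C_1m": "1 month", "C_3m": "3 months",
}


def pairwise_subsets(sample_time, timepoints):
    """For each pair of time points, list the samples belonging to either one."""
    subsets = {}
    for t1, t2 in itertools.combinations(timepoints, 2):
        subsets[(t1, t2)] = sorted(s for s, t in sample_time.items() if t in (t1, t2))
    return subsets


timepoints = ["baseline", "1 month", "3 months"]
for pair, samples in pairwise_subsets(sample_time, timepoints).items():
    print(pair, samples)
```

Without a pseudocount, the CLR values of a sample do not change when all of its counts are multiplied by the same factor, which is why log-ratio methods are less sensitive to sequencing depth than raw counts. Each sample subset corresponds to a table you could filter down to before running a two-group test.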