Help comparing differences between time points

bpscherer · April 30, 2020, 9:13pm

Hi all,

I have 35 16S fecal samples from 15 individuals. The samples were taken at three time points (Baseline, 1 month, and 3 month), but several samples are missing.

I've been exploring the data, and am trying to figure out the best way to look at overall changes in taxonomy or abundance from the baseline samples to the 1 or 3 month samples.

I showed some early results to my collaborator today, and she was wondering if there was a way to condense the taxa bar plots by metadata column (in this case time) rather than having columns based on individual samples.

I'm sorry this is kind of open-ended. Help either with the taxa bar plots or with general ideas for analysis would be most welcome. Thanks in advance!

Nicholas_Bokulich · April 30, 2020, 9:47pm

Hi @bpscherer,

Sure! See qiime feature-table group, which will allow you to group samples by a sample metadata column. Then build a barplot from the grouped feature table.

Good luck!

bpscherer · May 1, 2020, 2:26pm

Awesome, thanks Nicholas!

I got it to work out, and had a further question or two.

How do you determine which "mode" is most useful? I made bar plots with each mode (sum, mean-ceiling, and median-ceiling). They are mostly pretty similar, except for the median plot, which shows Clostridiales making up a substantially larger portion of each group.

time-median-bar-plots.qzv (351.0 KB)
time-sum-bar-plots.qzv (356.4 KB)
time-mean-bar-plots.qzv (354.6 KB)

Are there any statistical tests you would recommend for testing whether these differences between time points are significant? I know there are some issues with these types of comparisons due to the compositional nature of the data.

Thanks again for your help, any other insights would be great!

Nicholas_Bokulich · May 1, 2020, 2:54pm

If you are grouping many samples, then median-ceiling is probably the most useful: it reduces the impact of outliers and (unlike sum) is not driven by samples with higher sequence counts. There have been some other topics on this on the forum; I suggest searching for those for more discussion of these differences.
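To make the difference between the modes concrete, here is a minimal sketch in plain Python. It is not the q2-feature-table implementation; it only assumes that each mode is applied to every taxon separately across the samples in a group. The counts and the names taxon_A and taxon_B are hypothetical, with one sample carrying an outlier count for taxon_B.

```python
import math
from statistics import mean, median

# Hypothetical counts for two taxa across the five samples of one time point.
# taxon_B has a single outlier sample (400 reads).
group_counts = {
    "taxon_A": [40, 45, 50, 42, 48],
    "taxon_B": [5, 3, 400, 4, 6],
}

# The three grouping modes discussed above, applied per taxon.
MODES = {
    "sum": sum,
    "mean-ceiling": lambda values: math.ceil(mean(values)),
    "median-ceiling": lambda values: math.ceil(median(values)),
}


def collapse(counts_by_taxon, mode):
    """Collapse the per-sample counts of each taxon into one grouped count."""
    combine = MODES[mode]
    return {taxon: combine(values) for taxon, values in counts_by_taxon.items()}


def relative_abundance(collapsed):
    """Convert grouped counts to fractions of the grouped total."""
    total = sum(collapsed.values())
    return {taxon: count / total for taxon, count in collapsed.items()}


for mode in MODES:
    collapsed = collapse(group_counts, mode)
    fractions = relative_abundance(collapsed)
    print(mode, collapsed, {t: round(f, 2) for t, f in fractions.items()})
```

In this toy example the outlier dominates taxon_B under sum and mean-ceiling, while under median-ceiling taxon_A ends up as 90% of the group. That is one way a taxon with consistent counts can look substantially larger in a median plot than in the other two.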

For compositional data like this, use ANCOM (in q2-composition) or q2-aldex2. Since you have only 3 timepoints, comparing time 1 vs. 2, 1 vs. 3, and 2 vs. 3 (or some subset of these if you are not interested in all) is probably the most easily interpretable if you are interested in individual time increments.
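Since some samples are missing, it may help to list what each pairwise comparison would actually contain before running any test. The sketch below is plain Python with a made-up sample sheet (the subject IDs and the pattern of missing samples are hypothetical); it only enumerates the three comparisons and reports which subjects have samples at both time points. Each subset would then be tested separately with the tool of your choice.

```python
from itertools import combinations

TIME_POINTS = ["Baseline", "1 month", "3 month"]

# Hypothetical (subject, time point) pairs; some samples are missing.
samples = [
    ("subject_01", "Baseline"), ("subject_01", "1 month"), ("subject_01", "3 month"),
    ("subject_02", "Baseline"), ("subject_02", "3 month"),
    ("subject_03", "1 month"), ("subject_03", "3 month"),
]


def pairwise_subsets(sample_sheet, time_points):
    """Return, for each pair of time points, the samples from those two points."""
    subsets = {}
    for first, second in combinations(time_points, 2):
        subsets[(first, second)] = [
            (subject, time) for subject, time in sample_sheet
            if time in (first, second)
        ]
    return subsets


def paired_subjects(subset, first, second):
    """Subjects that have a sample at both time points of a comparison."""
    at_first = {subject for subject, time in subset if time == first}
    at_second = {subject for subject, time in subset if time == second}
    return sorted(at_first & at_second)


for (first, second), subset in pairwise_subsets(samples, TIME_POINTS).items():
    paired = paired_subjects(subset, first, second)
    print(f"{first} vs. {second}: {len(subset)} samples, paired subjects: {paired}")
```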

Good luck!

system · June 1, 2020, 8:54pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.