Order of steps for analyzing time-point experiment

Our experiment is looking at differences between the microbiomes of soil treated with two separate compounds at 0, 2, 6, and 12 weeks post-treatment. Each timepoint was processed and sequenced at a different time (not ideal, I know, but we did what we could!). We have worked with the timepoints separately up until now. For each timepoint we have run alpha/beta diversity, made phylogenetic trees, and assigned taxonomy. We then filtered each feature table to remove "unassigned" features as well as mitochondrial/chloroplast taxa using the "qiime taxa filter-table" command, which creates a new filtered feature table. Again, we have done this for each timepoint.

Now, we want to do some over-time analyses, and there’s not a lot of information we could find about the correct order of steps to take. We think we have a tentative pipeline (below), but we were hoping for some confirmation/advice on what’s the best order!

We thought we’d first merge our filtered-feature tables (at each timepoint) using the feature table plugin method “qiime feature-table merge” and then do the same with the rep-seqs using “qiime feature-table merge-seqs.” We’d also create a new metadata file by copy/pasting all of the information from each individual timepoint metadata file into one gigantic sheet and validate with Keemei. Then we’d have a combined filtered-feature table, combined rep-seqs, and new metadata file. From there, we could make a new rooted phylogenetic tree so that we could do new alpha/beta analysis to look at overall treatment trends. Lastly, we’d do ANCOM to identify the features that are most variable and then look at those features with q2-longitudinal to see if there are treatment and feature differences over time.
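If it helps to see the merge and tree-building steps spelled out, here is a rough sketch of the commands described above. All filenames (table-wk0.qza, merged-table.qza, etc.) are placeholders for your own artifacts, and depending on your QIIME 2 release the merge commands may accept the repeated `--i-tables`/`--i-data` form shown here or a slightly different parameter style:

```shell
# Merge the per-timepoint filtered feature tables into one combined table.
qiime feature-table merge \
  --i-tables table-wk0.qza \
  --i-tables table-wk2.qza \
  --i-tables table-wk6.qza \
  --i-tables table-wk12.qza \
  --o-merged-table merged-table.qza

# Merge the corresponding representative sequences.
qiime feature-table merge-seqs \
  --i-data rep-seqs-wk0.qza \
  --i-data rep-seqs-wk2.qza \
  --i-data rep-seqs-wk6.qza \
  --i-data rep-seqs-wk12.qza \
  --o-merged-data merged-rep-seqs.qza

# Build a new rooted phylogenetic tree from the merged sequences,
# for use in the combined alpha/beta diversity analyses.
qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences merged-rep-seqs.qza \
  --o-alignment aligned-rep-seqs.qza \
  --o-masked-alignment masked-aligned-rep-seqs.qza \
  --o-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza
```

The combined metadata sheet can then be validated with Keemei and passed to the downstream diversity commands alongside merged-table.qza and rooted-tree.qza.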

Since we’ll export the relative abundance data to a BIOM table, we can do all of that analysis later with Excel, combining whatever time points we need for analysis.
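For the export step, something along these lines should get you from a QIIME 2 artifact to a TSV you can open in Excel. Paths are placeholders, and note that older QIIME 2 releases used `--output-dir` rather than the `--input-path`/`--output-path` form shown here:

```shell
# Export the merged table; this writes feature-table.biom into the output directory.
qiime tools export \
  --input-path merged-table.qza \
  --output-path exported-table

# Convert the BIOM file to a tab-separated table for Excel.
biom convert \
  -i exported-table/feature-table.biom \
  -o exported-table/feature-table.tsv \
  --to-tsv
```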

Does that order of steps sound about right? We’d appreciate any advice or suggestions!

I have the same situation right now. Earlier, with my preliminary data, I combined it with the data of another researcher from the same project, as you described, and it worked fine. Now I have two new datasets, and I am going to process all of the libraries (the old ones and the new ones from one experiment) together from the very beginning.

Hi @Kara and @timanix,

That sounds like a cool experiment!

Your pipeline makes sense to me. Although I will also offer the suggestion that I find it easiest to build my combined table first, if I can, and then filter down, since beta diversity is computationally expensive to calculate but cheap to filter. It's minor, but might be something to consider for the future. If you've already got demultiplexed data, you could import all the sequencing runs into the same file and then process from there.

I'm curious how you're going to use ANCOM to look at the temporal effects. Will you group by timepoint, and then look for one that's different?
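For reference, a by-timepoint ANCOM run might look like the sketch below. The metadata column name ("week") and filenames are placeholders; ANCOM operates on composition artifacts, so the pseudocount step comes first:

```shell
# ANCOM requires a composition artifact, so add a pseudocount to the merged table.
qiime composition add-pseudocount \
  --i-table merged-table.qza \
  --o-composition-table comp-table.qza

# Test for features whose abundance differs across timepoints.
qiime composition ancom \
  --i-table comp-table.qza \
  --m-metadata-file merged-metadata.tsv \
  --m-metadata-column week \
  --o-visualization ancom-week.qzv
```

One caveat with this grouping: ANCOM tells you *that* a feature differs somewhere across the groups, not the direction or shape of the change over time, which is where q2-longitudinal comes in afterward.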

You might also like the multivariate modelling in gneiss, where you could correlate your features against time, or look for internal clustering. The benefit of gneiss is its multivariate capacity.
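As a very rough sketch of what that gneiss workflow can look like: the q2-gneiss interface has changed across releases, so treat the command names, parameters, and filenames below as illustrative of the older tutorial workflow rather than a definitive recipe:

```shell
# Cluster features by co-occurrence to build a balance hierarchy.
qiime gneiss correlation-clustering \
  --i-table comp-table.qza \
  --o-clustering hierarchy.qza

# Fit a multivariate regression of the balances against metadata
# (the formula column names "week" and "treatment" are placeholders).
qiime gneiss ols-regression \
  --i-table comp-table.qza \
  --i-tree hierarchy.qza \
  --m-metadata-file merged-metadata.tsv \
  --p-formula "week+treatment" \
  --o-visualization regression-summary.qzv
```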



This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.