Pooling sequences?

leahtee · February 1, 2018, 3:45pm

Hey all!

Not sure if this is a pipe dream, but here goes nothing:

I recently analysed sequence data from some fieldwork I had done, and found no differences in the microbiome between sampling sites. I now want to send additional samples, but I also want to save money. So I'm thinking of pooling samples from all three sites for two other seasons (i.e. a spring and a fall season) to see whether the microbiome changes seasonally, since it does not change spatially.

My question is: can I pool my already-sequenced samples so that I can compare them alongside the pooled samples (yet to be sequenced) from spring and fall?

Thanks so much!

Leah

Nicholas_Bokulich · February 1, 2018, 4:44pm

Hi @leahtee,

If I understand what you are trying to do, this should be very simple to achieve (but maybe I'm missing the point). You want to pool your existing data, correct (i.e., combine samples from sites 1, 2, and 3 into a single "sample" for comparison against pooled samples from other seasons)? Not pool and re-sequence the DNA? And you do not mean pool as in merge feature tables from two separate sequencing runs, correct?

If my interpretation is correct and you want to combine sequence data from different samples into a single sample, I give the steps below, but I am not sure that such pooling is even necessary or a good idea for analysis. I would personally keep all these samples separate, unless you are trying to pool samples to increase their sequence coverage. Keeping them separate will increase your statistical power and give you a better sense of the variability in your data.

If you want to pool samples in a feature table, you can use feature-table group to group samples on a specific metadata category. For example, imagine your metadata file looks something like this:

You can group your feature table samples with Season as the metadata-category to combine them all into one sample with a new sample ID, "Spring". Because the grouped samples get new IDs, you will need to make a new metadata file to describe the new sample(s), e.g.,

Grouping by site rather than season is probably a better idea in the example I've given above: having a single "Spring" sample eliminates all replicates, so you cannot perform effective statistical tests once you merge this table with a table containing samples from other seasons.
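To make the grouping step concrete, here is a minimal pure-Python sketch of what summing samples by a metadata column does. It is not QIIME 2's implementation: the feature table is assumed to be a plain dict of sample ID -> {feature ID: count}, and the sample IDs, feature IDs, sites, and counts are all hypothetical (two samples per site, all from spring). Grouping by Season collapses everything into one "Spring" sample, while grouping by Site keeps three samples.

```python
from collections import defaultdict

# Hypothetical feature table: sample ID -> {feature ID: read count}.
feature_table = {
    "S1": {"ASV_a": 10, "ASV_b": 5},
    "S2": {"ASV_a": 4, "ASV_c": 1},
    "S3": {"ASV_a": 3, "ASV_c": 7},
    "S4": {"ASV_b": 6},
    "S5": {"ASV_b": 2, "ASV_c": 1},
    "S6": {"ASV_a": 1, "ASV_c": 4},
}

# Hypothetical metadata: two samples per site, all collected in spring.
metadata = {
    "S1": {"Site": "site1", "Season": "Spring"},
    "S2": {"Site": "site1", "Season": "Spring"},
    "S3": {"Site": "site2", "Season": "Spring"},
    "S4": {"Site": "site2", "Season": "Spring"},
    "S5": {"Site": "site3", "Season": "Spring"},
    "S6": {"Site": "site3", "Season": "Spring"},
}


def group_samples(table, md, column):
    """Sum counts over all samples sharing a value in `column`.

    The metadata value becomes the new sample ID, which is why a new
    metadata file is needed to describe the grouped samples.
    """
    grouped = defaultdict(lambda: defaultdict(int))
    for sample_id, counts in table.items():
        new_id = md[sample_id][column]
        for feature_id, count in counts.items():
            grouped[new_id][feature_id] += count
    # Convert the nested defaultdicts back to plain dicts.
    return {new_id: dict(feats) for new_id, feats in grouped.items()}


by_season = group_samples(feature_table, metadata, "Season")
by_site = group_samples(feature_table, metadata, "Site")

print(sorted(by_season))  # -> ['Spring']: one sample, no replicates left
print(sorted(by_site))    # -> ['site1', 'site2', 'site3']
```

The group_samples helper is hypothetical and only shows the idea; in QIIME 2 itself you would use feature-table group as described above.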

Whether or not you pool your sample data, batch effects are likely to be a bigger concern if you are comparing data from two separate sequencing runs. If I were in your shoes, I would resequence a subset of the samples you have already sequenced on the next sequencing run. That way you can either compare only the samples on that run, sidestepping batch effects entirely, or test whether samples sequenced on separate runs look similar and batch effects don't appear to be an issue. They are not always an issue, but nothing is more disappointing than doing a new sequencing run and finding that you've wasted money because the runs cannot be compared due to batch effects. Better safe than sorry.
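One rough way to spot-check batch effects with a re-sequenced subset is to compare how different the same sample looks across the two runs with how different two distinct samples look within one run. The sketch below does this with Bray-Curtis dissimilarity on relative abundances in plain Python. The sample names and counts are hypothetical, and a spot-check like this is no substitute for a formal test (for example, PERMANOVA with sequencing run as the grouping factor).

```python
def relative_abundance(counts):
    """Convert raw feature counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no shared features."""
    features = set(a) | set(b)
    numerator = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    denominator = sum(a.get(f, 0) + b.get(f, 0) for f in features)
    return numerator / denominator


# Hypothetical counts: sample S1 sequenced on both runs, S2 only on run 1.
s1_run1 = relative_abundance({"ASV_a": 50, "ASV_b": 30, "ASV_c": 20})
s1_run2 = relative_abundance({"ASV_a": 45, "ASV_b": 35, "ASV_c": 20})
s2_run1 = relative_abundance({"ASV_a": 10, "ASV_b": 20, "ASV_c": 70})

same_sample_across_runs = bray_curtis(s1_run1, s1_run2)
different_samples_same_run = bray_curtis(s1_run1, s2_run1)

print(f"S1 run 1 vs S1 run 2: {same_sample_across_runs:.3f}")
print(f"S1 run 1 vs S2 run 1: {different_samples_same_run:.3f}")
```

If the same sample on two runs turns out to be about as dissimilar as two different samples, that is a warning sign that run-to-run differences could swamp the seasonal signal you are looking for.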

As for the microbiome not changing spatially: it might still differ spatially over time. For example, spatial differences might only become apparent during a particular season.

Bottom line, though: pooling DNA should be fine, so long as you retain a large enough sample size to perform adequate statistical testing.
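One concrete way to see why sample size matters: in an exact permutation test comparing two groups, the smallest p-value you can possibly obtain is limited by how many ways the samples can be relabelled. The sketch below computes that floor for a one-sided test with equal group sizes. It assumes a permutation-style test and is only meant to illustrate the point, not to replace a proper power analysis.

```python
from math import comb


def min_permutation_p(n1: int, n2: int) -> float:
    """Smallest p-value an exact one-sided two-group permutation test can give.

    Under the null hypothesis every assignment of group labels is equally
    likely, and there are comb(n1 + n2, n1) of them. The observed labelling
    is one of these, so p can never be smaller than 1 / comb(n1 + n2, n1).
    """
    return 1 / comb(n1 + n2, n1)


if __name__ == "__main__":
    # Equal numbers of samples per season, e.g. Spring vs. Fall.
    for n in (1, 2, 3, 5):
        print(f"{n} sample(s) per group -> minimum p = {min_permutation_p(n, n):.4f}")
```

With one pooled sample per season the floor is 0.5, so no comparison could ever come out significant; three samples per group are needed just to reach 0.05.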

I hope that helps!

leahtee · February 1, 2018, 6:42pm

Yes, exactly! Thank you so much for your input. I'll run it past my PI and get her take on it - and of course keep you posted!
