Hi everyone,
Let's talk about bat poop!
For those of you not regularly knee-deep in guano, know that bat poop looks basically just like little mouse or rat poops. I mention this because we generated sequence data from guano in one of two ways:
- A single pellet was collected and DNA was extracted from that single pellet.
- A bunch of pellets, maybe 4-5 per "pool", were smushed together (technical term), and DNA was extracted from that pool.
In both cases, we amplified a marker gene, COI, from every sample, and now I'm left with a pile of sequence data from these samples. We use this COI marker gene to investigate what the bats are eating, since the primers primarily target arthropod COI.
The design of this experiment is pretty simple. Samples were collected:
- from two separate locations (Egner and Hickory)
- monthly, over three different collection months (June, July, September)
With that in mind, I'm curious about how to proceed with a basic alpha diversity analysis. It seems pretty clear to me from the picture below that observed richness, as well as Shannon's and Simpson's diversity values, differ between the pooled and single sample types. Note that the x axis represents the two different sites, and each point represents one sample's alpha diversity value:
What's curious to me is the ANOVAs. Let's say I try to model the data where all of it is grouped together regardless of whether the sample came from a single or pooled form (what I'm calling a BatchType in the model below):
alpha_diversity ~ Site * Month * BatchType
In this case, I observe significant main effects for Site, Month, and BatchType. I also see a significant Month:BatchType interaction. The plot above would suggest to me that this is due to those pooled samples from HB (i.e. the Hickory site).
But grouping these data and controlling for the effect of BatchType isn't something I really want to bother with. Would it be smarter to treat these data as two completely independent studies? In other words, would it be more appropriate to model them separately like:
analysis of single samples:
alpha_diversity_single ~ Site * Month
analysis of pooled samples:
alpha_diversity_pooled ~ Site * Month
I've seen posts before mentioning the need to control for effects like sequencing platform or sequencing run, and I understand the merit of trying to control for that. In my case, I expect, and observe, higher diversity in the rarefied data from my pooled samples.
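As a rough illustration of why pooling is expected to raise diversity, here is a small sketch that computes observed richness, Shannon's, and Simpson's indices for a few single pellets and for their pool. The assumptions are mine: the taxon names and read counts are invented, Shannon uses the natural log (some tools default to log base 2), Simpson is taken as 1 minus the sum of squared proportions, and rarefaction is not modeled at all.

```python
import math

def alpha_diversity(counts):
    """Return (observed richness, Shannon H, Simpson 1-D) for one sample."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    if total == 0:
        return 0, 0.0, 0.0
    props = [c / total for c in present]
    richness = len(present)
    shannon = -sum(p * math.log(p) for p in props)  # natural log (assumption)
    simpson = 1.0 - sum(p * p for p in props)       # Gini-Simpson form
    return richness, shannon, simpson

def pool(samples):
    """Sum per-taxon read counts across pellets (dicts of taxon -> count)."""
    pooled = {}
    for sample in samples:
        for taxon, count in sample.items():
            pooled[taxon] = pooled.get(taxon, 0) + count
    return pooled

# Made-up pellets: taxon names and counts are placeholders, not real data.
pellets = [
    {"moth_A": 50, "beetle_B": 50},
    {"moth_A": 30, "fly_C": 70},
    {"beetle_B": 10, "midge_D": 90},
]

single = [alpha_diversity(p.values()) for p in pellets]
pooled = alpha_diversity(pool(pellets).values())

print("single pellets:", single)
print("pooled:", pooled)
```

Before rarefaction, a pool can never have fewer observed taxa than any one of its pellets, since it contains the union of everything in them. Whether that gap survives rarefying to a common depth depends on the depth and on how evenly the reads are spread.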
Indeed, when I split up the data and analyze the single and pooled samples separately, the significant main effects differ from those in the initial model that includes BatchType, and they also differ between the two split models:
alpha_diversity_single ~ Site * Month
- significant effect for Month and Site for Observed OTUs (richness), but...
- NO significant effects for Month or Site for Shannon's/Simpson's
alpha_diversity_pooled ~ Site * Month
- significant effect for Site but NOT Month for Observed OTUs (richness), yet ...
- significant effects for Month or Site for Shannon's/Simpson's
(compared with the original combined model):
alpha_diversity ~ Site * Month * BatchType
- significant effect for Month and BatchType for Observed OTUs
- significant effect for Site and Month and BatchType for Shannon's and Simpson's
Splitting the samples by BatchType, versus keeping them in one big group and including BatchType as a main effect, paints two different pictures in my mind, but both appear to point to the pooled samples as driving the differences in alpha diversity.
I'd love to hear others' thoughts on how they might tackle this problem. Thanks for your help with all this crap.