Hi everyone, I have finished processing bacterial data through QIIME2. I do not lose too many reads after filtering, but my observed alpha diversity numbers seem low. Rather than a few thousand species, I only have a few hundred per treatment, even though my OTU table says I have over 5000 OTUs.
First, do you have 5000 OTUs total? Over 5000 counts? Over 5000 counts per sample? (Did you rarefy to 5000 sequences/sample, maybe?)
If it's the first, remember that many samples don't share all their features. It obviously depends on the environment, so the fact that you have over 5000 independent features doesn't necessarily correspond to 5000 features in each sample.
Second, does the number of features make sense for your environment? Certain environments are low-diversity environments (for instance, the human vagina) and will saturate pretty quickly. So, if this is a well-known environment or has been reported in the literature before, I'd take a look at where their diversity is.
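To make the first point concrete, here is a minimal Python sketch with a made-up feature table (the OTU names, sample names, and counts are all hypothetical). It only shows that the total number of features in a table can be much larger than the number observed in any single sample; it is not how QIIME2 or phyloseq compute anything internally.

```python
# Toy feature table: one entry per feature (OTU), with counts per sample.
# All names and counts are invented for illustration.
table = {
    "otu_1": {"s1": 10, "s2": 0, "s3": 3},
    "otu_2": {"s1": 0, "s2": 5, "s3": 0},
    "otu_3": {"s1": 2, "s2": 0, "s3": 0},
    "otu_4": {"s1": 0, "s2": 0, "s3": 7},
    "otu_5": {"s1": 0, "s2": 1, "s3": 0},
}


def observed_richness(table):
    """Count the features with a nonzero count in each sample."""
    richness = {}
    for counts in table.values():
        for sample, count in counts.items():
            richness[sample] = richness.get(sample, 0) + (1 if count > 0 else 0)
    return richness


# Features that appear in at least one sample (the "table total").
total_features = sum(
    1 for counts in table.values() if any(c > 0 for c in counts.values())
)
per_sample = observed_richness(table)

print(total_features)  # 5
print(per_sample)      # {'s1': 2, 's2': 2, 's3': 2}
```

Here the table holds 5 features, but no sample contains more than 2 of them.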
Hi Justine, thank you for getting back. I have exactly 6399 and did not rarefy. I have been advised not to for now. The environment is a peatland, so very acidic, anaerobic soil. Could it be that when I do my alpha diversity they are clustering together?
By observed OTUs, do you mean the count in the table summary, or from the diversity command?
If it's the former, what do you see in other papers with similar sample size, sequencing depths, and environments? What is the point of reference? Again, keep in mind that your observed features are going to be a function of sequencing depth, bioinformatics, sample size, and environment. I'd look at the paper on denoising I linked above, for example, for one explanation.
If it's the latter, you need to rarefy because a pure richness metric is super sensitive to sequencing depth and rarefaction is the current best practice (best of possible evils) to deal with the issue. You have a little bit more space with something like Shannon, but I think currently, we still recommend rarefying for traditional diversity metrics.
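As a rough illustration of why richness depends on depth, the sketch below simulates one hypothetical community, sequences it at two depths, and then rarefies the deeper sample to the shallower depth. The community size, the abundance weights, the depths, and the function names are all assumptions made for the example, and the subsampling is a plain draw without replacement rather than the QIIME2 or phyloseq implementation.

```python
import random


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a dict of feature counts."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand counts into one entry per read, then subsample the reads.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    rarefied = {}
    for feature in rng.sample(reads, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


def richness(counts):
    """Observed richness: number of features with a nonzero count."""
    return sum(1 for n in counts.values() if n > 0)


def sequence(features, weights, n_reads, rng):
    """Simulate sequencing by drawing reads in proportion to abundance."""
    counts = {}
    for feature in rng.choices(features, weights=weights, k=n_reads):
        counts[feature] = counts.get(feature, 0) + 1
    return counts


rng = random.Random(42)

# Hypothetical community: 500 features with a skewed abundance distribution.
features = [f"otu_{i}" for i in range(1, 501)]
weights = [1.0 / rank for rank in range(1, 501)]

shallow = sequence(features, weights, 1_000, rng)   # same community, few reads
deep = sequence(features, weights, 20_000, rng)     # same community, many reads
deep_rarefied = rarefy(deep, 1_000, rng)            # deep sample at even depth

print("shallow:", richness(shallow))
print("deep:", richness(deep))
print("deep, rarefied to 1000:", richness(deep_rarefied))
```

Both samples come from the same simulated community, so the gap in observed richness before rarefying comes from sequencing depth alone.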
Hi Justine, I mean observed OTUs when I run the estimate_richness command in phyloseq. There are only a few hundred per treatment even though my OTU table has over 6000 features.
Thanks for the clarification. I would not expect every feature to be present in every sample. My experience is that there is often a power-law relationship in the number of samples in which an organism appears, where 80% or more of features are seen in 10% of samples or fewer (although this is primarily in free-living organisms). So, it makes sense to me that not every feature appears in every sample.
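If you want to check how this looks in your own table, one option is to compute each feature's prevalence (the fraction of samples where it has a nonzero count) and see what share of features falls at or below 10%. The sketch below uses an invented table and hypothetical function names; you would need to swap in counts exported from your own data.

```python
def prevalence(table):
    """Fraction of samples in which each feature has a nonzero count."""
    return {
        feature: sum(1 for c in counts if c > 0) / len(counts)
        for feature, counts in table.items()
    }


def share_of_rare_features(table, max_prevalence=0.10):
    """Share of features seen in at most `max_prevalence` of the samples."""
    prev = prevalence(table)
    rare = sum(1 for p in prev.values() if p <= max_prevalence)
    return rare / len(prev)


# Invented table: 5 features across 10 samples (one list entry per sample).
table = {
    "otu_a": [5, 3, 8, 2, 9, 4, 6, 1, 7, 3],  # present in every sample
    "otu_b": [0, 0, 4, 0, 0, 0, 0, 0, 0, 0],  # present in one sample
    "otu_c": [2, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "otu_d": [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "otu_e": [0, 0, 0, 0, 0, 0, 0, 0, 0, 6],
}

print(prevalence(table)["otu_a"])     # 1.0
print(share_of_rare_features(table))  # 0.8
```

In this toy table, 4 of the 5 features show up in only 1 of 10 samples, so 80% of features have a prevalence of 10% or less.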