Quantification and raw OTU abundance comparison

Natalia_Bednarska · October 7, 2020, 6:58am

We have recently evaluated different microbial analysis software and would like to move to quantification rather than qualitative measures of microbial diversity.
I wanted to ask: what is the best method to assess and express the number of OTUs of one phylum across multiple samples? For QC prior to building the OTU table, I trim reads and filter based on number of reads and minimum percent from the median. Would this be enough to obtain raw, reliable OTU counts for one particular phylum/species and use that number as a quantitative measure? Or do I additionally need to normalize for the maximum number of reads per sample so that all samples have an equal number of reads? If so, how do I do it?

lukasbeule · October 7, 2020, 7:41am

Hi @Natalia_Bednarska,
I'm not sure I fully understand your question, but you may want to check out Gloor et al. 2017.
I fully agree with Gloor et al.: microbiome data obtained by amplicon sequencing are not quantitative because you are using endpoint PCR. When I started with microbiology, I started with real-time PCR and not amplicon sequencing. Working with soil, we sometimes obtain quantitative differences in microbial abundance (determined using real-time PCR) that span orders of magnitude among samples! Now, here is a little thought experiment:
Imagine you have two samples and want to know the absolute abundance of Bacillus species. Sample A has 10 copies of 16S rRNA genes; sample B has 100 copies of 16S rRNA genes (real-time PCR results). When you perform amplicon sequencing of bacterial 16S rRNA genes in the two samples, you get the following results: Bacillus species account for 10% of the reads in sample A and 5% of the reads in sample B. With amplicon sequencing alone, you can only conclude that Bacillus species are twice as abundant, in relative terms, in sample A as in sample B. When you factor in the absolute abundance (as for example done here), however, you will notice that sample B probably has about 5 times as many Bacillus 16S rRNA gene copies as sample A (10 × 10% = 1 copy versus 100 × 5% = 5 copies).
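The arithmetic of this thought experiment can be written out as a small Python sketch. The function name and the dictionary layout are made up for illustration, and the calculation assumes that the real-time PCR total and the sequencing read fraction can simply be multiplied, which ignores biases such as differing 16S copy numbers per genome.

```python
def absolute_taxon_copies(total_copies, read_fraction):
    """Estimate gene copies of one taxon from the qPCR total and its read fraction."""
    return total_copies * read_fraction


# Numbers from the thought experiment (hypothetical samples).
samples = {
    "A": {"total_16s_copies": 10, "bacillus_fraction": 0.10},
    "B": {"total_16s_copies": 100, "bacillus_fraction": 0.05},
}

# Absolute estimate per sample: qPCR total multiplied by relative abundance.
estimates = {
    name: absolute_taxon_copies(s["total_16s_copies"], s["bacillus_fraction"])
    for name, s in samples.items()
}

# Relative view (sequencing only): A looks enriched compared with B.
relative_ratio_a_to_b = (
    samples["A"]["bacillus_fraction"] / samples["B"]["bacillus_fraction"]
)

# Absolute view (sequencing plus qPCR): B actually carries more Bacillus copies.
absolute_ratio_b_to_a = estimates["B"] / estimates["A"]

print(f"Relative abundance ratio A/B: {relative_ratio_a_to_b:.1f}")
print(f"Estimated absolute ratio B/A: {absolute_ratio_b_to_a:.1f}")
```

The two ratios point in opposite directions, which is the whole point: the read fraction alone cannot tell you which sample contains more Bacillus.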

Now one could argue that if I do not normalise my samples prior to amplicon sequencing (multiplexed Illumina run), the number of reads per sample will correspond to absolute abundance. I disagree here because this ignores PCR efficiency and the role of PCR inhibitors, which can vary from sample to sample. Most users do not thoroughly test their samples for PCR inhibition (we just published an article in which we used a real-time PCR-based DNA amplification inhibition test, if you are interested in that). Furthermore, quantitative differences in PCR yield among samples will more or less disappear if your samples enter the plateau phase.

I hope that this is useful and not totally off-topic.

--- Lukas

sbombin · October 7, 2020, 9:28pm

The sequencing method that you used might be an important factor when calculating OTU/phylum "richness" between samples. If you used an Illumina platform, you might need to check for cross-talk (see "Quality filtering of Illumina index reads mitigates sample cross-talk", available on PMC).

There are also a lot of different methods for normalizing or rarefying read counts to calculate relative abundance (UNBIAS algorithm).
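As a rough illustration of what rarefying to an equal number of reads per sample can look like, here is a minimal Python sketch using only the standard library. The OTU tables, the function names and the choice of the smallest sample as the common depth are all assumptions for the example, not a recommendation of a specific pipeline. Note that equal depth still only yields relative abundance, not absolute counts.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample reads without replacement down to a fixed depth."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the count table into one entry per read, labelled by OTU.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    picked = rng.sample(pool, depth)
    rarefied = {otu: 0 for otu in counts}
    for otu in picked:
        rarefied[otu] += 1
    return rarefied


def relative_abundance(counts):
    """Convert read counts to fractions of the sample total."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}


# Hypothetical OTU tables with very different sequencing depths.
sample_a = {"OTU_1": 500, "OTU_2": 300, "OTU_3": 200}
sample_b = {"OTU_1": 4000, "OTU_2": 500, "OTU_3": 500}

# Assumed choice: rarefy every sample to the depth of the smallest one.
depth = min(sum(sample_a.values()), sum(sample_b.values()))

rarefied_a = rarefy(sample_a, depth)
rarefied_b = rarefy(sample_b, depth, seed=42)

print("Rarefied A:", rarefied_a)
print("Rarefied B:", rarefied_b)
print("Relative abundance B:", relative_abundance(rarefied_b))
```

Subsampling without replacement keeps each rarefied count at or below the observed count, at the cost of discarding reads from the deeper samples.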