In order to measure relative abundance of taxa, should we normalize relative abundance by 16S copy number? Because if we don’t, we are actually not measuring the abundance of taxa, but instead the abundance of their 16S genes, right?
QIIME 1 refers users to PICRUSt’s normalize_by_copy_number.py, but why isn’t this step incorporated into the standard QIIME 1/QIIME 2 pipeline? Is it because this normalization only applies to closed-reference OTU tables (i.e., features matching genomes with known 16S copy number), whereas in QIIME 2 we use open-reference methods? In other words, since we can’t normalize some taxa because we don’t know their 16S copy number (i.e., their genomes have not been sequenced), we’d rather not normalize any?
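For readers less familiar with the idea, here is a minimal sketch of what copy-number normalization means arithmetically: divide each feature’s read count by its estimated 16S copy number, then re-close to relative abundances. This is not PICRUSt’s implementation; the function name, the dict inputs and the copy-number values are hypothetical. Features without an estimate raise an error, which mirrors the problem described above: there is no principled number to divide by.

```python
def copy_number_corrected_abundance(counts, copy_numbers):
    """Hypothetical helper: convert 16S read counts to estimated relative cell abundances.

    counts: dict mapping feature ID -> read count
    copy_numbers: dict mapping feature ID -> estimated 16S copy number
    """
    missing = [f for f in counts if f not in copy_numbers]
    if missing:
        # No copy-number estimate (e.g., no sequenced genome): refuse to guess.
        raise KeyError(f"no copy-number estimate for: {missing}")
    # Reads per feature divided by copies per genome gives a cell-count proxy.
    corrected = {f: c / copy_numbers[f] for f, c in counts.items()}
    total = sum(corrected.values())
    # Re-close so the corrected relative abundances sum to 1.
    return {f: v / total for f, v in corrected.items()}


# Two taxa with equal read counts but (made-up) copy numbers of 1 and 4:
print(copy_number_corrected_abundance({"A": 100, "B": 100}, {"A": 1, "B": 4}))
# {'A': 0.8, 'B': 0.2}
```

Without the correction both taxa would appear at 50%; after it, the taxon with fewer 16S copies per genome is estimated to make up most of the cells.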
I’m a bit surprised I haven’t found this question on the forum yet.
Hi @jjmmii!
This is indeed a pertinent question that all molecular ecologists come across, but there seems to be no good solution because of the paucity of taxon-level information on copy numbers (as far as I know). However, other experts on the forum may shed some light on this issue, which I also face when peer reviewers raise it. Let’s hope one of our QIIME 2 experts can offer a solution.
Hi @jjmmii,
I think that @bsen2018 hits the nail on the head here:
16S copy number can be quite variable even within a species, so the best we could do is provide an estimate (some tools do this, and some existing methods could be used for this purpose). When your sequences are not classified to species level, the problem becomes even more complicated and the predictions even less accurate.
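One way to picture why coarser classifications give worse estimates is a rank fallback: look up the deepest assigned rank in a reference table, and if the species is missing, fall back to genus, then family, and so on. The sketch below assumes a hypothetical table of (mean, min, max) copy numbers per taxon; the labels and values are illustrative, not real data, and this is not how any particular tool works. If each rank’s range is computed from the genomes pooled beneath it, the min-max range at a coarser rank can only be as wide or wider.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical reference table: taxon label -> (mean, min, max) 16S copy number
# across reference genomes. Values are illustrative only.
COPY_TABLE = {
    "s__Example_species": (7.0, 6, 8),
    "g__Example_genus": (6.0, 3, 10),
    "f__Example_family": (5.0, 1, 12),
}


def estimate_copy_number(
    lineage: Sequence[str], table: dict
) -> Optional[Tuple[str, float, int, int]]:
    """Return (label, mean, min, max) for the deepest rank of `lineage` in `table`.

    `lineage` is ordered from kingdom down to the deepest assigned rank.
    """
    for label in reversed(lineage):  # try the most specific rank first
        if label in table:
            mean, lo, hi = table[label]
            return label, mean, lo, hi
    return None  # no estimate available at any rank


# Classified to species: a narrow range. Classified only to genus: a wider one.
print(estimate_copy_number(["k__Bacteria", "f__Example_family", "g__Example_genus", "s__Example_species"], COPY_TABLE))
print(estimate_copy_number(["k__Bacteria", "f__Example_family", "g__Example_genus"], COPY_TABLE))
```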
So your inference is correct:
PICRUSt is based on closed-reference alignments to 16S rRNA gene sequences from sequenced genomes, so its copy-number predictions have at least a little more merit.
At the end of the day, 16S rRNA gene copy numbers are usually fairly low, so they will only distort relative abundances slightly (i.e., by less than an order of magnitude). So in my opinion this correction is not as critical as it is for, e.g., fungal ITS sequences (where copy number varies over several orders of magnitude!), but I know others will disagree with me there!
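To make the size argument concrete, here is a small sketch assuming reads are simply proportional to cells × copy number (ignoring PCR and sequencing biases). Under that assumption, the abundance ratio between any two taxa is inflated by the ratio of their copy numbers, so the worst-case distortion in a community is the largest copy number divided by the smallest. The copy numbers used below are made up for illustration; the point is that the 16S distortion stays under an order of magnitude whenever that max/min ratio stays below 10, while ITS copy numbers spanning orders of magnitude would give correspondingly larger distortions.

```python
def observed_fractions(cells, copy_numbers):
    """Fraction of amplicon reads each taxon contributes, assuming reads are
    proportional to cells * copy number (no PCR or sequencing bias)."""
    reads = [n * c for n, c in zip(cells, copy_numbers)]
    total = sum(reads)
    return [r / total for r in reads]


def max_ratio_distortion(copy_numbers):
    """Largest fold change copy-number differences can introduce into the
    abundance ratio of any two taxa: max copy number / min copy number."""
    return max(copy_numbers) / min(copy_numbers)


# Two taxa with equal cell counts and hypothetical copy numbers of 1 and 10:
fractions = observed_fractions([1000, 1000], [1, 10])
print(fractions)                        # roughly [0.091, 0.909] instead of [0.5, 0.5]
print(max_ratio_distortion([1, 10]))    # 10.0
```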