Should we only use metrics that rely solely on presence/absence information, such as richness and unweighted UniFrac?
As the quantification bias is uniform across all samples, I suspect that we shouldn’t worry about the beta diversity metrics, but I’m not sure about this.
Do you have an actual citation? That links to Robert Edgar’s website, not a published article (and currently the website appears to be down).
I do not mean to dismiss the validity of his findings; indeed, we frequently see that amplicon frequencies and metagenome composition (which appears to be what he is actually measuring as a ground truth, not actual copy numbers) are subject to different biases and can correlate poorly. However, many others have shown that amplicon frequencies can correlate strongly with expected composition and actual copy numbers, particularly when good experimental protocols and controls are put in place. There is a long literature record here, which Robert is contributing to, not discounting in a single stroke.
In brief, use appropriate protocols and controls if you want to replicate community composition accurately.
Depends. Did you apply appropriate controls and careful protocols?
These methods are related to but distinct from what Robert’s data show, though they can be influenced by many of the same experimental biases. And they have their own set of other issues — and some outstanding analytical questions.
These experimental biases are mostly going to introduce noise, resulting more likely in false negative errors, rather than false positives.
So even if you don’t get an absolutely accurate answer (how many species are living in this gram of soil anyway?), these measurements are still largely relative (how many species do I detect when the same experimental protocols, biases and all, are applied?) and can give useful results.
No.
Yep. Beta diversity analyses are going to be much less sensitive to these biases than alpha diversity, so you can worry less in that regard.
To my knowledge, the bias in the observed abundance in 16S data comes from differences in 16S copy number and differences in primer affinity. Can these biases be mitigated with careful experimental design?
Yep, these problems have been known about and acknowledged as biases in molecular methods for a few decades now.
Yep.
Copy number is not a problem unless you really care about precise cell counts. Even then,
16S copy numbers usually only vary slightly between species and multiple copies within a cell don’t have major sequence heterogeneity (unlike, say, ITS, where it is a serious issue).
Copy number can be corrected for to some extent when it is known, and there are some software tools out there to do this. This is of course limited by whether that species’ genome has been sequenced, but this is a problem that becomes smaller each day.
Copy number will not really impact beta diversity comparisons (small effect, applied evenly across samples).
Primer bias can be partially addressed by using degenerate primers. Still not perfect, but it works.
Amplicon sequencing can be quite precise when controlled properly. E.g., check out this paper.
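To make the copy-number correction mentioned above a bit more concrete, here is a minimal sketch, assuming the per-taxon 16S copy numbers are already known. The function name, the taxon labels, and the numbers are all hypothetical and are not taken from any particular tool; real software will handle unknown taxa and taxonomy lookups in its own way.

```python
def copy_number_corrected_abundance(read_counts, copy_numbers, default_copies=None):
    """Divide each taxon's read count by its 16S copy number, then
    renormalize so the corrected values sum to 1.

    default_copies is a hypothetical fallback for taxa whose copy number
    is unknown (e.g. no sequenced genome); if it is None, such taxa
    raise a KeyError instead of being silently guessed.
    """
    adjusted = {}
    for taxon, reads in read_counts.items():
        copies = copy_numbers.get(taxon, default_copies)
        if copies is None:
            raise KeyError(f"no 16S copy number known for {taxon!r}")
        if copies <= 0:
            raise ValueError(f"copy number for {taxon!r} must be positive")
        adjusted[taxon] = reads / copies

    total = sum(adjusted.values())
    if total == 0:
        return {taxon: 0.0 for taxon in adjusted}
    return {taxon: value / total for taxon, value in adjusted.items()}


# Made-up example: taxon_A carries 7 copies per genome, taxon_B carries 1.
reads = {"taxon_A": 700, "taxon_B": 300}
copies = {"taxon_A": 7, "taxon_B": 1}
corrected = copy_number_corrected_abundance(reads, copies)
print(corrected)
```

In this made-up case the raw reads suggest taxon_A dominates (70%), but after dividing by copy number it accounts for a quarter of the estimated cells. Because the same per-taxon factor is applied to every sample, the correction shifts all samples in the same way, which is in line with the point above about beta diversity comparisons.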
There are other problems with that pre-print, e.g., it is based on a very small sample size (two mock communities?) which probably says more about the quality of the test data than it does about the methodology in general.
So yes, these are issues, no, amplicon sequencing is not perfect, but no method ever is and biology is messy. We’re getting better every day.