OTU picking questions

benjjneb · April 19, 2017, 4:13pm

Hi Bing,

On 1: The "100% OTUs" output by dada2 (I call them "sequence variants" or SVs) can be analyzed much like OTUs. However, you will likely find fewer SVs than you previously found OTUs. Although multiple SVs are sometimes lumped together within a single 3% OTU, the dada2 method has a lower false-positive rate than the most common OTU methods, such as uclust or average-linkage clustering, and in general I have seen a significant reduction in the total number of features.

On 2: If the biological phenomenon of interest is at higher taxonomic scales, it is of course useful to analyze the data at those higher scales (e.g. perhaps you are interested primarily in the ratio of Bacteroides to Firmicutes). In that case SVs can be grouped together on the basis of taxonomy (how I usually do it) or used as input to an OTU method with some pre-determined threshold. I'm not sure whether that type of OTU picking is implemented in QIIME2 yet, although one of the Q2 experts can clarify there. That said, I would recommend starting at the most-resolved level (SVs) unless you have prior knowledge that the higher taxonomic levels are where the action is, as there can be significant functional differences between bacteria with similar 16S sequences.
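
To make the "group SVs by taxonomy" idea concrete, here is a minimal sketch in plain Python. It is not dada2 or QIIME2 code: the function name `collapse_by_taxon`, the dictionary layout, and the toy counts and taxon labels are all assumptions made up for illustration. It only shows the basic operation of summing SV counts that share a label at a chosen rank.

```python
from collections import defaultdict


def collapse_by_taxon(counts, taxonomy, rank):
    """Sum per-sample SV counts within each taxon at the given rank.

    counts:   {sv_id: {sample_id: count}}      (hypothetical layout)
    taxonomy: {sv_id: {rank_name: taxon_name}} (hypothetical layout)
    SVs with no label at `rank` are pooled under "Unassigned".
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for sv_id, per_sample in counts.items():
        taxon = taxonomy.get(sv_id, {}).get(rank, "Unassigned")
        for sample_id, n in per_sample.items():
            collapsed[taxon][sample_id] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {taxon: dict(samples) for taxon, samples in collapsed.items()}


# Toy data, invented for this example only.
counts = {
    "SV1": {"sampleA": 10, "sampleB": 0},
    "SV2": {"sampleA": 5, "sampleB": 7},
    "SV3": {"sampleA": 2, "sampleB": 20},
}
taxonomy = {
    "SV1": {"phylum": "Bacteroidetes"},
    "SV2": {"phylum": "Bacteroidetes"},
    "SV3": {"phylum": "Firmicutes"},
}

phylum_counts = collapse_by_taxon(counts, taxonomy, "phylum")
```

In this toy case SV1 and SV2 are merged into one phylum-level feature while SV3 stays on its own. Keeping the SV-level table around and collapsing as a separate step means you can always go back to the most-resolved level later.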

On 1 again: There are some qualitative advantages of SVs over OTUs that we think are important to be aware of in the areas of reproducibility, reusability and comprehensiveness. We have posted a preprint outlining these arguments that may be worth reading and/or sharing with your collaborator: http://www.biorxiv.org/content/early/2017/03/07/113597