Number of features after dada2

I ran dada2 on a set of 16S samples from the American Gut (n=395). The raw reads are 150 bp, single-end. I set trim to 0 and trunc to 115, and I got more than 6,000 features. Is it possible to have such a large number of features?

table-agp-395.qzv (748.6 KB)


Yes! In this paper, a data set from the TARA Oceans project, which contains ~766 million reads, was processed with dada2 into 107,868 features.

Keep in mind that dada2 is not clustering OTUs at 97%. Instead, it’s trying to preserve all real, observed Amplicon Sequence Variants (ASVs). Two ASVs can be more than 97% similar to each other and still be kept as separate features, whereas 97% OTU clustering would merge them into one. So it makes sense that many features are detected.
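
To make the difference concrete, here is a minimal toy sketch in Python. It is not how dada2 works internally (dada2 denoises reads with an error model); it only counts exact unique sequences versus a simple greedy 97% clustering. The sequences and the helper names `identity` and `greedy_cluster` are made up for illustration, and the length of 115 just mirrors the trunc value in the question.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Toy greedy clustering: a sequence joins the first cluster whose
    first member (the centroid) is at least `threshold` identical to it."""
    clusters = []
    for s in seqs:
        for c in clusters:
            if identity(s, c[0]) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


# Hypothetical 115-base sequence (not real 16S data).
base = "ACGT" * 28 + "ACG"

# One mismatch relative to base: 114/115 identical (about 99.1%).
variant1 = base[:10] + "T" + base[11:]

# Two mismatches relative to base: 113/115 identical (about 98.3%).
variant2 = base[:50] + "A" + base[51:90] + "A" + base[91:]

seqs = [base, variant1, variant2]

asv_count = len(set(seqs))             # exact variants are kept apart
otu_count = len(greedy_cluster(seqs))  # 97% clustering merges them

print("distinct sequences (ASV-like):", asv_count)
print("97% clusters (OTU-like):", otu_count)
```

In this toy case the three variants count as three features when treated as exact sequences, but collapse into a single cluster at 97%. The same effect, across many taxa and 395 samples, is one reason an ASV table can have far more features than a 97% OTU table.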


