After DADA2 denoising, I get a lot of features (about 40,000), but their frequency across all samples is very low, and many features are observed in only one sample. This seems abnormal. I would expect at least some key features to be present in 50% of samples, but there are none. I don’t know where the problem is.
This sounds relatively normal to me. I’m not sure how many samples or what kind of depth you have, but in my experience there’s often a power-law distribution of features: a few features are common, and they become less common from there. Sparsity is a pretty common observation (feature, bug, we haven’t decided) with 16S sequencing. There are a few potential factors at play. First, there’s natural variation. People may just have different microbiomes based on random evolution, selective pressure, vertical transmission, medication, etc. There are also sampling issues: low-abundance and/or low-prevalence microbes are likely harder to detect, even if they’re common overall. Finally, even though DADA2 should be less prone to spurious microbes than other techniques, you might also have some technical bias where an error occurred early in the PCR cycles and was amplified past the detection threshold. (I’m not sure I’ve seen a publication on it, but anecdotally, this has happened in my lab.)
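If you want to see how sparse your own table is, here is a minimal sketch of counting how many samples each feature appears in. The table below is a made-up toy example (the sample and feature IDs are hypothetical), and I’m assuming you have exported your feature table into a plain sample-to-counts mapping; this is not a QIIME 2 API.

```python
from collections import Counter

# Hypothetical toy feature table: sample ID -> {feature ID: read count}.
table = {
    "S1": {"asv_a": 120, "asv_b": 30, "asv_c": 2},
    "S2": {"asv_a": 95, "asv_b": 12, "asv_d": 1},
    "S3": {"asv_a": 150, "asv_e": 3},
    "S4": {"asv_a": 80, "asv_b": 25, "asv_f": 1},
}


def feature_prevalence(table):
    """Return feature ID -> number of samples with a nonzero count."""
    prevalence = Counter()
    for counts in table.values():
        for feature, count in counts.items():
            if count > 0:
                prevalence[feature] += 1
    return prevalence


prevalence = feature_prevalence(table)

# Tally how many features are seen in exactly 1, 2, ... samples.
distribution = Counter(prevalence.values())
for n_samples in sorted(distribution):
    print(f"{distribution[n_samples]} feature(s) observed in {n_samples} sample(s)")
```

In this toy table most features show up in a single sample and only one is present everywhere, which is the kind of long-tailed pattern described above.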
There are a couple of ways we deal with these. One is to check taxonomic assignments. ASVs that don’t get assigned at the phylum level (i.e. missing ;p__) may be spurious or artifacts, and I might filter or exclude them. This may cut down on your rare features, or it may tell you that they’re actually bacteria.
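As a rough illustration of the taxonomy check, here is a small sketch that splits features by whether they carry a phylum-level label. It assumes Greengenes-style taxonomy strings where the phylum rank is prefixed with p__; the feature IDs and assignments are invented for the example, and other reference databases format their labels differently.

```python
# Hypothetical taxonomy assignments in Greengenes-style format (an assumption).
taxonomy = {
    "asv_a": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "asv_b": "k__Bacteria; p__Bacteroidetes",
    "asv_c": "k__Bacteria",
    "asv_d": "Unassigned",
    "asv_e": "k__Bacteria; p__",
}


def has_phylum(taxon):
    """True if the taxonomy string has a non-empty phylum (p__) rank."""
    ranks = [rank.strip() for rank in taxon.split(";")]
    # A bare "p__" with nothing after it counts as unassigned at phylum level.
    return any(rank.startswith("p__") and len(rank) > 3 for rank in ranks)


with_phylum = sorted(f for f, t in taxonomy.items() if has_phylum(t))
without_phylum = sorted(f for f, t in taxonomy.items() if not has_phylum(t))

print("Assigned at phylum level:", with_phylum)
print("Candidates for closer inspection or removal:", without_phylum)
```

It’s worth looking at how many reads and how many samples the unassigned group accounts for before dropping it.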
A second option is to consider diversity metrics that place less emphasis on rare features. Pure richness metrics (observed features, Faith’s PD) are more sensitive to rare features, while evenness metrics (e.g. Pielou’s) are less so. Similarly, you’ll find unweighted beta diversity metrics (e.g. Jaccard, unweighted UniFrac) are more sensitive to rare features than weighted ones (weighted UniFrac, Bray-Curtis).
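To make the weighted versus unweighted point concrete, here is a small sketch with two made-up samples that share their dominant features but each carry their own singletons. The functions are plain textbook implementations of Jaccard distance (presence/absence) and Bray-Curtis dissimilarity (abundance-weighted), written out for illustration rather than taken from any particular library.

```python
# Two hypothetical samples: same dominant features, different rare ones.
sample_a = {"asv_1": 500, "asv_2": 300, "rare_1": 1, "rare_2": 1}
sample_b = {"asv_1": 480, "asv_2": 320, "rare_3": 1, "rare_4": 1}


def jaccard_distance(a, b):
    """Unweighted: 1 - (shared features / all observed features)."""
    present_a = {f for f, count in a.items() if count > 0}
    present_b = {f for f, count in b.items() if count > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def bray_curtis(a, b):
    """Weighted: sum of absolute count differences over total counts."""
    features = set(a) | set(b)
    difference = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    total = sum(a.values()) + sum(b.values())
    return difference / total if total else 0.0


jaccard = jaccard_distance(sample_a, sample_b)
bray = bray_curtis(sample_a, sample_b)
print(f"Jaccard distance: {jaccard:.3f}")
print(f"Bray-Curtis dissimilarity: {bray:.3f}")
```

The singletons make up four of the six features, so they dominate Jaccard, while they contribute only a handful of reads to Bray-Curtis, which stays close to zero here.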
When you do differential abundance testing, you may choose to filter out those low-prevalence features, simply on the assumption that if they’re not common, you may not have the power to test them.
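A prevalence filter of that kind could look something like the sketch below. The table, the helper name filter_by_prevalence, and the 50% cutoff are all assumptions for illustration; the right threshold depends on your study design.

```python
import math

# Hypothetical toy feature table: sample ID -> {feature ID: read count}.
table = {
    "S1": {"asv_a": 120, "asv_b": 30, "asv_c": 2},
    "S2": {"asv_a": 95, "asv_b": 12, "asv_d": 1},
    "S3": {"asv_a": 150, "asv_e": 3},
    "S4": {"asv_a": 80, "asv_b": 25, "asv_f": 1},
}


def filter_by_prevalence(table, min_fraction):
    """Keep only features present in at least min_fraction of samples."""
    min_samples = math.ceil(min_fraction * len(table))
    n_present = {}
    for counts in table.values():
        for feature, count in counts.items():
            if count > 0:
                n_present[feature] = n_present.get(feature, 0) + 1
    kept = {f for f, n in n_present.items() if n >= min_samples}
    filtered = {
        sample: {f: c for f, c in counts.items() if f in kept}
        for sample, counts in table.items()
    }
    return filtered, kept


filtered, kept = filter_by_prevalence(table, min_fraction=0.5)
print("Features kept:", sorted(kept))
```

Within QIIME 2 itself, qiime feature-table filter-features with --p-min-samples applies the same idea directly to a feature table artifact.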
You can poke around here more; there’s been a lot of discussion about these kinds of topics!