About the definition of "feature"

Hi Nicho,

Sorry for the trouble, but I am still puzzled by the definition of “feature”. In this discussion, the value passed to “--p-sampling-depth” is generally the minimum sequence count across the samples, or it is chosen by moving the slider in “table.qzv”.
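A minimal sketch of the "minimum sequence count" rule, assuming the per-sample totals are available as a plain Python dict. The function name `min_sampling_depth`, the sample names, and the counts are hypothetical, chosen only for illustration. Using the smallest total as the depth is the largest depth at which no sample is dropped:

```python
def min_sampling_depth(sample_totals: dict[str, int]) -> int:
    """Return the largest sampling depth at which no sample is dropped."""
    # Every sample has at least this many sequences, so rarefying
    # to this depth keeps all samples.
    return min(sample_totals.values())


# Hypothetical per-sample sequence totals, for illustration only.
totals = {"sample-A": 1530, "sample-B": 998, "sample-C": 4210}
depth = min_sampling_depth(totals)
print(depth)  # 998
```

Picking a depth above this minimum trades away the shallowest samples for more sequences per retained sample, which is the trade-off the slider in “table.qzv” lets you explore.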

But in the “fecal microbiota transplant” tutorial, the sequence counts for each sample are low, so for the alpha and beta diversity analyses the sampling depth might be ~1000, chosen by sliding the bar in “table.qzv” (10% of the data). My question is: when I slid the bar to choose the value, the number of “features” increased along with the “sampling depth”, from 12 to 99 (see the following graphs).

Why are there so many features in all samples? And why do the retained features increase as the sampling depth goes up (I expected the features to decrease as the sampling depth increases)? In an answer from “thermokarst”, he mentioned that features can represent ASVs, OTUs, species, metabolites, proteins, whatever! So the features are the “things” present in your samples. Anyway, I am stuck on this definition. Thanks a million!

Decen

I think that definition is the key to interpreting this. While a phrase like “1,452 features” would usually be read as the number of unique features, this visualization is written generically: a feature can be any type of observation (sequences, metabolites, etc.), so the feature count is not always the number of unique features.

So with that in mind:

On why there are so many features in all samples: it’s the total number of (in your case) sequences that are present in the table when evenly sampling at that depth.
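To make the difference between "sequences" and "unique features" concrete, here is a minimal sketch of evenly sampling (rarefying) one sample, assuming subsampling without replacement. The function `rarefy_sample` and the ASV counts are hypothetical and this is not QIIME 2's implementation. After rarefying, every sample holds exactly `depth` sequences, while the number of distinct ASVs can be smaller:

```python
import random
from collections import Counter


def rarefy_sample(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Subsample `depth` sequences without replacement from one sample."""
    total = sum(counts.values())
    if total < depth:
        # A sample shallower than the depth cannot be rarefied; it is dropped.
        raise ValueError("sample has fewer sequences than the sampling depth")
    # Expand counts into one entry per sequence, then draw without replacement.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = rng.sample(pool, depth)
    return dict(Counter(drawn))


# Hypothetical ASV counts for a single sample (100 sequences, 4 unique ASVs).
sample = {"ASV1": 60, "ASV2": 30, "ASV3": 9, "ASV4": 1}
rarefied = rarefy_sample(sample, depth=50)
print(sum(rarefied.values()))  # 50: always equal to the sampling depth
print(len(rarefied))           # distinct ASVs kept, at most 4
```

The "features" count in the table summary corresponds to the first number (sequences), summed over samples, not the second.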

118 samples × 99 features per sample = 11,682 total features

On why the retained features increase with the sampling depth: the same reasoning applies. You are increasing the number of features (sequences) retained in each sample, so the total number of features (sequences) keeps increasing until the sampling depth is high enough that you begin losing samples, at which point the total can drop.
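The arithmetic above can be sketched as follows, assuming (as the reply describes) that each sample with at least `depth` sequences contributes exactly `depth` sequences and every shallower sample is dropped. The function name `retained_features` and the per-sample totals in the second example are hypothetical:

```python
def retained_features(sample_totals: list[int], depth: int) -> tuple[int, int]:
    """Return (samples retained, total sequences retained) at `depth`."""
    # A sample survives rarefaction only if it has at least `depth` sequences.
    kept = sum(1 for total in sample_totals if total >= depth)
    # Each surviving sample contributes exactly `depth` sequences.
    return kept, kept * depth


# 118 samples that all have at least 99 sequences reproduce the figure above.
print(retained_features([1000] * 118, 99))  # (118, 11682)

# Hypothetical totals: raising the depth grows the total until samples drop out.
totals = [120, 250, 400, 800]
for depth in (50, 100, 250, 500):
    print(depth, retained_features(totals, depth))
```

With these made-up totals the retained count goes 200, 400, 750, then falls to 500 once three of the four samples are below the depth.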

I hope that clarifies!


Thanks a million! Thank you!
