After creating my core-features qzv, I downloaded the feature list TSV file for the 90% threshold. I don't understand what the numbers associated with each feature at each percentile represent.
Feature list: The list of "core" feature IDs and their total percentile frequencies.
So the values in these TSVs are the percentile frequencies of those core features across all samples. The %s here are percentiles, not the % of samples threshold used to define core features, as in the visualization itself.
For example, D_0__Bacteria;D_1__Planctomycetes;D_2__Phycisphaerae;D_3__Tepidisphaerales;D_4__WD2101 soil group;D_5__uncultured bacterium;D_6__uncultured has a median frequency of 40.0.
I still don’t quite understand, I’m sorry! I’m new to this. I’m confused by the two frequencies/percentages. Using your example, is it true to say that
D_0__Bacteria;D_1__Planctomycetes;D_2__Phycisphaerae;D_3__Tepidisphaerales;D_4__WD2101 soil group;D_5__uncultured bacterium;D_6__uncultured shows up 40% of the time in half of my samples?
Not quite. This is the 90% core features file, so that feature is present in 90% or more of your samples.
That feature's median frequency is 40.0, which is a sequence count, not a percentage: in half of your samples this feature is detected at a frequency of 40 sequences or fewer, and in the other half at 40 sequences or more.
So the percentages in the visualization are the core feature thresholds (the % of samples that must contain a feature for it to be considered "core").
The percentages in the TSV are the percentiles of feature frequency across all samples.
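If it helps, here is a minimal Python sketch of the two different percentages. It is only an illustration, not the plugin's actual code: the counts are made-up numbers for a single hypothetical feature across 10 samples, and the variable names are my own.

```python
import numpy as np

# Hypothetical sequence counts of ONE feature in 10 samples (made-up numbers).
counts = np.array([0, 12, 25, 31, 38, 42, 55, 60, 75, 120])

# Percentage #1 (the visualization): the fraction of samples that contain
# the feature at all. This decides whether the feature counts as "core".
fraction_present = np.count_nonzero(counts) / counts.size
is_core_at_90 = fraction_present >= 0.90

# Percentage #2 (the TSV): percentiles of the feature's frequency across
# all samples. The 50th percentile is the median frequency.
median_frequency = np.percentile(counts, 50)

print(f"present in {fraction_present:.0%} of samples, core at 90%: {is_core_at_90}")
print(f"median frequency: {median_frequency} sequences")
```

With these made-up counts the feature is found in 9 of 10 samples, so it passes the 90% threshold, and its median frequency is 40.0 sequences: half the samples have 40 or fewer sequences of it, half have 40 or more.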