After I created my core-features QZV, I downloaded the feature list TSV file for the 90% threshold. What do the numbers associated with each feature at each given percentile represent?
I've attached this feature list as a TSV in case it helps answer my question: core-features frogs only -0.90.tsv
Thanks in advance.
Could you please also attach the original QZV? Thanks!
It looks like the explanation is in the QZV:
Feature list: The list of “core” feature IDs and their total percentile frequencies.
So the values in these TSVs are the percentile frequencies of those core features across all samples (the percentages here are percentiles, not the percentage-of-samples threshold used to define core features, as shown in the visualization itself).
For example, D_0__Bacteria;D_1__Planctomycetes;D_2__Phycisphaerae;D_3__Tepidisphaerales;D_4__WD2101 soil group;D_5__uncultured bacterium;D_6__uncultured has a median frequency of 40.0.
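To make "percentile frequency" concrete, here is a minimal Python sketch. It assumes each TSV value is a percentile of one feature's per-sample counts (number of sequences) taken across all samples, with samples lacking the feature contributing a count of 0, and it uses linear interpolation between ranks (the default rule in NumPy and pandas). Whether QIIME 2 computes it exactly this way, and which percentile columns it reports, are assumptions here; the counts are made up so that the median comes out to 40.0.

```python
def percentile(values, q):
    """Percentile q (0-100) using linear interpolation between sorted values.

    Assumption: same rule as NumPy's np.percentile default; QIIME 2's exact
    method is not confirmed here.
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = (len(xs) - 1) * q / 100   # fractional rank of the percentile
    lo = int(pos)                   # lower neighbouring rank (pos >= 0)
    hi = min(lo + 1, len(xs) - 1)   # upper neighbouring rank
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


# Hypothetical sequence counts for ONE core feature across 10 samples.
# The single 0 is a sample without the feature (present in 9/10 = 90%).
counts = [0, 12, 25, 33, 38, 42, 47, 60, 81, 130]

summary = {f"{q}%": percentile(counts, q) for q in (25, 50, 75)}
print(summary)  # {'25%': 27.0, '50%': 40.0, '75%': 56.75}
```

Read this way, the "50%" value of 40.0 is the median number of sequences for that feature across samples, not a percentage of samples.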
I hope that helps clarify!
I still don’t quite understand; I’m sorry! I’m new to this, and I’m confused by the two frequencies/percentages. Using your example, is it true to say that
D_0__Bacteria;D_1__Planctomycetes;D_2__Phycisphaerae;D_3__Tepidisphaerales;D_4__WD2101 soil group;D_5__uncultured bacterium;D_6__uncultured shows up 40% of the time in half of my samples?
Not a problem
- This is the 90% core features file, so that feature is present in at least 90% of your samples.
- That feature’s median frequency is 40.0, meaning that in half of your samples this feature is detected at a frequency of at most 40 sequences, and in the other half at a frequency of at least 40 sequences.
So the percentages in the visualization are the core feature thresholds (the % of samples that must contain a feature for it to be considered “core”).
The percentages in the TSV are the percentiles of feature frequency across all samples.
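The two kinds of percentages can be contrasted with a toy table in Python. `presence_fraction` and `core_features` are hypothetical helpers written for illustration, not QIIME 2 functions. The sketch assumes a feature counts as present in a sample when its count is above zero and that the threshold is inclusive (a feature in exactly 90% of samples qualifies); QIIME 2's implementation may differ in details.

```python
from statistics import median

# Hypothetical feature table: feature ID -> sequence counts per sample
# (10 samples, same order for every feature). Not real data.
table = {
    "feature_A": [0, 12, 25, 33, 38, 42, 47, 60, 81, 130],
    "feature_B": [0, 0, 0, 5, 9, 14, 3, 0, 22, 7],
}


def presence_fraction(counts):
    """Fraction of samples in which the feature was observed (count > 0)."""
    return sum(1 for c in counts if c > 0) / len(counts)


def core_features(table, min_fraction):
    """IDs of features present in at least `min_fraction` of samples."""
    return [fid for fid, counts in table.items()
            if presence_fraction(counts) >= min_fraction]


# Percentage #1: the core threshold (% of samples that must contain a feature).
core_90 = core_features(table, 0.90)
print("core at 90%:", core_90)

# Percentage #2: percentiles of each core feature's frequency across samples.
for fid in core_90:
    counts = table[fid]
    m = median(counts)  # the 50th percentile of the sequence counts
    at_or_below = sum(1 for c in counts if c <= m)
    at_or_above = sum(1 for c in counts if c >= m)
    print(fid, "present in", presence_fraction(counts), "of samples;",
          "median =", m, f"({at_or_below} samples <= median, {at_or_above} >= median)")
```

With these made-up counts, only feature_A passes the 90% threshold (present in 9 of 10 samples, while feature_B is in only 6), and its median of 40.0 splits the samples five and five.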
Makes sense, thanks so much!