After I created my core-features QZV, I downloaded the feature list TSV file at 90%. Now I don't understand what the numbers associated with each feature at each given percentile represent.
I've attached this feature list as a TSV in case that helps to answer my question.
core-features frogs only -0.90.tsv (951 Bytes)
Thanks in advance.
Could you please also attach the original QZV? Thanks!
It looks like the explanation is in the QZV:
Feature list: The list of “core” feature IDs and their total percentile frequencies.
So the values in these TSVs are percentiles of each core feature's frequency across all samples. (The percentages here are percentiles, not the percentage-of-samples threshold used to define core features, as in the visualization itself.)
For example, D_0__Bacteria;D_1__Planctomycetes;D_2__Phycisphaerae;D_3__Tepidisphaerales;D_4__WD2101 soil group;D_5__uncultured bacterium;D_6__uncultured has a median frequency of 40.0.
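A minimal sketch of how one row of such a TSV could be computed, assuming (1) the values are plain percentiles of the feature's per-sample counts over all samples, zeros included, (2) the columns are the 2, 9, 25, 50, 75, 91 and 98% percentiles (check the header of your own TSV), and (3) linear interpolation between ranks, which is NumPy's default for `np.percentile`. The counts are made-up toy data, not taken from the attached file.

```python
import math

# Hypothetical per-sample read counts for ONE core feature (toy data)
counts = [0, 12, 25, 33, 40, 40, 47, 58, 71, 90, 150]

# Assumed percentile columns; your TSV header is the authority here
PERCENTILES = [2, 9, 25, 50, 75, 91, 98]


def percentile(values, p):
    """Linear-interpolation percentile (same rule as NumPy's default)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p / 100      # fractional rank of the p-th percentile
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)    # guard the top end (p == 100)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


# One "row": percentile label -> number of sequences at that percentile
row = {f"{p}%": percentile(counts, p) for p in PERCENTILES}
print(row["50%"])  # 40.0, the median count across all samples
```

Each number in a row is therefore a sequence count, not a percentage; only the column labels are percentages.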
I hope that helps clarify!
I still don’t quite understand, sorry! I’m new to this, and I’m confused by the two frequencies/percentages. Using your example, is it true to say that
D_0__Bacteria;D_1__Planctomycetes;D_2__Phycisphaerae;D_3__Tepidisphaerales;D_4__WD2101 soil group;D_5__uncultured bacterium;D_6__uncultured shows up 40% of the time in half of my samples?
Not a problem!
This is the 90% core features file, so that species is present in 90% or more of your samples.
That feature’s median frequency is 40.0, meaning that in half of your samples this feature is detected at a frequency of at most 40 sequences, and in the other half at a frequency of at least 40 sequences.
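To see the two numbers side by side for a single feature, here is a small sketch with hypothetical counts for one feature in 10 samples. The presence fraction is what the 90% core threshold is checked against (assuming "present" means a count above zero), while the median (the 50th percentile) describes how many sequences the feature has per sample.

```python
import statistics

# Hypothetical counts of one feature in 10 samples (toy data)
counts = [0, 18, 25, 31, 38, 42, 44, 60, 75, 120]

# Fraction of samples in which the feature is present (count > 0)
present_fraction = sum(c > 0 for c in counts) / len(counts)

# Median number of sequences per sample, zeros included
median = statistics.median(counts)

# How many samples fall on each side of the median
at_or_below = sum(c <= median for c in counts)
at_or_above = sum(c >= median for c in counts)

print(present_fraction >= 0.9)   # True: core at the 90% threshold
print(median)                    # 40.0
print(at_or_below, at_or_above)  # 5 5
```

No sample here has exactly 40 sequences: with an even number of samples, the median is the average of the two middle counts (38 and 42).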
So the percentages in the visualization are the core feature thresholds (the % of samples that must contain a feature for it to be considered “core”).
The percentages in the TSV are the percentiles of feature frequency across all samples.
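The two percentages can be separated in a short end-to-end sketch: the threshold (the visualization's %) decides which features are core, and the percentiles (the TSV's %) summarize each core feature's counts. The feature table, the "present means count > 0" rule, the percentile list, and the interpolation method are all assumptions for illustration, not the visualizer's actual code.

```python
import math

# Hypothetical feature table: feature ID -> counts in the same 10 samples
table = {
    "feature_A": [40, 12, 55, 38, 90, 41, 0, 60, 33, 47],
    "feature_B": [0, 0, 5, 0, 3, 0, 0, 8, 0, 1],
    "feature_C": [200, 180, 150, 210, 175, 160, 190, 205, 170, 185],
}

THRESHOLD = 0.90                          # visualization %: share of samples
PERCENTILES = [2, 9, 25, 50, 75, 91, 98]  # assumed TSV columns


def percentile(values, p):
    """Linear-interpolation percentile (same rule as NumPy's default)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p / 100
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def core_feature_rows(table, threshold, percentiles):
    """Keep features present in >= threshold of samples, then summarize counts."""
    rows = {}
    for fid, counts in table.items():
        present = sum(c > 0 for c in counts) / len(counts)
        if present >= threshold:
            # Percentiles use ALL samples, including ones where the count is 0
            rows[fid] = [percentile(counts, p) for p in percentiles]
    return rows


rows = core_feature_rows(table, THRESHOLD, PERCENTILES)
print(sorted(rows))                               # ['feature_A', 'feature_C']
print(rows["feature_A"][PERCENTILES.index(50)])   # 40.5
```

With a 90% threshold, `feature_B` (present in only 4 of 10 samples) is dropped, while `feature_A` is kept even though one sample has a zero count; that zero still enters its percentiles.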
Makes sense, thanks so much!
August 24, 2018, 3:14am