How to read a core-features TSV

After creating my core-features QZV, I downloaded the feature list TSV for the 90% threshold. I don't understand what the numbers associated with each feature at each percentile represent.

I've attached this feature list as a TSV in case it helps answer my question: core-features frogs only -0.90.tsv

Thanks in advance.

Hi Alan,
Could you please also attach the original QZV? Thanks!

Here it is: frogs-only-core-microbiome.qzv

Thanks

It looks like the explanation is in the QZV:

Feature list: The list of “core” feature IDs and their total percentile frequencies.

So the values in these TSVs are percentiles of each core feature's frequency across all samples. The percentages here are percentiles, not the percentage-of-samples threshold used to define core features (which is what the percentages in the visualization itself show).

For example, D_0__Bacteria;D_1__Planctomycetes;D_2__Phycisphaerae;D_3__Tepidisphaerales;D_4__WD2101 soil group;D_5__uncultured bacterium;D_6__uncultured has a median (50th percentile) frequency of 40.0.
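To make "percentile frequency" concrete, here is a minimal Python sketch for a single feature. The counts are made up, and the sketch assumes a percentile is taken over every sample in the table (including a sample where the feature is absent, i.e. a count of 0) using linear interpolation between ranks, which is numpy's default rule. The tool that wrote your TSV may differ in those details.

```python
# Hypothetical per-sample counts (number of sequences) for ONE feature across
# 10 samples. These numbers are made up for illustration.
counts = [0, 12, 25, 31, 38, 42, 47, 55, 80, 120]

def percentile(values, q):
    """Return the q-th percentile (0-100) of values, interpolating linearly
    between the two nearest ranks (numpy's default method)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty list")
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

for q in (25, 50, 75):
    print(f"{q}th percentile: {percentile(counts, q)} sequences")
# Prints 26.5, 40.0 and 53.0 sequences for these made-up counts;
# the 50th percentile is the median.
```

Each value is a number of sequences, so a 50th-percentile value of 40.0 means 40 sequences, not 40%.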

I hope that helps clarify!

I still don’t quite understand, sorry! I’m new to this, and I’m confused by the two kinds of frequencies/percentages. Using your example, is it correct to say that

D_0__Bacteria;D_1__Planctomycetes;D_2__Phycisphaerae;D_3__Tepidisphaerales;D_4__WD2101 soil group;D_5__uncultured bacterium;D_6__uncultured shows up 40% of the time in half of my samples?

Not a problem


  1. This is the 90% core features file, so that feature is present in at least 90% of your samples.
  2. That feature’s median frequency is 40.0: in about half of your samples it has a frequency of 40 sequences or fewer, and in the other half 40 sequences or more. So 40.0 is a count of sequences, not a percentage.
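A rough sketch of both points, using a hypothetical three-feature table (all IDs and counts are invented). It assumes a feature is "core" at 0.9 if it is detected (count above 0) in at least 90% of samples, and takes the median over all samples. The helpers `prevalence` and `core_features` are illustrative only, not part of any QIIME 2 plugin.

```python
import statistics

# Hypothetical feature table: feature ID -> sequence counts in 10 samples.
# All IDs and numbers are made up for illustration.
table = {
    "feature_A": [0, 12, 25, 31, 38, 42, 47, 55, 80, 120],
    "feature_B": [0, 0, 5, 9, 14, 0, 3, 7, 0, 2],
    "feature_C": [4, 6, 9, 11, 3, 8, 5, 7, 10, 6],
}

def prevalence(counts):
    """Fraction of samples in which the feature was detected (count > 0)."""
    return sum(1 for c in counts if c > 0) / len(counts)

def core_features(table, fraction):
    """IDs of features detected in at least `fraction` of the samples."""
    return [fid for fid, counts in table.items() if prevalence(counts) >= fraction]

for fid in core_features(table, 0.9):
    counts = table[fid]
    med = statistics.median(counts)
    # Count samples on each side of the median to show what "median" means.
    at_or_below = sum(1 for c in counts if c <= med)
    at_or_above = sum(1 for c in counts if c >= med)
    print(f"{fid}: present in {prevalence(counts):.0%} of samples, "
          f"median {med} sequences ({at_or_below} samples <= median, "
          f"{at_or_above} samples >= median)")
```

With these numbers, feature_A and feature_C are core at 0.9, while feature_B (detected in 6 of 10 samples) is not. feature_A's median works out to 40.0 sequences, with 5 samples on each side.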

So the percentages in the visualization are the core-feature thresholds (the percentage of samples that must contain a feature for it to be considered “core”).

The percentages in the TSV are the percentiles of feature frequency across all samples.
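A small sketch of reading such a TSV with Python's standard `csv` module. It assumes the first column holds the feature ID and every other column header is a percentile label such as `50%`. The example rows and header below are invented, so check the header row of your own file.

```python
import csv
import io

# Hypothetical contents of a core-features TSV. The header and numbers are
# made up for illustration; the real file's columns may differ.
EXAMPLE_TSV = (
    "Feature ID\t25%\t50%\t75%\n"
    "feature_A\t26.5\t40.0\t53.0\n"
    "feature_C\t5.25\t6.5\t8.75\n"
)

def read_core_tsv(text):
    """Parse a tab-separated table: the first column is the feature ID and
    every other column is a percentile of that feature's frequency."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    labels = header[1:]  # percentile labels, e.g. "50%" is the median
    return {row[0]: dict(zip(labels, map(float, row[1:]))) for row in body}

for feature_id, pcts in read_core_tsv(EXAMPLE_TSV).items():
    for label, value in pcts.items():
        # Each value is a count of sequences; the label is a share of samples.
        print(f"{feature_id}: about {label} of samples have <= {value} sequences")
```

The percentage in each column label describes a share of samples at or below that frequency; it is unrelated to the core-feature threshold chosen in the visualization.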


Makes sense, thanks so much!

