Hi @colinbrislawn,
Great point about terminology! We do have a short glossary that needs expansion (it does not contain the terms that you’ve listed), and could probably benefit from greater visibility. Let’s see what others say though — there may be another list of these terms somewhere.
If you have ideas for how to make this glossary more visible, we’d love to hear. And if you’d like to contribute via a docs PR, please jump right on in!
We tried to match entry-level statistics in this case, so the terminology was pretty intentional. Ultimately almost any term we use seems to be pretty well overloaded. Sometimes I do feel weird about Frequency though, as it’s more like FrequencyForDepthMakingStatisticsReallyHard. The same problem holds for Counts too.
We definitely need more glossary terms, and probably something like a “QIIME 2 in 2 minutes” page.
I actually like the term Frequency; imho it is fairly intuitive. In addition, it is commonly used in multiple other fields such as natural language processing, electrical engineering, …
If anything, this could serve as a bridge to other disciplines. So I vote to keep this as is.
I like the idea of having the glossary as a technical document for developers, so that we can write documentation and tutorials that implicitly teach standard vocabulary. “QIIME 2 in 2 minutes” is a perfect place to do this. What a great idea!
On Frequency
The use of Frequency is not standard; see this ABA guide and the main wiki page, which defines Frequency as observations over time… which could be analogous to observations of one feature over all observations in a sample… BecauseAllOurCountsAreBasicallyRelativeAnyways.
I was hoping to find a clear word, but they all have sticky connotations. I guess AbsoluteFrequency and RelativeFrequency would avoid this, but we know that our counts are not necessarily Absolute, so this is wrong for other reasons.
I mean, I guess that’s fair, but I don’t really expect people to confuse signal processing with microbial ecology.
As to the behaviorist journal, it seems like the discussion centers more on what represents the statistical sampling event, e.g. it’s a rate for applied behavior analysis. This seems pretty consistent with the statistical definition, so I’m less certain what their aim is.
In all of these cases, frequency is still a number of observations per unit, whether that unit is time, area, population, or PCR+sequencing.
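To make the "per unit" point concrete, here is a minimal sketch in plain Python. It is not QIIME 2 code: the sample names, feature names, and the `relative_frequencies` helper are all made up for illustration. It shows two hypothetical samples with the same composition but a tenfold difference in sequencing depth. The raw frequencies differ, while the relative frequencies are identical.

```python
def relative_frequencies(counts):
    """Divide each feature's count by the sample total (hypothetical helper)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no observations")
    return {feature: count / total for feature, count in counts.items()}


# Made-up data: same community, sequenced at 100 vs. 1000 reads.
sample_a = {"feature_1": 30, "feature_2": 10, "feature_3": 60}
sample_b = {"feature_1": 300, "feature_2": 100, "feature_3": 600}

rel_a = relative_frequencies(sample_a)
rel_b = relative_frequencies(sample_b)

# Raw frequencies depend on depth; relative frequencies do not.
print(sample_a == sample_b)
print(rel_a == rel_b)
```

Under these assumptions the frequency is a count per sample (per PCR+sequencing run), which is why the raw numbers are only comparable across samples once depth is accounted for.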
That definition of frequency still applies – here we are talking about the number of microbial occurrences within a single sample. It follows the same rationale for why we can apply algorithms such as DESeq2 to our data in the first place (otherwise we couldn’t apply the Poisson to our data, since it, well, only operates on a single time unit).