What was the motivation behind the unconventional terminology in QIIME2? Why not just call an artifact a data file and a provenance a log? It takes a bit of getting used to, and non-QIIME2 users might not know what you're talking about. There's nothing wrong with it per se; I'm just curious as to why those terms were chosen.
Hi @SteveMcL,
The best answer I can give you is that many of the files/data used in QIIME 2 do not exactly have precedents and cannot be described neatly by pre-existing terms. So we needed to create a new vocabulary to adequately describe these.
An artifact is not simply a data file, and the provenance is not simply a log. (Provenance is actually a more general term that others have used to describe different objects that have similar purposes, so that one is not new here.) Calling an artifact a "data file" could actually be confusing: e.g., one could then be forgiven for thinking that other data files could be used in QIIME 2 without importing them as the appropriate artifact type. Artifacts and visualizations are of course subclasses of "data files", i.e., they are files that contain data, but what data? How are these different from each other and from other data files? So the terms perform important functions for distinguishing the types of files used/generated by QIIME 2 from one another, and for distinguishing these from non-QIIME 2 data files.
I recognize that this does increase the learning curve for those learning QIIME 2, but the outcome is worth it, as this terminology serves to clarify and distinguish different file types at the end of the day. We attempt to outline some of this terminology in the docs, e.g., here.
I really like this question. Why not keep it simple?
There is a precedent for using "unconventional terminology" to clearly differentiate your ideas from existing work and have the chance to establish new precedents. I'm thinking of how the GPLv3 uses the words "propagate" and "convey" to distance itself from other licenses and establish new legal precedents.
See this set of questions that are all about "unconventional terminology":
From the GPL FAQ:
- Is “convey” in GPLv3 the same thing as what GPLv2 means by “distribute”?
- If I only make copies of a GPL-covered program and run them, without distributing or conveying them to others, what does the license require of me?
- GPLv3 gives “making available to the public” as an example of propagation. What does this mean? Is making available a form of conveying?
Because they are artifacts!
Colin
I really like the arguments so far, but I do want to defend the term "artifact" a little bit. Our usage isn't quite as "out-there" as it might appear initially.
Using "artifact" as the result of some process or methodology is relatively common when discussing software engineering and engineering methodologies. As a software engineer, it was one of the first words I personally reached for when talking about "some result of a process", it was just easier to say.
Looking at the etymology (from the Latin "arte", "by skill", and "factum", "thing made"), it is pretty clear why it is a preferred term in software engineering.
In biology, obviously the goal is not to "make" anything as a result of observation, hence artifact has a negative connotation. But as soon as you apply a process in a computer, the results are entirely constructed, and so artifact isn't necessarily bad or good.
Even better, provenance is a real consideration for archeological/anthropological artifacts. Like much in computing, this relationship was borrowed as an analogy to describe digital objects and processes, and it has been used by other systems that are aware of or model provenance:
https://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance
http://wf4ever.github.io/ro/2016-01-28/wfprov/#overview
Given the more general meaning of the word, it's not surprising that different systems have arrived at the same terminology naturally.