- The International Nucleotide Sequence Database Collaboration (INSDC) has developed a standardised language for reporting missing/null values, which says that missing values should be recorded as one of: not applicable, not collected, not provided, restricted access. This means that a numeric column will contain text whenever any value is missing. In general this will be the case, because an experiment needs blanks and controls, and those samples might not have a valid value for some columns, for example timepoint_numeric (a simple time point representation).
- The #q2:types and #q2:units annotations are not part of the INSDC standard, so data stored in or downloaded from INSDC databases will not include them, and adding them correctly might be a daunting task.
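To make the first problem concrete, here is a minimal sketch that assumes a deliberately simplified type-inference rule: a column is treated as numeric only if every value parses as a number. The function `infer_column_type` and the sample values are hypothetical and are not q2-metadata's actual implementation.

```python
# Hypothetical, simplified column-type inference (NOT the real q2-metadata
# logic): a column counts as numeric only if every value parses as a float.

def infer_column_type(values):
    """Return 'numeric' if every value parses as a float, else 'categorical'."""
    for value in values:
        try:
            float(value)
        except ValueError:
            # A single non-numeric entry makes the whole column categorical.
            return "categorical"
    return "numeric"


# Three samples with a time point, plus a blank that has none and is
# reported with an INSDC missing-value term instead of being left empty.
timepoint_numeric = ["0", "7", "14", "not applicable"]

print(infer_column_type(timepoint_numeric))      # categorical
print(infer_column_type(timepoint_numeric[:3]))  # numeric
```

A single INSDC null term is enough to turn the whole column into text under this rule.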
Possible solutions and comments:
- Make q2-metadata aware of these null values so that they are ignored when determining the correct #q2:types. The problem here is what happens if the standard changes: q2-metadata would have to be updated to keep up with it.
- q2-metadata only uses the values that q2-X (another plugin) is going to operate on. In a situation like the example in problem 1, we could compute the #q2:types on the subset of samples that q2-X will operate on; in other words, if my distance matrix doesn't have any control/blank samples, q2-metadata should infer that the #q2:types is numeric. This means that plugins will need to figure out which samples are present in their input and pass them to q2-metadata so it can do its magic.
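The two proposed solutions can be sketched roughly as follows. This is a hypothetical illustration, not q2-metadata's API: the function names, the hard-coded set of null terms (taken from the list above), and the sample IDs are all assumptions made for the example.

```python
# Hypothetical sketch of the two proposed strategies; names and sample IDs
# are illustrative assumptions, not part of q2-metadata.

# Option 1: the INSDC missing-value terms listed above. If the standard
# changes, this set has to be updated by hand.
INSDC_NULL_TERMS = frozenset(
    {"not applicable", "not collected", "not provided", "restricted access"}
)


def is_number(value):
    """True if the string parses as a float."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def infer_type_ignoring_nulls(values, null_terms=INSDC_NULL_TERMS):
    """Option 1: drop recognized null terms, then check the remaining values."""
    observed = [v for v in values if v.strip().lower() not in null_terms]
    # If every value was a null term there is nothing to go on; fall back
    # to categorical.
    if observed and all(is_number(v) for v in observed):
        return "numeric"
    return "categorical"


def infer_type_for_samples(column, sample_ids):
    """Option 2: infer the type only from the samples the plugin will use.

    `column` maps sample ID -> raw string value; `sample_ids` would come
    from the plugin's input (e.g. the IDs in a distance matrix). IDs that
    are not in the column are skipped.
    """
    values = [column[sid] for sid in sample_ids if sid in column]
    if values and all(is_number(v) for v in values):
        return "numeric"
    return "categorical"


timepoint_numeric = {
    "S1": "0",
    "S2": "7",
    "S3": "14",
    "blank1": "not applicable",
}

print(infer_type_ignoring_nulls(timepoint_numeric.values()))         # numeric
print(infer_type_for_samples(timepoint_numeric, ["S1", "S2", "S3"]))  # numeric
print(infer_type_for_samples(timepoint_numeric, ["S1", "blank1"]))    # categorical
```

Option 1 keeps the decision inside q2-metadata but ties it to the current vocabulary, while Option 2 needs each plugin to pass along its sample IDs. The two could also be combined.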
Sorry if this is confusing and/or long; I'm just trying to be as clear as possible.