Feature-table filter-samples removes taxonomy from biom table

gregcaporaso · September 26, 2017, 4:25pm

We had some discussion offline and in a recent developer call about including taxonomy in .biom files in QIIME 2, but have decided that this is not something we'll support. Here's the justification I gave @antgonza. (It overlaps somewhat with other content in this topic because it wasn't originally written as a reply to this post, but we decided it might help to share it here.)

@antgonza, following up on your question about supporting taxonomy in biom files in QIIME 2. The short answer is that we’re not going to support this in the FeatureTable semantic types. The reason is that this violates the core idea of the semantic type system, as the FeatureTable semantic type would no longer unambiguously describe a type of data (it would mean either a feature table, or a feature table with taxonomic annotations). This means that plugin developers couldn’t be sure what they were getting when they request a FeatureTable, which would set us up for a lot of QIIME 1-like problems (e.g., users getting tracebacks from methods, rather than error messages that explain what they did incorrectly and how to correct it). That becomes a problem that basic users probably can’t solve on their own (they need to post to the forum). There are a lot of other reasons to keep these data separate, and we mentioned a few of these on the call.

Your use case (as I understand it, avoiding having to add taxonomic information to a biom file following export from QIIME 2, when you’re developing an automated bioinformatics workflow that uses QIIME 2 and other tools) is straightforward for an advanced user who would be developing that type of system (at worst, it’s 1-2 extra commands in your code).
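
As a rough illustration of what those extra commands might look like in an automated workflow, here is a minimal sketch. It is not taken from the discussion above, and it makes several assumptions: the function names and file names are hypothetical, `qiime tools export` is assumed to accept `--input-path`/`--output-path` (argument names have differed between QIIME 2 releases), the exported files are assumed to be named `feature-table.biom` and `taxonomy.tsv`, and the taxonomy file is assumed to have the three columns Feature ID, Taxon, and Confidence. The header is rewritten because `biom add-metadata` expects a `#OTUID`-style header line.

```shell
#!/usr/bin/env bash
# Sketch only: export a feature table and its taxonomy from QIIME 2, then
# attach the taxonomy to the exported BIOM file as observation metadata.
# Function and file names are hypothetical; check flag names against the
# QIIME 2 and biom-format versions you have installed.

# Replace the header line of an exported taxonomy.tsv with the header
# that biom-format expects.
# $1: exported taxonomy.tsv   $2: path for the rewritten copy
rewrite_taxonomy_header() {
  {
    printf '#OTUID\ttaxonomy\tconfidence\n'
    tail -n +2 "$1"   # keep every row after the original header line
  } > "$2"
}

# $1: FeatureTable artifact   $2: FeatureData[Taxonomy] artifact
# $3: output directory
export_table_with_taxonomy() {
  qiime tools export --input-path "$1" --output-path "$3" &&
  qiime tools export --input-path "$2" --output-path "$3" &&
  rewrite_taxonomy_header "$3/taxonomy.tsv" "$3/biom-taxonomy.tsv" &&
  biom add-metadata \
    -i "$3/feature-table.biom" \
    -o "$3/table-with-taxonomy.biom" \
    --observation-metadata-fp "$3/biom-taxonomy.tsv" \
    --sc-separated taxonomy
}
```

A workflow script could then call, for example, `export_table_with_taxonomy table.qza taxonomy.qza exported`, where all three names are placeholders.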

Note that in your own plugins, you’re free to define a new semantic type (e.g., FeatureTableWithTaxonomy), which you could use in methods in that plugin or plugins that depend on it. That would no longer violate the idea of the semantic type system, since the type you define would unambiguously describe a type of data.

Also, I mentioned on the call that we support importing FeatureData[Taxonomy] from a biom file that has that information. You can do that as follows:

qiime tools import \
  --input-path my-file.biom \
  --output-path my-taxonomy.qza \
  --source-format BIOMV210Format \
  --type "FeatureData[Taxonomy]"