Maybe this already exists and I just haven't seen it in the user docs, but when running qiime taxa filter-table (and other filtering commands), it would be nice to know what, if anything, was filtered. Since there is no log of what was filtered (as far as I can tell), the user must work it out by indirect means (e.g., comparing taxa barplots from before and after taxa filtering). This is a bit cumbersome and could lead to incorrect filtering, especially if the user needs to filter out many taxa. If the log could be embedded in the post-filter artifact, it would help the user track what happened across one or more filtering steps.
Thanks @nick-youngblut! This is a very interesting idea. There are a number of indirect ways to figure out what was filtered. The easiest would be to run filter-table with the include parameter instead of exclude, passing the same string of taxa; for example (tweaking the example provided in the filtering tutorial):
qiime taxa filter-table \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--p-include mitochondria \
--o-filtered-table table-of-mitochondria.qza
You can then use the filtered table to generate barplots or heatmaps, collapse to view the taxonomy annotations of all filtered features, et cetera. This would tell you what was filtered.
Of course, if you just want a sanity check to be sure that all mitochondria (or whatever) were filtered out, you can use the same approach on the feature table output by filter-table --p-exclude to confirm that taxon X is absent. For example, an easily automated approach would be to collapse the feature table on species-level taxonomy, export to biom, convert to TSV, and grep for taxon X (or egrep for taxa X, Y, and Z), then kill the job if the output is not empty.
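The collapse, export, convert, and grep steps could be sketched roughly as below. This is only an illustration under some assumptions: the function names export_collapsed_tsv and assert_taxa_absent are made up for this example, level 7 is assumed to correspond to species in your taxonomy, and the case-insensitive match is a choice rather than something filter-table dictates. The first function needs QIIME 2 and the biom CLI installed; the second only needs grep.

```shell
# Sketch: fail if any unwanted taxon survives in a filtered table.
# Function names are hypothetical helpers, not part of QIIME 2.

# Collapse a filtered table on taxonomy, export it, and convert it to TSV.
# $1 = filtered table (.qza), $2 = taxonomy (.qza), $3 = existing output dir
export_collapsed_tsv() {
    # Level 7 is assumed to be species (7-rank taxonomy strings).
    qiime taxa collapse \
        --i-table "$1" \
        --i-taxonomy "$2" \
        --p-level 7 \
        --o-collapsed-table "$3/collapsed.qza" &&
    qiime tools export \
        --input-path "$3/collapsed.qza" \
        --output-path "$3/exported" &&
    biom convert \
        -i "$3/exported/feature-table.biom" \
        -o "$3/collapsed.tsv" \
        --to-tsv
}

# Return 0 if no row of the TSV matches the pattern, 1 if any row matches
# (matching rows are printed to stderr), 2 if the TSV cannot be read.
# $1 = TSV file, $2 = extended regular expression of unwanted taxa
assert_taxa_absent() {
    if [ ! -r "$1" ]; then
        echo "ERROR: cannot read $1" >&2
        return 2
    fi
    if grep -E -i -- "$2" "$1" >&2; then
        echo "ERROR: unwanted taxa still present in $1" >&2
        return 1
    fi
    return 0
}
```

In a pipeline, something like `assert_taxa_absent out/collapsed.tsv 'mitochondria|chloroplast' || exit 1` would stop the job as soon as a supposedly filtered taxon turns up.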
But perhaps having an action in feature-table that compares two feature tables and outputs the symmetric difference (features/samples in either table A or B but not both) could have more general uses. Let's see what others think.
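Until something like that exists, a rough stand-in can be put together from standard tools. The sketch below assumes both tables have already been exported and converted to TSV (feature IDs in the first column, header lines starting with #), and it only compares the feature axis; sample IDs live in the header row and would need separate handling. The names table_ids and feature_symdiff are made up for this example and are not QIIME 2 actions.

```shell
# Sketch: symmetric difference of feature IDs between two exported tables.
# Function names are hypothetical helpers, not part of QIIME 2.

# Print the sorted, unique first-column IDs of a TSV, skipping '#' header lines.
table_ids() {
    grep -v '^#' "$1" | cut -f 1 | LC_ALL=C sort -u
}

# Print the IDs found in only one of the two tables, labelled by origin.
# $1 = first table (TSV), $2 = second table (TSV)
# The body runs in a subshell so the temporary variables stay local.
feature_symdiff() (
    a=$(mktemp) || exit 1
    b=$(mktemp) || exit 1
    table_ids "$1" > "$a"
    table_ids "$2" > "$b"
    # comm -23 keeps lines unique to the first file, comm -13 to the second.
    LC_ALL=C comm -23 "$a" "$b" | awk '{print "only_in_first\t" $0}'
    LC_ALL=C comm -13 "$a" "$b" | awk '{print "only_in_second\t" $0}'
    rm -f "$a" "$b"
)
```

Running `feature_symdiff table.tsv filtered-table.tsv` on the pre- and post-filter tables would then list exactly the features that the filtering step removed, each tagged only_in_first.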