Filtering taxonomy tables

Thanks for the quick reply! Along the same lines as my original post, I haven’t been able to find a method in qiime2 for filtering taxonomy tables (artifacts). I’ve been able to filter a sequence variant feature table and its corresponding representative sequence artifact (I had to convert the feature table to biom format, convert that to a tsv table, and then use that to filter the sequences). However, I don’t know how to filter the taxonomy artifact. I ask because I’m planning on importing the data into the phyloseq R package, and I believe the package’s import functions require that all tables (and the tree) match in OTUs and samples, so I need the taxonomy table to match the OTU table and tree.

Hello again Nick,

So all the qiime 2 plugins are listed on this page.

Sounds like one of these plugins would let you filter by taxa:

As for phyloseq, it only requires that some of the OTU names and sample names match; any that don’t match are dropped. So you can import with phyloseq without worrying about this.

Hi @nick-youngblut,

Does @colinbrislawn’s suggestion help? Particularly his comment that:

(thanks @colinbrislawn for pointing that out)

If not, could you please clarify what you mean by “taxonomy table”? Is this a feature table (FeatureTable[Frequency]) that contains taxonomic assignments as feature IDs, e.g., the output of taxa collapse? Or is this a FeatureData[Taxonomy] artifact, e.g., the output of q2-feature-classifier? If the latter, we do not have an action for filtering FeatureData[Taxonomy] artifacts because there are no downstream QIIME 2 actions that require this.

Please let us know!


The taxonomy artifact is just a FeatureData[Taxonomy] type. I believe that the qiime taxa plugin will only filter a FeatureTable[Frequency] (if using qiime taxa filter-table) or filter FeatureData[Sequence] (if using qiime taxa filter-seqs).

While filtering the taxonomy table doesn’t appear to be necessary for importing data into phyloseq (extra taxonomy features are automatically filtered out), it would be helpful to filter the FeatureData[Taxonomy] artifact in order to check that the taxa I wanted removed are indeed removed. Right now, the only way I know of to check is to run qiime taxa barplot with the post-filter FeatureTable[Frequency] artifact and look for the taxa that should be gone. It would be easier to run qiime metadata tabulate on a post-filtered FeatureData[Taxonomy] artifact and then check whether the taxa are still in the taxonomy table. However, there doesn’t seem to be any way of filtering the FeatureData[Taxonomy] artifact.

An even easier way of checking that taxa are indeed filtered would be to have qiime taxa filter-table generate a log of which taxa, if any, were actually filtered. However, as far as I know, qiime taxa filter-table currently doesn’t generate such a log (maybe I’m just missing this in the docs). I’m worried that if I don’t type the --p-exclude values correctly (e.g., --p-exclude D_0__Eukaryota,D_4__mitochondria,D_2__Chloroplast), then I won’t notice that these taxa weren’t filtered until much later in my processing pipeline.
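One way to catch a mistyped exclude term before filtering is to check that every term matches at least one string in the exported taxonomy. The following is only a minimal sketch, not part of qiime2: the function name `unmatched_exclude_terms` and the example taxonomy strings are hypothetical, and it assumes that a term "matches" when it occurs as a case-insensitive substring of the taxonomy string, which may not be exactly how qiime taxa filter-table compares terms.

```python
def unmatched_exclude_terms(taxon_strings, exclude_terms):
    """Return the exclude terms that match no taxonomy string at all.

    Assumption: a term matches when it is a case-insensitive substring
    of the taxonomy string.
    """
    lowered = [taxon.lower() for taxon in taxon_strings]
    return [
        term
        for term in exclude_terms
        if not any(term.lower() in taxon for taxon in lowered)
    ]


# Hypothetical taxonomy strings, standing in for the Taxon column of an
# exported taxonomy table.
taxa = [
    "D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;"
    "D_3__Rickettsiales;D_4__Mitochondria",
    "D_0__Bacteria;D_1__Cyanobacteria;D_2__Chloroplast",
    "D_0__Bacteria;D_1__Firmicutes",
]

# Same comma-separated form as the --p-exclude value in the post above.
terms = "D_0__Eukaryota,D_4__mitochondria,D_2__Chloroplast".split(",")

missing = unmatched_exclude_terms(taxa, terms)
print(missing)  # terms that would filter nothing in this example data
```

A term that shows up in `missing` is either absent from the data or mistyped, so it is worth a second look before running the filter.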

Thanks for clarifying @nick-youngblut!

What about exporting the taxonomy file (to generate a TSV) or, if using the Artifact API, viewing it as a series? An automatable process would be to export to TSV, grep (or egrep) for your filter terms, and raise an error if any strings matching x, y, or z are detected.

Would that solve your specific need? I hope that helps!

Thanks for the suggestion! I was thinking of doing what you suggest: exporting the FeatureData[Taxonomy] artifact and just grepping for the taxonomic classifications that I had filtered (no hits if they were indeed filtered). However, to do this I first need to filter the FeatureData[Taxonomy] artifact to match the filtered FeatureTable[Frequency] artifact. Otherwise, grepping the exported taxonomy table will still generate hits for “mitochondria” or whatever taxa I wanted to filter.

I could export the taxa-filtered FeatureTable[Frequency] to a biom file, convert that to .tsv, and then find the intersection between that .tsv and the taxonomy.tsv. However, this is a lot of steps just to check that taxonomic groups have been removed.
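For anyone who does go the export route, the intersection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the taxonomy TSV is assumed to have `Feature ID` and `Taxon` columns, and the biom-derived TSV is assumed to use `#`-prefixed header lines with the feature ID in the first column. The small tables below are made-up example data.

```python
import csv
import io


def read_feature_ids(table_tsv_text):
    """Collect feature IDs from a biom-derived TSV (first column).

    Assumption: header/comment lines start with '#'.
    """
    ids = set()
    for line in table_tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def filter_taxonomy(taxonomy_tsv_text, kept_ids):
    """Keep only taxonomy rows whose feature ID survives in the table."""
    reader = csv.DictReader(io.StringIO(taxonomy_tsv_text), delimiter="\t")
    return [row for row in reader if row["Feature ID"] in kept_ids]


def find_hits(rows, terms):
    """Return rows whose Taxon contains any term (case-insensitive)."""
    return [
        row
        for row in rows
        if any(term.lower() in row["Taxon"].lower() for term in terms)
    ]


# Made-up example data: feature f2 (mitochondria) was removed from the table.
table_tsv = (
    "# Constructed from biom file\n"
    "#OTU ID\tsample1\tsample2\n"
    "f1\t10\t3\n"
    "f3\t0\t7\n"
)
taxonomy_tsv = (
    "Feature ID\tTaxon\tConfidence\n"
    "f1\tD_0__Bacteria;D_1__Firmicutes\t0.99\n"
    "f2\tD_0__Bacteria;D_1__Proteobacteria;D_4__Mitochondria\t0.97\n"
    "f3\tD_0__Bacteria;D_1__Bacteroidetes\t0.95\n"
)

kept = filter_taxonomy(taxonomy_tsv, read_feature_ids(table_tsv))
hits = find_hits(kept, ["mitochondria", "chloroplast", "eukaryota"])
if hits:
    raise ValueError(f"Taxa that should have been filtered remain: {hits}")
print(f"{len(kept)} taxonomy rows kept, no excluded taxa found")
```

Raising an error when `hits` is non-empty makes the check easy to drop into a pipeline, so a failed filter is noticed immediately rather than much later.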

As far as I know, there’s currently no way of filtering the FeatureData[Taxonomy] artifact in qiime2.

Thanks @nick-youngblut, you are correct! My “solution” would require the very same filtering method that you are proposing!

And you are correct, there’s currently no way to filter a FeatureData[Taxonomy] artifact directly in qiime2.

What about collapsing the filtered feature table using that taxonomy file, exporting the result, and then grepping for the taxonomy strings in that file? Still a few steps, but simpler than finding the intersection.

Sounds like this might be a useful action for viewing filtering results, though the output wouldn’t really have any practical use (yet, as far as I can tell) in downstream qiime2 actions. I have raised an issue to track this. Thanks!
