Filter feature table with file listing taxa of interest

Hi,

I want to filter my feature table to contain only ~50 specific taxa, and I would like to use the full taxonomy string (e.g. k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Corynebacteriaceae;g__Corynebacterium). What’s the best way to do this? The taxa filter-table command is not liking these full strings separated by commas, so it would be nice if I could just supply a text file listing the full taxonomic strings, but I don’t see that as an option. I would like to use the full taxonomic strings because some of them are unspecified at lower levels (e.g. k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__;g__). I could do this in a series of steps, first selecting the features in k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__ and then, from those, the ones that are g__, but I would prefer to do it all in one go.

Thanks,
Caitlin

That would be the go-to command. What do you mean by “not liking”? If you can share the error message, that would be the best starting point. You could try enclosing each taxonomy string in single quotes, in case the semicolons are causing problems.

I may not be following. You could just use “Actinomycetales” as a term to get everything, including both fully specified strings and those with ambiguous family and genus classifications (the empty f__ and g__ are artifacts of the Greengenes database and indicate classification to an ambiguous family, genus, etc.).

Let me know and we can sort out the best solution for you!

Enclosing each taxonomy string in single quotes as you suggested resolved that issue, so I guess it was just the semicolons, thanks!

I mean I only want to include those bacteria with ambiguous family and genus within the order Actinomycetales. So if there were …o__Actinomycetales;f__Corynebacterineae;g__ I would not want these.

I am now getting a new error where filtering results in an empty feature table, which does not make sense, since a barplot made with that feature table and the same taxonomy file shows that the taxa I wish to include, in addition to many others, do exist in the samples. Any idea what might be occurring here? I have added “--p-mode exact” hoping that would help, but I get the same error. Below is my command, with a reduced number of taxa for the sake of space:
(qiime2-2018.8) [[email protected] FIV]$ qiime taxa filter-table --i-table vsearch_uchime_on_derep/vsearch_closed_99/clustered_table_filtered_100718/filtered_table.qza --i-taxonomy 99_taxonomy_FIV.qza --p-include 'k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Porphyromonas','k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pasteurellales;f__Pasteurellaceae;g__','k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Moraxellaceae;g__Enhydrobacter' --o-filtered-table Misc_feature_tables/core_microbiota_only.qza --p-mode exact
Plugin error from taxa:

  All features were filtered, resulting in an empty table.

Debug info has been saved to /tmp/qiime2-q2cli-err-f8wc1mt8.log

And the contents of the temp log:

Traceback (most recent call last):
  File "/software/hprc/Anaconda/3-5.0.0.1/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "", line 2, in filter_table
  File "/software/hprc/Anaconda/3-5.0.0.1/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/software/hprc/Anaconda/3-5.0.0.1/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in callable_executor
    output_views = self.callable(**view_args)
  File "/software/hprc/Anaconda/3-5.0.0.1/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_taxa/method.py", line 93, in filter_table
    raise ValueError("All features were filtered, resulting in an "
ValueError: All features were filtered, resulting in an empty table.

Any ideas?

Edit: ignore dashes around the contents of the error message and tmp log - tried to italicize but didn’t work out too well :disappointed_relieved:

For that specific case you can do something like --p-include 'o__Actinomycetales;f__;' and it would grab everything that contains that substring. Just make sure you don’t use “exact” mode.
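To make the difference between the two modes concrete, here is a rough Python sketch of how a "contains" search differs from an "exact" one. It is only an illustration of the idea, not the plugin's actual code: the `matches` helper is hypothetical, the two annotations are made-up examples in the format used so far in this thread, and details such as case handling are not modelled.

```python
def matches(annotation, term, mode="contains"):
    """Return True if a search term matches a taxonomic annotation.

    Hypothetical helper: "contains" is a substring test, "exact" requires
    the whole annotation to equal the term.
    """
    if mode == "exact":
        return annotation == term
    if mode == "contains":
        return term in annotation
    raise ValueError("unknown mode: {}".format(mode))


# Made-up annotations for illustration only.
annotations = [
    "k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__;g__",
    "k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;"
    "f__Corynebacteriaceae;g__Corynebacterium",
]
term = "o__Actinomycetales;f__;"

# Substring search keeps only the annotation with the empty family level.
contains_hits = [a for a in annotations if matches(a, term, "contains")]
# Exact search finds nothing, because no full annotation equals the term.
exact_hits = [a for a in annotations if matches(a, term, "exact")]

print(len(contains_hits), len(exact_hits))
```

The trailing semicolon in the term is what keeps "f__Corynebacteriaceae" out: the substring has to end right after the empty "f__".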

Uh oh. Note that the barplot may not contain exactly the same feature names, since it fills empty levels with underscores (e.g., something classified as “k__Bacteria” will appear as “k__Bacteria;__;__” if you display at level 3). From your code snippet it looks like that is not the problem, but I just wanted to point that out.

The best way to make sure your search terms match what you have in your taxonomy file would be to export that file and confirm that those strings are present.
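Once the file is exported, a check along these lines could tell you which search terms never occur in it. This is a minimal sketch under some assumptions: the export is a tab-separated file with a "Taxon" column, the `missing_terms` helper is hypothetical, and the sample row is a made-up stand-in for real data.

```python
import csv
import io


def missing_terms(taxonomy_tsv_text, terms):
    """Return the search terms that are not a substring of any Taxon value."""
    reader = csv.DictReader(io.StringIO(taxonomy_tsv_text), delimiter="\t")
    taxa = [row["Taxon"] for row in reader]
    return [term for term in terms if not any(term in taxon for taxon in taxa)]


# Made-up one-row stand-in for an exported taxonomy file (assumed layout).
sample_tsv = (
    "Feature ID\tTaxon\n"
    "feature1\tk__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Pseudomonadales; f__Moraxellaceae; g__Enhydrobacter; s__\n"
)

not_found = missing_terms(sample_tsv, ["g__Enhydrobacter", "g__Porphyromonas"])
print(not_found)
```

Any term that shows up in the returned list cannot match during filtering, so it is worth comparing it character by character against the file.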

“exact” will not help since it makes the search more strict.

Try exporting your taxonomy file and let us know what you find!

You were right, the barplot did not contain the exact feature names.
I looked in my taxonomy file and noticed that there are spaces between taxonomic levels (like o__Actinomycetales; f__Corynebacterineae; g__), so I added the spaces and it worked! I thought this was kind of weird, since this was just an imported version of the Greengenes taxonomy, but it looks like the original Greengenes taxonomy file is formatted this way as well. I had never noticed because I always pay attention to the taxa names from my summarize taxa (now barplot) output.
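For anyone with a long list of names copied from a barplot, the spaces could be added with a small helper like the one below. It is just a sketch: `add_level_spaces` is a hypothetical name, and it assumes the only difference is the missing space after each semicolon.

```python
def add_level_spaces(taxon):
    """Rewrite 'a;b;c' as 'a; b; c'; already spaced strings come back unchanged."""
    levels = [level.strip() for level in taxon.split(";")]
    # rstrip() drops the space left behind when the string ends in ';'.
    return "; ".join(levels).rstrip()


print(add_level_spaces("o__Actinomycetales;f__Corynebacterineae;g__"))
```

Running it twice on the same string gives the same result, so it is safe to apply to a list that mixes both styles.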

Just for some context for anyone who may run into this issue in the future: I had identified the “core microbiota” in my control samples and was then filtering my original feature table, which included all samples, to keep just these taxa. I had taken the list of taxa names from the visualization.qzv output of the core-features command, which shows feature names the same way a barplot does. Those names do not work for filtering against the taxonomy file, which has the spaces.
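Since filter-table has no option to read the terms from a file, one workaround is to build the comma-separated --p-include value from a text file with a short script. The sketch below assumes one taxonomy string per line, already in the spaced format of the taxonomy file, and no commas inside a string. `read_terms` and `build_filter_command` are hypothetical helpers; the flags and file names are the ones from the command earlier in this thread.

```python
import os
import shlex
import tempfile


def read_terms(path):
    """Read one search term per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def build_filter_command(terms, table, taxonomy, output):
    """Assemble the filter-table call as an argument list (no shell quoting needed)."""
    return [
        "qiime", "taxa", "filter-table",
        "--i-table", table,
        "--i-taxonomy", taxonomy,
        "--p-include", ",".join(terms),
        "--o-filtered-table", output,
    ]


# Demo with a throwaway list file holding two taxa from this thread.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(
        "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; "
        "f__Porphyromonadaceae; g__Porphyromonas\n"
        "\n"
        "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
        "o__Pasteurellales; f__Pasteurellaceae; g__\n"
    )

demo_terms = read_terms(tmp.name)
os.remove(tmp.name)

demo_command = build_filter_command(
    demo_terms, "filtered_table.qza", "99_taxonomy_FIV.qza",
    "core_microbiota_only.qza",
)
# shlex.join (Python 3.8+) prints a version that can be pasted into a shell.
print(shlex.join(demo_command))
```

Passing the list to subprocess.run would hand the strings to qiime as they are, which sidesteps the semicolon quoting issue from earlier in the thread.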

Thank you!!
