How to remove unclassified features from a table?

Sparkle · December 3, 2019, 11:01am

Hello! I'm trying to use qiime taxa filter-table to remove any unassigned taxa from my FeatureTable.
I'd like to remove anything that doesn't resolve to an exact taxonomic classification, i.e. anything with an empty label such as g__, f__, etc. at any taxonomic level.

To put it short, I'd like to remove anything that looks like this:

https://i.ibb.co/Jr1r3bK/Unassigned.png

I'm quite confused here because I haven't been able to find a fast and effective way to do it.

I'd like to do this to build a more readable taxonomic tree in later steps, and likewise for the barplots (which I'm going to analyze only at phylum, family and genus level).

Any suggestions here? Thanks in advance!

timanix · December 3, 2019, 11:28am

Hi!

Right way to do it:
Filtering data — QIIME 2 2019.10.0 documentation
Not exactly the right way (it's how I do it, though I'm not sure why):
Open the taxonomy file (taxonomy.qza), filter it down to the ASVs that are unassigned or assigned only at the Bacteria level, save these IDs in a new exclude.tsv file, and run:
```
 qiime feature-table filter-features \
     --i-table table.qza \
     --m-metadata-file exclude.tsv \
     --p-exclude-ids \
     --o-filtered-table filtered_table.qza
```
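The manual step of picking IDs out of the taxonomy file could be scripted. Here is a minimal sketch, not an official QIIME 2 recipe. It assumes the taxonomy has been exported to a tab-separated file with `Feature ID` and `Taxon` columns and uses Greengenes-style labels such as `g__`; the function names and the `min_ranks` cutoff are made up for illustration. The output starts with a `feature-id` header line because QIIME 2 metadata files need an ID column header.

```python
import csv
import io


def is_unresolved(taxon, min_ranks=2):
    """True if the lineage is 'Unassigned', stops before `min_ranks` ranks
    (e.g. only 'k__Bacteria'), or contains an empty label such as 'g__'."""
    ranks = [r.strip() for r in taxon.split(";") if r.strip()]
    if not ranks or ranks[0].lower() == "unassigned":
        return True
    if len(ranks) < min_ranks:
        return True
    # A label like 'g__' has a rank prefix but no name after it.
    return any(r.endswith("__") for r in ranks)


def ids_to_exclude(taxonomy_tsv, min_ranks=2):
    """Read an exported taxonomy table and return the IDs to drop."""
    reader = csv.DictReader(taxonomy_tsv, delimiter="\t")
    return [row["Feature ID"] for row in reader
            if is_unresolved(row["Taxon"], min_ranks)]


def write_exclude_file(ids, handle):
    """Write the IDs as a one-column metadata file."""
    handle.write("feature-id\n")
    for feature_id in ids:
        handle.write(f"{feature_id}\n")


# Toy data standing in for an exported taxonomy.tsv (hypothetical IDs).
example = io.StringIO(
    "Feature ID\tTaxon\tConfidence\n"
    "asv1\tk__Bacteria; p__Firmicutes; c__Bacilli\t0.99\n"
    "asv2\tk__Bacteria\t0.80\n"
    "asv3\tUnassigned\t0.50\n"
    "asv4\tk__Bacteria; p__Proteobacteria; c__; o__\t0.90\n"
)
exclude = ids_to_exclude(example)
out = io.StringIO()
write_exclude_file(exclude, out)
print(out.getvalue())
```

With a real file you would pass `open("taxonomy.tsv")` and `open("exclude.tsv", "w")` instead of the in-memory buffers.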

Sparkle · December 3, 2019, 11:40am

Hello, thanks for your answer! I had already checked that tutorial, but it seems to filter the unclassified features at just one level at a time, like here (using 'exclude' instead):

    qiime taxa filter-table \
      --i-table table.qza \
      --i-taxonomy taxonomy.qza \
      --p-include p__ \
      --o-filtered-table table-with-phyla.qza

I was interested in removing anything unclassified instead, and was wondering if there is an easy way to do that without running a lot of sequential commands (also because the syntax may vary).

Haven't tried your method yet!

jwdebelius · December 3, 2019, 11:42am

My suggestion would be to annotate your database to describe the inheritance rather than discard those sequences. If you discard them, you're biasing your data and potentially dropping a lot of interesting information! For example, you'll drop all your E. coli if you filter at genus level because E. coli can't be resolved from Shigella with 16S rRNA.

Best,
Justine

Sparkle · December 3, 2019, 11:46am

Hi, thanks for your suggestion! You're definitely right, and to take this into account I'm looking at different taxonomy levels and comparing them. Whatever can't be assigned at one level will still appear at the next level up, won't it?
However, I was also interested in whatever could be assigned for sure, besides the original plots including all the features, to obtain a more compact view.

jwdebelius · December 3, 2019, 12:35pm

Hi @Sparkle,

Yes, but it will still skew your composition and could change it majorly! This kind of filtering is common in the literature and leads to all sorts of weird problems in downstream analyses (past visualization).

My suggestion for plotting is to limit your barchart space to the most abundant X groups; annotating them manually is then easier! That way, you don't lose any data or alter the composition, and your plot is readable for people who can see X colors (16 tends to be the upper limit, but most colormaps break at around 12).

Best,
Justine
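The "most abundant X groups" idea above could be sketched like this. It is only an illustration in plain Python: the table is a hypothetical dict of per-sample counts, groups are ranked by total count (one possible choice), and lumping the remainder into an "Other" row is an assumption made here so that the per-sample totals, and hence the composition, stay unchanged.

```python
def collapse_to_top_n(table, n=10, other_label="Other"):
    """Keep the n groups with the highest total count; sum the rest.

    table: {group name: [count in sample 0, count in sample 1, ...]}
    All lists are assumed to have the same length.
    """
    if not table:
        return {}
    totals = {group: sum(counts) for group, counts in table.items()}
    keep = sorted(totals, key=totals.get, reverse=True)[:n]
    n_samples = len(next(iter(table.values())))

    collapsed = {group: list(table[group]) for group in keep}
    other = [0] * n_samples
    for group, counts in table.items():
        if group not in keep:
            other = [a + b for a, b in zip(other, counts)]
    collapsed[other_label] = other
    return collapsed


# Toy phylum-level table with two samples (made-up numbers).
phyla = {
    "Firmicutes": [50, 40],
    "Bacteroidetes": [30, 45],
    "Proteobacteria": [15, 10],
    "Actinobacteria": [5, 5],
}
top2 = collapse_to_top_n(phyla, n=2)
print(top2)
```

Each sample still sums to the same depth as before, so relative abundances of the groups that are kept do not shift.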

Sparkle · December 3, 2019, 2:30pm

It makes sense!
How do you keep only the most abundant features, then, like the ones showing abundances above a given threshold?

I was thinking of using qiime feature-table filter-features

And are there any suggested criteria for the threshold?
Like, removing anything below 300 for phyla, 200 for families and 100 for genera.

Thanks in advance!

jwdebelius · December 3, 2019, 3:44pm

Hi @Sparkle,

It depends on what you're plotting. For a bar chart, show no more than 10 features because that's how many most of my collaborators' eyes can distinguish.

If I'm working with a heat map, I tend to do a joint filtering but recommend a prevalence of at least 10%. I'm maybe a bad example here because I do my filtering outside of QIIME via the Python API, since I can better get what I want in biom.

Best,
Justine

Sparkle · December 3, 2019, 3:51pm

My first idea was the same (excluding anything below a certain percentage of the total number of reads in that sample), but apparently qiime feature-table filter-features only lets you filter features by an absolute value rather than a percentage! Is that why you do the filtering outside of QIIME 2?

--p-min-frequency INTEGER
The minimum total frequency that a feature must have
to be retained.
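Since the flag quoted above takes an integer, one workaround is to compute the cutoff from a percentage first. A rough sketch, assuming you already have each feature's total count summed over all samples (the dict below is made up, and the function name is hypothetical). Note this gives a percentage of the whole table, not of each individual sample.

```python
import math


def min_frequency_for_fraction(feature_totals, fraction):
    """Absolute count equal to `fraction` of all reads in the table.

    feature_totals: {feature ID: total count over all samples}
    fraction: e.g. 0.001 for 0.1% of all reads
    """
    total_reads = sum(feature_totals.values())
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(fraction * total_reads, 6))


# Made-up per-feature totals.
totals = {"asv1": 9000, "asv2": 950, "asv3": 50}
threshold = min_frequency_for_fraction(totals, 0.001)
print(f"--p-min-frequency {threshold}")
```

The printed value can then be passed to qiime feature-table filter-features.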

jwdebelius · December 3, 2019, 4:21pm

I do filtering outside of QIIME because I actually filter for features that are present in X% of samples with at least a relative abundance of Y. It's been on my list to maybe push to the feature table plugin, but my list is longer than my arm at this point.

But, again, for bar charts, I just do the top 5 - 12.
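The "present in X% with at least a relative abundance of Y" filter described above might look roughly like this. This is not Justine's actual code and does not use the biom or QIIME 2 APIs; it is a plain-Python sketch over a hypothetical dict of per-sample counts, with illustrative parameter names and defaults.

```python
def filter_by_prevalence(table, min_rel_abundance=0.01, min_prevalence=0.10):
    """Keep features reaching `min_rel_abundance` in at least
    `min_prevalence` of the samples.

    table: {feature ID: [count in sample 0, count in sample 1, ...]}
    """
    if not table:
        return {}
    n_samples = len(next(iter(table.values())))
    # Sequencing depth of each sample, computed from the unfiltered table.
    depths = [sum(counts[i] for counts in table.values())
              for i in range(n_samples)]

    kept = {}
    for feature, counts in table.items():
        hits = sum(1 for count, depth in zip(counts, depths)
                   if depth > 0 and count / depth >= min_rel_abundance)
        if hits / n_samples >= min_prevalence:
            kept[feature] = list(counts)
    return kept


# Toy table with four samples (made-up numbers).
counts = {
    "asv1": [90, 80, 70, 60],
    "asv2": [10, 0, 0, 0],
    "asv3": [0, 20, 30, 40],
}
kept = filter_by_prevalence(counts, min_rel_abundance=0.05, min_prevalence=0.5)
print(sorted(kept))
```

Here asv2 passes the abundance cutoff in only one of four samples, so it is dropped at a 50% prevalence requirement.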
