Filter feature table based on percent prevalence?


I would like to apply low-prevalence filtering (e.g., removing features below 0.01%) to my feature table before computing diversity metrics or running downstream analyses. I am not interested in the rare biosphere and want to focus on the dominant taxa in two groups of samples. I cannot figure out how to do this with the feature-table filter-features function.

Any help would be much appreciated. Thanks!

Hi @smreyes,
Unfortunately we don't currently have a way to filter based on percentages, only based on counts. This is something that we plan to add, and I made a note on our open feature request to indicate that this came up.

I realize that this doesn't get you exactly what you want, but one approach I often take to achieve a similar goal is to filter features based on the number of samples they show up in. If you set this threshold high (for example, 50% of your samples), this will let you focus your analysis on features that are consistently present in your samples.

Alternatively, you could choose a minimum count threshold based on the Frequency per feature or Feature Detail information in your feature table summary (i.e., the result of running qiime feature-table summarize). For example, if you want to focus on the 50% most abundant features across your samples, and the summary reports a median frequency per feature of 25, you would set 25 as your minimum frequency threshold. (That number is from the Moving Pictures tutorial data, so the values are probably a lot lower than what you have.)
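To make the arithmetic behind those two thresholds concrete, here is a minimal Python sketch. It is only an illustration: the toy table, the feature IDs, and the helper names are all made up, and it works on plain Python lists rather than on a QIIME 2 artifact. It assumes that features at or above each threshold are the ones you want to keep.

```python
import math
import statistics

# Hypothetical toy table: feature ID -> counts in each of four samples.
table = {
    "feat_a": [120, 80, 95, 110],
    "feat_b": [0, 3, 0, 0],
    "feat_c": [10, 0, 15, 5],
    "feat_d": [0, 0, 1, 0],
    "feat_e": [40, 35, 0, 50],
}
n_samples = 4


def min_samples_for_fraction(n_samples, fraction):
    """Smallest whole number of samples covering at least `fraction` of them."""
    return math.ceil(n_samples * fraction)


def feature_prevalence(table):
    """Number of samples in which each feature has a nonzero count."""
    return {fid: sum(1 for c in counts if c > 0) for fid, counts in table.items()}


def feature_frequencies(table):
    """Total count of each feature summed over all samples."""
    return {fid: sum(counts) for fid, counts in table.items()}


# Threshold 1: a feature must appear in at least 50% of samples.
min_samples = min_samples_for_fraction(n_samples, 0.5)
prevalence = feature_prevalence(table)
kept_by_prevalence = sorted(f for f, n in prevalence.items() if n >= min_samples)

# Threshold 2: a feature's total frequency must reach the median frequency.
frequencies = feature_frequencies(table)
median_frequency = statistics.median(frequencies.values())
kept_by_frequency = sorted(f for f, n in frequencies.items() if n >= median_frequency)

print("min samples:", min_samples, "->", kept_by_prevalence)
print("median frequency:", median_frequency, "->", kept_by_frequency)
```

With real data you would read the sample count and the median frequency off the feature table summary instead of computing them by hand. The resulting integers are the kind of values that `qiime feature-table filter-features` accepts through its `--p-min-samples` and `--p-min-frequency` parameters; check `--help` for the exact options in your installed version.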

I hope this helps - sorry to not have a better answer for you right now.

