Hello everyone,
I'm not sure this is the best channel for my question, but maybe you can help me. I'm trying to use the filter-features-conditionally action on a PICRUSt2 KO metagenome table. I want to retain the most frequent KO numbers in order to draw a heatmap.
I find it confusing because with the first command my table is not filtered at all: the resulting "filtered-ko-table.qza" contains exactly the same information as "ko-table.qza". With the second command, on the other hand, my table is empty; all the features are filtered out. I can see this with feature-table summarize, and also because in the first case the table is too large to draw the heatmap, while in the second case the plugin says that it can't draw anything from an empty table (obviously).
Maybe I don't really understand the plugin. I'm trying to retain the 10% most abundant features in my table, and I only need them to be very abundant in one single sample in order to compare it with the others. I don't know whether PICRUSt2 output tables have a different format, whether the plugin is not reading the table correctly, or what else is happening, but I definitely need your help.
Hi @vimh,
This action, as called in your second command, will retain all features that are present at a relative abundance of at least 90% in at least 10% of your samples, so it's working as expected (it's unlikely that any features meet those criteria). Your first command will retain all features that are present at a relative abundance of at least 90% in at least 0% of your samples, so that one should keep everything. Does that make sense?
I don't think we have an easy way to do exactly what you're looking for here. If you set --p-prevalence to one divided by your total number of samples, that should let you retain features based on abundance in one sample. You could then experiment with --p-abundance settings to try to keep the most abundant features.
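To make the two parameters concrete, here is a rough plain-Python sketch of the rule as I described it above. It is not the plugin's source code; the function names and the toy table are made up for illustration, and the real action works on the .qza artifact rather than on a dict.

```python
# Sketch of the retention rule described above (NOT the plugin's code):
# keep a feature if its relative abundance is >= `abundance`
# in at least `prevalence` (a proportion) of the samples.

def relative_abundances(table):
    """Convert per-sample counts into per-sample relative abundances."""
    rel = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        rel[sample] = {f: (c / total if total else 0.0) for f, c in counts.items()}
    return rel


def retained_features(table, abundance, prevalence):
    """Return the sorted feature IDs that pass the abundance/prevalence rule."""
    rel = relative_abundances(table)
    n_samples = len(rel)
    features = sorted({f for counts in table.values() for f in counts})
    kept = []
    for feature in features:
        # Number of samples in which this feature reaches the abundance cutoff.
        hits = sum(1 for s in rel if rel[s].get(feature, 0.0) >= abundance)
        if hits >= prevalence * n_samples:
            kept.append(feature)
    return kept


# Hypothetical toy table: 4 samples, 3 KOs, raw counts.
toy = {
    "S1": {"K00001": 95, "K00002": 4, "K00003": 1},
    "S2": {"K00001": 10, "K00002": 60, "K00003": 30},
    "S3": {"K00001": 50, "K00002": 50, "K00003": 0},
    "S4": {"K00001": 40, "K00002": 59, "K00003": 1},
}

keeps_everything = retained_features(toy, abundance=0.9, prevalence=0.0)
keeps_nothing = retained_features(toy, abundance=0.9, prevalence=0.5)
one_sample_is_enough = retained_features(toy, abundance=0.9, prevalence=1 / len(toy))

print(keeps_everything)      # ['K00001', 'K00002', 'K00003']
print(keeps_nothing)         # []
print(one_sample_is_enough)  # ['K00001']
```

With prevalence set to one divided by the number of samples, a single sample above the cutoff is enough, so the abundance value becomes the only knob that controls how many features survive.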
Alternatively, if you're comfortable with Python programming, you can use the Python 3 API to load your feature table as a pandas.DataFrame and perform the filtering with pandas.
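If you go the pandas route, something along these lines might be a starting point. It is only a sketch under a few assumptions: the table is viewed with samples as rows and features as columns, "most abundant" is taken to mean the highest relative abundance reached in any single sample, and the function name and toy data are hypothetical. The QIIME 2 loading and re-import steps are shown as comments because they need a QIIME 2 environment and your own file.

```python
import pandas as pd

# In a QIIME 2 environment the table could be loaded roughly like this
# (left commented out here; the file name is taken from the post above):
#   from qiime2 import Artifact
#   df = Artifact.load("ko-table.qza").view(pd.DataFrame)


def top_features_by_peak_abundance(df, fraction=0.10):
    """Keep the top `fraction` of features, ranked by their highest
    per-sample relative abundance. Rows are samples, columns are features."""
    # Relative abundance within each sample (row); empty samples become 0.
    rel = df.div(df.sum(axis=1), axis=0).fillna(0.0)
    # Best relative abundance each feature reaches in any one sample.
    peak = rel.max(axis=0)
    n_keep = max(1, int(round(fraction * len(peak))))
    keep = peak.nlargest(n_keep).index
    # Return the original counts, preserving the original column order.
    return df.loc[:, df.columns.isin(keep)]


# Hypothetical toy data standing in for the real KO table.
toy = pd.DataFrame(
    [[80, 10, 5, 3, 2],
     [10, 10, 70, 5, 5],
     [20, 20, 20, 20, 20]],
    index=["S1", "S2", "S3"],
    columns=["K00001", "K00002", "K00003", "K00004", "K00005"],
)

filtered = top_features_by_peak_abundance(toy, fraction=0.4)
print(list(filtered.columns))  # ['K00001', 'K00003']

# To get back to an artifact for the heatmap, something like this should work:
#   Artifact.import_data("FeatureTable[Frequency]", filtered).save("filtered-ko-table.qza")
```

Ranking on the per-sample maximum matches the "very abundant in one single sample" requirement; swapping `rel.max(axis=0)` for `rel.mean(axis=0)` would instead favor features that are abundant across the whole dataset.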
I'll keep this post queued in case other developers have ideas on how to achieve this that I'm not thinking of.
Hello Greg,
Thank you so much for your help. I didn't fully understand how the plugin worked and was trying to bend it to my needs. I will try to do this outside QIIME, or calculate a threshold frequency and then use filter-features with the --p-where parameter to filter out frequencies below the threshold.
Indeed, no feature is abundant enough to pass the 1% filter. I used --p-abundance 0.001 and the table was filtered.