Exporting a table to filter out counts with Python

I would like to filter out features with counts of 10 or less within a sample (rather than across samples). I posted about this previously, and one recommendation was to export my feature table and filter it in Python. I am wondering how best to do this.

My Python is basic, so I would appreciate some help with this. Does my script below make sense?

qiime tools export \
  --input-path table.qza \
  --output-path exported-table
biom convert -i exported-table/feature-table.biom -o exported-table/feature-table.tsv --to-tsv

import pandas as pd
df = pd.read_csv("exported-table/feature-table.tsv",sep='\t')

df[df <11] = 0

Thank you so much!

That makes sense to me; you should try it.
If you get errors while reading the table, try this slightly modified code:

import pandas as pd
# skiprows=1 skips the first line ("# Constructed from biom file");
# index_col=0 makes the feature IDs the index to avoid format issues (the ID column is text, not counts)
df = pd.read_csv("exported-table/feature-table.tsv", sep='\t', skiprows=1, index_col=0)

df[df < 11] = 0
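A sketch of why this filters per sample (not part of the original reply): the boolean mask `df < 11` is evaluated cell by cell, so each feature's count in each sample is checked on its own and never summed across samples. The function name `filter_low_counts`, the toy feature and sample IDs, and the optional step that drops features left with zero counts in every sample are illustrative assumptions.

```python
import pandas as pd


def filter_low_counts(df: pd.DataFrame, min_count: int = 11) -> pd.DataFrame:
    """Zero every count below min_count, checking each feature/sample cell
    independently, then drop features that are now zero in every sample."""
    # Same effect as `df[df < min_count] = 0`, but returns a copy instead of
    # modifying df in place.
    filtered = df.mask(df < min_count, 0)
    # Optional clean-up (an assumption, not from the thread): remove all-zero rows.
    return filtered.loc[(filtered > 0).any(axis=1)]


# Toy table: rows are features, columns are samples (made-up values).
counts = pd.DataFrame(
    {"S1": [5.0, 50.0, 10.0], "S2": [30.0, 2.0, 0.0]},
    index=["featA", "featB", "featC"],
)
result = filter_low_counts(counts)
print(result)
```

Here featA keeps only its S2 count, featB keeps only its S1 count, and featC (10 and 0) ends up all zero and is dropped. Because `mask` returns a new DataFrame, the unfiltered table stays available for comparison.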

Thanks so much @timanix! It worked perfectly.
How can I convert filtered_table.tsv back into a feature table? I'm guessing I can do it in two steps:
1- Convert it back to a BIOM file (not sure how to do this).
2- Import the BIOM file as a feature table with the script below:

qiime tools import \
  --input-path feature-table.biom \
  --type 'FeatureTable[Frequency]' \
  --input-format BIOMV100Format \
  --output-path feature-table-1.qza

Thanks again!

For this, follow the instructions as described here.
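One possible route for step 1, sketched under assumptions rather than taken from the linked instructions: save the filtered DataFrame in the same TSV layout that `biom convert --to-tsv` produced, then convert it back with something like `biom convert -i exported-table/filtered_table.tsv -o exported-table/filtered_table.biom --table-type="OTU table" --to-hdf5`. The helper name `write_biom_tsv` and the file paths are hypothetical; check the `biom convert` options against the biom-format documentation for your installed version.

```python
import io

import pandas as pd


def write_biom_tsv(df: pd.DataFrame, fh) -> None:
    """Write a features-x-samples DataFrame to an open text handle in the
    classic TSV layout that `biom convert` exports: a comment line, then a
    '#OTU ID' header row, then one row per feature."""
    fh.write("# Constructed from biom file\n")
    out = df.copy()
    out.index.name = "#OTU ID"  # becomes the first cell of the header row
    out.to_csv(fh, sep="\t")


# Demo with an in-memory buffer and made-up counts; for a real file use
# `with open("exported-table/filtered_table.tsv", "w") as fh: write_biom_tsv(df, fh)`.
toy = pd.DataFrame({"S1": [0.0, 50.0], "S2": [30.0, 0.0]}, index=["featA", "featB"])
buf = io.StringIO()
write_biom_tsv(toy, buf)

# Reading it back the same way as earlier in the thread recovers the same table.
back = pd.read_csv(io.StringIO(buf.getvalue()), sep="\t", skiprows=1, index_col=0)
```

Mirroring the exported layout means the same `read_csv(..., skiprows=1, index_col=0)` call works on both the original and the filtered file.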


Thank you!
However, I'm not sure which input format to use with the qiime tools import command: BIOMV100Format or BIOMV210Format? I tried reading the BIOM file format docs but am still not sure.
Any idea which one I should choose?


Usually BIOMV210Format works for me without any issues.
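A small sketch for the format question (a hypothetical helper, not part of QIIME 2 or biom-format): BIOMV210Format corresponds to BIOM 2.1, which is stored as an HDF5 file and is what `biom convert --to-hdf5` writes, while BIOMV100Format corresponds to the older JSON-based BIOM 1.0. The first bytes of a .biom file are enough to tell the two apart. The function names and the assumption that the HDF5 signature sits at offset 0 are mine.

```python
# First 8 bytes of an HDF5 file (assumes no HDF5 "user block" before it,
# which is the usual case for files written by biom-format).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def format_from_header(head: bytes) -> str:
    """Map the first bytes of a .biom file to a qiime tools import --input-format."""
    if head.startswith(HDF5_SIGNATURE):
        return "BIOMV210Format"  # BIOM 2.1 is stored as HDF5
    if head.lstrip().startswith(b"{"):
        return "BIOMV100Format"  # BIOM 1.0 is JSON text
    raise ValueError("not recognised as BIOM 1.0 (JSON) or BIOM 2.x (HDF5)")


def guess_biom_import_format(path: str) -> str:
    """Read just enough of the file to tell the two layouts apart."""
    with open(path, "rb") as fh:
        return format_from_header(fh.read(16))
```

For example, `guess_biom_import_format("feature-table.biom")` could be run before choosing the `--input-format` value.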


If you're still looking for help with this, I've made a command-line/Python toolkit with this functionality, which you can easily install with conda in your QIIME environment!

Check out the Per Sample Filtering section, where you can set a single integer level to filter at within each sample, or input a .csv with unique filtering levels for each sample.
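The per-sample option can also be approximated in plain pandas. This is a hedged sketch, not the toolkit's implementation: it assumes a hypothetical CSV with `sample-id` and `min_count` columns and treats `min_count` as the smallest count kept (counts below it are zeroed); the toolkit's own semantics may differ.

```python
import io

import pandas as pd


def filter_per_sample(counts: pd.DataFrame, thresholds: pd.Series) -> pd.DataFrame:
    """Zero each count that falls below its own sample's threshold.

    counts: features x samples table; thresholds: smallest count to keep,
    indexed by sample ID.
    """
    # Reorder thresholds to match the sample columns; a sample missing from
    # the CSV raises KeyError rather than being silently skipped.
    per_sample = thresholds.loc[counts.columns]
    # lt(..., axis="columns") compares every column with its own threshold.
    return counts.mask(counts.lt(per_sample, axis="columns"), 0)


# Hypothetical thresholds file contents and made-up counts.
csv_text = "sample-id,min_count\nS1,11\nS2,3\n"
thresholds = pd.read_csv(io.StringIO(csv_text), index_col="sample-id")["min_count"]
counts = pd.DataFrame({"S1": [5.0, 50.0], "S2": [30.0, 2.0]}, index=["featA", "featB"])
filtered = filter_per_sample(counts, thresholds)
```

With a single threshold for every sample this reduces to the `df[df < 11] = 0` approach from earlier in the thread.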


Thanks so much @Nick_Gabry!