feature-table counts different when exported as tsv file

Hi all,

I tried to export the feature table, which was filtered using the parameters below, to TSV format using the 'qiime tools export' and 'biom convert' commands:

qiime feature-table filter-samples \
  --i-table table.qza \
  --p-min-frequency 500 \
  --o-filtered-table sample-frequency-filtered-table-EMP.qza


qiime feature-table filter-features \
  --i-table sample-frequency-filtered-table-EMP.qza \
  --p-min-frequency 2 \
  --o-filtered-table feature-filtered-table.qza

When I look at the converted TSV file in the terminal, there seem to be many zeros, and I can't see the original counts. The TSV file is 2.56 GB, so I cannot upload it or the feature-filtered-table.qzv file here.

I'm providing info from the overview tab in feature-filtered-table.qzv present after filtering:

Frequency per feature

Minimum frequency 2.0
1st quartile 6.0
Median frequency 17.0
3rd quartile 72.0
Maximum frequency 3,314,206.0
Mean frequency 1,095.0084638368019

Frequency per sample:

Minimum frequency 570.0
1st quartile 18,600.0
Median frequency 33,947.0
3rd quartile 54,297.0
Maximum frequency 2,242,168.0
Mean frequency 64,707.166818873666

As far as I understand, any feature with zero counts is not present in the table after filtering (this is what I wanted from filtering). Please let me know if I'm wrong!

The next thing is that I cannot see the exact feature counts in the TSV file as I see them in feature-table.qzv. There are so many zeros. I will upload an image of my terminal with the file opened (it shows only a few lines).

I'm not quite sure what's happening.

Any help or suggestions are much appreciated.

Many thanks in advance.

Hi @uth,

I think you’re conflating sparsity with abundance. You filtered out any sample with fewer than 500 counts (good, possibly low, but you do you) and then any feature with fewer than 2 counts. This means that if you have 100 samples, a feature may have 2 counts in one sample and zeros in the other 99.

If you look at your frequency description, you can see 72 counts at the 3rd quartile, so 3/4 of your features have 72 counts or fewer in total across all samples. If you want to reduce the number of features with zeros, you either need to set the threshold higher or threshold based on the number of samples where the organism is present. (Prevalence is my preference.)
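To make the difference between the two kinds of threshold concrete, here is a minimal sketch in plain Python. The table, the feature names, and the helper functions are all made up for illustration; this is not how QIIME 2 implements filtering, just the same idea on toy data.

```python
# Toy feature table: one row per feature, one column per sample.
# All names and counts are hypothetical.
table = {
    "feature_A": [2, 0, 0, 0, 0],      # total 2, present in 1 of 5 samples
    "feature_B": [40, 0, 32, 0, 0],    # total 72, present in 2 of 5 samples
    "feature_C": [10, 12, 9, 15, 11],  # total 57, present in 5 of 5 samples
}


def total_frequency(counts):
    """Sum of a feature's counts across all samples."""
    return sum(counts)


def prevalence(counts):
    """Number of samples in which the feature has a non-zero count."""
    return sum(1 for c in counts if c > 0)


def filter_by_frequency(table, min_frequency):
    """Keep features whose total count is at least min_frequency."""
    return {f: c for f, c in table.items() if total_frequency(c) >= min_frequency}


def filter_by_prevalence(table, min_samples):
    """Keep features that are present in at least min_samples samples."""
    return {f: c for f, c in table.items() if prevalence(c) >= min_samples}


def zero_fraction(table):
    """Fraction of cells in the table that are zero (0.0 for an empty table)."""
    cells = [c for counts in table.values() for c in counts]
    if not cells:
        return 0.0
    return sum(1 for c in cells if c == 0) / len(cells)


by_frequency = filter_by_frequency(table, min_frequency=2)
by_prevalence = filter_by_prevalence(table, min_samples=3)

print(sorted(by_frequency), zero_fraction(by_frequency))
print(sorted(by_prevalence), zero_fraction(by_prevalence))
```

With a total-frequency cutoff of 2, all three toy features survive and almost half of the cells are still zero; the prevalence cutoff is what actually removes the sparse rows. As far as I know, `qiime feature-table filter-features` exposes this as `--p-min-samples`, but please confirm with `--help` for your version.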

However, some of this is the nature of microbiome data, particularly in free-living organisms. Microbial time is a lot shorter than macroscale time: generations might be less than a day to a week for many organisms, compared to your 30-year generation. That gives a lot of space for evolutionary drift and natural selection! In the microbiome of free-living organisms, sparsity is a reality (and an important part) of the data. That said, it’s hard to do statistical tests on something that’s present in one sample. My empirical and not fully benchmarked experience suggests that a feature needs to be present in about 10% of samples for presence/absence testing.
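As a rough sketch of what "presence/absence" and the 10% rule of thumb mean in practice, the snippet below turns counts into 0/1 values and keeps only features seen in at least 10% of samples. The data, names, and helper functions are hypothetical, and the 10% figure is the informal guideline above, not a benchmarked cutoff.

```python
# Hypothetical counts for two features across 20 samples.
counts_by_feature = {
    "feature_X": [5] + [0] * 19,             # seen in 1 of 20 samples
    "feature_Y": [3, 0, 7, 0, 1] + [0] * 15,  # seen in 3 of 20 samples
}


def to_presence_absence(counts):
    """Replace each count with 1 if the feature was observed, else 0."""
    return [1 if c > 0 else 0 for c in counts]


def min_samples_for_percent(n_samples, percent=10):
    """Smallest number of samples that is at least `percent` of n_samples.

    Integer arithmetic (ceiling division) avoids floating-point rounding.
    """
    return (n_samples * percent + 99) // 100


def features_with_enough_prevalence(counts_by_feature, percent=10):
    """Names of features present in at least `percent` of the samples."""
    kept = []
    for name, counts in counts_by_feature.items():
        present = sum(to_presence_absence(counts))
        if present >= min_samples_for_percent(len(counts), percent):
            kept.append(name)
    return kept


print(to_presence_absence(counts_by_feature["feature_Y"])[:5])
print(min_samples_for_percent(20))
print(features_with_enough_prevalence(counts_by_feature))
```

Presence/absence discards how abundant a feature is and only asks whether it was detected at all, which is why a feature seen in a single sample gives a test almost nothing to work with.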



Thank you so much for this very clear explanation. I analysed my table with better filtering and I'm happy with the final result.

I was wondering if it's okay to add a pseudocount to my table using 'qiime composition add-pseudocount', to avoid any problems that sparsity might cause during co-occurrence network analysis. My table is refiltered-for-sample-frequency-EMP.qzv (902.9 KB), with the following numbers in the table summary.

Number of samples 1,892
Number of features 287
Total frequency 48,270,266

Frequency per sample

Minimum frequency 5,004.0
1st quartile 10,578.75
Median frequency 17,613.5
3rd quartile 32,166.25
Maximum frequency 760,610.0
Mean frequency 25,512.825581395347

Frequency per feature

Minimum frequency 5,380.0
1st quartile 29,760.0
Median frequency 69,527.0
3rd quartile 155,990.0
Maximum frequency 3,211,135.0
Mean frequency 168,189.08013937282

Also, I'd be so grateful if you could let me know what exactly I can see with 'presence/absence', as you mentioned in the previous comment.

Thank you so much again.


Hi @uth,

I mostly do my co-occurrence analysis within q2-SCNIC, which does its own internal normalization. I would check the normalization assumptions and requirements of your co-occurrence method.
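For context on why the method's assumptions matter, here is a small sketch of what a pseudocount does to a sparse sample. It assumes, as I understand it, that `qiime composition add-pseudocount` simply adds a constant (1 by default) to every cell; the sample values and helper functions are made up.

```python
import math


def add_pseudocount(counts, pseudocount=1):
    """Add the same constant to every count, including the zeros."""
    return [c + pseudocount for c in counts]


def relative_abundance(counts):
    """Convert counts to proportions of the sample total."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical sample: two features absent, one dominant, one rare.
sample = [0, 0, 98, 2]
shifted = add_pseudocount(sample)

# Every value is now positive, so a log transform is defined...
log_counts = [math.log(c) for c in shifted]

# ...but absent features now look like low-abundance ones.
print(shifted)
print(relative_abundance(sample))
print(relative_abundance(shifted))
```

The pseudocount removes the zeros that break log-based transforms, but it also erases the distinction between "absent" and "rare", so a method that already normalizes internally (or that relies on presence/absence) may not want it applied first.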



Thank you so much! I will keep your suggestion in mind while doing my analysis.

Many thanks again!

