Hello there!
I have recently started working with QIIME2 and so far the docs and forum have been great resources to resolve the problems that I encountered. However, now I seem to be stuck on a new problem that hopefully someone can help with.
I want to filter potential crosstalk features from my feature table before running further analysis. I have observed that ASVs originating from crosstalk between barcodes/samples on average have <0.05% relative frequency in a given sample in my sequencing setup. So I want to apply a 0.05% abundance filter to each sample in my feature table as a way of dealing with cross-contamination.
qiime feature-table filter-features only filters features by absolute frequency, but of course, total feature counts differ among my samples. If the feature table were rarefied to a specific sampling depth, it would be straightforward to calculate 0.05% of this number and apply qiime feature-table filter-features --p-min-frequency to the whole table. But I don't want to rarefy due to the drawbacks of rarefaction that have been discussed in this forum many times.
So I decided that I will split the feature table into separate tables for each sample, export the biom file, make a biom summary, extract the total number of observations (3rd line of the biom file summary), use that to calculate the 0.05% threshold for each sample, filter each table individually (qiime feature-table filter-features --p-min-frequency) and then merge, all in a bash script (currently the only scripting language I am somewhat proficient with).
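To make the threshold step concrete, here is a minimal sketch of the arithmetic in bash. The function name min_frequency is hypothetical, and rounding the 0.05% cutoff up to the next whole count is my assumption (so that anything strictly below 0.05% is dropped); the input is assumed to be a plain integer total for one sample.

```bash
#!/usr/bin/env bash

# Hypothetical helper: smallest whole count that is at least 0.05% of a
# sample's total frequency. Assumes $1 is a plain integer (no separators).
min_frequency() {
    local total=$1
    # 0.05% = 5/10000. Bash only does integer arithmetic, so adding 9999
    # before dividing rounds the result up instead of truncating it.
    echo $(( (total * 5 + 9999) / 10000 ))
}

min_frequency 127524   # prints 64 (0.05% of 127524 is 63.762)
```

The printed value is what I would then pass to --p-min-frequency for that sample's table.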
The problem is that the number in biom file summary contains thousand separators (e.g. Total count: 127 524) and I can't get bash to recognize this as a number due to the gap between groups of digits. Is there any built-in method in QIIME2 or some hack that anyone knows to get the total feature count of samples programmatically and without this formatting? I know I can read it from feature table summary visualizations and type manually but that will not work when I have hundreds of samples.
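The kind of hack I have in mind would look roughly like the sketch below: take the text after the colon and delete every character that is not a digit. The function name parse_total_count is hypothetical, and this assumes the total is always printed as a whole number (a decimal part would be mangled). I have not checked which separator character the summary actually uses, which is why it strips all non-digits rather than just spaces.

```bash
#!/usr/bin/env bash

# Hypothetical helper: pull the integer out of a summary line such as
# "Total count: 127 524". Assumes the value after the colon is a whole number.
parse_total_count() {
    # cut keeps the part after the first colon; tr -cd '0-9' then deletes
    # every byte that is not a digit (spaces, commas, non-breaking spaces).
    printf '%s\n' "$1" | cut -d: -f2 | tr -cd '0-9'
}

total=$(parse_total_count "Total count: 127 524")
echo "$total"   # prints 127524
```

A built-in way to get the per-sample totals would still be preferable to parsing the summary text like this.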