Abundances-based Filtration Approach

Dear Qiime Peers,

I would like your advice on filtering features by abundance. I have come across a few papers that filtered features at an abundance threshold of 0.01% or 0.5% to focus on dominant taxa. Let’s say the total frequency of all features in my table is 200,000. Does that mean using --p-min-frequency 20 for the 0.01% threshold and --p-min-frequency 1000 for the 0.5% threshold?
Would this filtering match the statement "remove features that comprise less than 0.01% or 0.5% average abundance across samples"?
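
For reference, the arithmetic in the question can be sketched in Python. This is only a minimal sketch: the helper name `min_frequency_for_percent` is hypothetical and not part of QIIME 2, and it assumes the threshold is a percentage of the total frequency summed over the whole table.

```python
from fractions import Fraction
import math


def min_frequency_for_percent(total_frequency, percent):
    """Smallest whole count that is at least `percent` % of `total_frequency`.

    Hypothetical helper, not a QIIME 2 function.
    """
    # Fraction(str(...)) keeps the percentage exact, e.g. 0.01 becomes 1/100,
    # so the rounding step below is not affected by floating-point error.
    threshold = Fraction(str(percent)) / 100 * total_frequency
    return math.ceil(threshold)


total = 200_000  # total frequency used in the question
low_cutoff = min_frequency_for_percent(total, 0.01)
high_cutoff = min_frequency_for_percent(total, 0.5)
print(low_cutoff, high_cutoff)
```

The two results are the values that would be passed to --p-min-frequency for the 0.01% and 0.5% thresholds.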

I see that the qiime feature-table filter-features command filters on feature frequencies summed across all samples. Is there any way to filter per sample instead, i.e. remove features that are not detected at a minimum of 0.5% in at least one sample? I know QIIME 2 can’t filter on relative abundance or percentiles, but we can calculate the equivalent absolute frequency ourselves, which is what QIIME 2 requires.
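
To make the per-sample rule in this question concrete, here is a minimal sketch in plain Python. It is not a QIIME 2 command: it assumes the feature table has already been exported into a dictionary of counts, and the function name and toy numbers are made up for illustration.

```python
def features_reaching_threshold(table, min_percent):
    """Return features with relative abundance >= min_percent in any sample.

    `table` maps sample ID -> {feature ID: count}. Hypothetical helper.
    """
    keep = set()
    for sample_id, counts in table.items():
        depth = sum(counts.values())
        if depth == 0:
            continue  # skip empty samples to avoid dividing by zero
        for feature_id, count in counts.items():
            # Same test as count / depth * 100 >= min_percent, without division
            if count * 100 >= min_percent * depth:
                keep.add(feature_id)
    return keep


# Toy counts, invented for this example
table = {
    "S1": {"A": 990, "B": 10, "C": 0},
    "S2": {"A": 9950, "B": 20, "C": 30},
}
keep = features_reaching_threshold(table, 0.5)
print(sorted(keep))
```

In this toy table, feature B is kept because it reaches 1% in S1 even though it is only 0.2% of S2, while feature C never reaches 0.5% in any sample and is dropped.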

Thank you in advance.



Hi @Benedict,
This type of filtering should not be necessary if you are using DADA2 or Deblur. It is recommended for OTU clustering methods, since rare OTUs are often noise. But 0.5% is a really high threshold.


Correct, and this is the type of filtering that is used most often in the literature for filtering OTUs.

No, there is no per-sample filtering in QIIME 2. This is not a good way to remove noise, since 0.5% could represent very different numbers of sequences depending on the sample.
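
As a quick illustration of that last point, the sketch below converts 0.5% into a sequence count for two samples. The sample names and depths are made up for the example.

```python
# Hypothetical sequencing depths for two samples
depths = {"S1": 2_000, "S2": 50_000}

percent = 0.5
# Number of sequences that 0.5% corresponds to in each sample
cutoffs = {sample: depth * percent / 100 for sample, depth in depths.items()}
print(cutoffs)
```

With these depths the same 0.5% cutoff means 10 sequences in one sample and 250 in the other.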


Yes, I’m using DADA2 to denoise and merge my demultiplexed reads. Even so, the taxonomic composition (at genus and species level) was extremely diverse after the taxonomy annotation step, yet only a few taxa dominate the host and the rest are sparsely associated with it (e.g. 1% or less). I would like to focus my interpretation on the dominant taxa, so would you advise filtering the feature table with an abundance-based approach? Apart from that, since my sample type is highly diverse, is filtering on feature frequencies summed across all samples reliable? Could some significant features be removed from a single sample? For instance, if feature A is present in only 1 of 20 samples (not replicates), its frequency summed across all samples might be low, and “qiime feature-table filter-features” might filter that feature out. Am I correct?

Personally, I would not, but that’s your call.

I really cannot answer these questions… they are highly specific to the data. All I can say is give it a try and see how it impacts your results.

No, the feature's total frequency is going to be the same regardless of how many samples you have. So unless you are increasing the minimum frequency because you have more samples, that feature would remain. 0.5% is a very high threshold and will probably severely impact your results.
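
A small sketch of this point, using made-up counts and a hypothetical helper name: a feature found in only one sample has the same summed frequency whether the table holds 1 sample or 20, while a cutoff defined as a percentage of the whole table grows as samples are added.

```python
def total_frequency(table, feature_id):
    """Sum one feature's counts across all samples (hypothetical helper)."""
    return sum(counts.get(feature_id, 0) for counts in table.values())


# Feature A occurs only in S1; the counts are invented for illustration
one_sample = {"S1": {"A": 150, "B": 9_850}}

# Add 19 more samples that do not contain feature A
twenty_samples = dict(one_sample)
for i in range(2, 21):
    twenty_samples[f"S{i}"] = {"B": 10_000}

freq_small = total_frequency(one_sample, "A")
freq_large = total_frequency(twenty_samples, "A")

# Size of the whole table, which a percentage-based cutoff would scale with
table_total = sum(sum(c.values()) for c in twenty_samples.values())
print(freq_small, freq_large, table_total)
```

Here feature A sums to 150 in both tables, so a fixed --p-min-frequency treats it the same either way. The table total rises from 10,000 to 200,000, though, so a 0.5% cutoff rises from 50 to 1000, and feature A passes the first but not the second.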

