filter features conditionally

Hi
I recently discovered the following command:
qiime feature-table filter-features-conditionally \
  --i-table table.qza \
  --p-abundance 0.01 \
  --p-prevalence 0.50 \
  --o-filtered-table table-prevalent.qza

I wanted to know whether it is possible to remove features present in less than 50% of the diseased (n=30) and control (n=30) populations separately, i.e. remove features that are not present in at least 50% of the diseased population.
Also, is this per-group filtering necessary, or is it enough to remove features that are not present in 50% of the samples as a whole (n=60)?

Thank you

Hello!
I would go for the whole dataset, but 50% is probably too strict, since individual variation is very high.
I would not filter the groups separately if you want to compare diseased vs. control. Imagine a feature that is present in 49% of samples in one group and in 51% in the other: with a 50% per-group cutoff it would be removed from one group and kept in the other. Filtering like this would introduce an artificial bias into the dataset.
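To make the 49% vs. 51% point concrete, here is a minimal Python sketch on made-up presence/absence data for a single feature. The group sizes match the question (30 and 30), but the detection counts, the helper `prevalence`, and the "at least 50%" comparison are assumptions for illustration only. This is not how QIIME 2 runs the filter internally.

```python
# Toy illustration (made-up data): one feature, 30 diseased and 30 control samples.
# True means the feature was detected in that sample.
diseased = [True] * 14 + [False] * 16   # 14/30, about 47% prevalence
control = [True] * 16 + [False] * 14    # 16/30, about 53% prevalence


def prevalence(samples):
    """Fraction of samples in which the feature is present."""
    return sum(samples) / len(samples)


THRESHOLD = 0.5  # the 50% cutoff discussed in this thread

# Per-group filtering: each group is judged on its own.
kept_in_diseased = prevalence(diseased) >= THRESHOLD
kept_in_control = prevalence(control) >= THRESHOLD

# Whole-dataset filtering: all 60 samples are judged together.
kept_overall = prevalence(diseased + control) >= THRESHOLD

print(kept_in_diseased, kept_in_control, kept_overall)
```

With per-group filtering the feature is dropped from the diseased group but kept in the control group, so a small difference in prevalence turns into an apparent presence/absence difference. Filtering on all 60 samples treats both groups the same way.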

Hello!!
Thank you for the insights. I did execute the above command, and I was left with only 13 features out of the initial 6000. I took the relative abundance value for the command from the literature, where a few papers mention their filtering criteria.
I was wondering whether filtering features matters only for differential abundance testing, or whether it will also affect my alpha and beta diversity indices.

As I already wrote, 50% is too strict.
For the DA test, I usually filter the feature table like this:

# --p-abundance: sometimes 0.005; --p-prevalence: sometimes 0.2
qiime feature-table filter-features-conditionally \
  --i-table table.qza \
  --p-abundance 0.01 \
  --p-prevalence 0.1 \
  --o-filtered-table table-prevalent.qza

One reason I filter features for the DA test is to avoid ending up with differentially abundant features that have very low relative abundances. Yes, they are differentially abundant, but whether they have any biological meaning is another question: could I confirm them with RT-PCR afterwards?
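For anyone unsure what the two thresholds do together, below is a rough Python sketch of the criterion as I understand it: a feature is kept if its relative abundance reaches the abundance threshold in at least the prevalence fraction of samples. The table, the feature names, and the helper `keep_feature` are made up for illustration, and the exact comparison operators are an assumption, so check the plugin documentation before relying on edge cases.

```python
# Hypothetical toy table: feature -> read counts in four samples (made-up numbers).
counts = {
    "featA": [50, 40, 60, 30],
    "featB": [0, 0, 1, 0],      # only ever seen at very low relative abundance
    "featC": [50, 60, 139, 70],
    "featD": [0, 30, 0, 0],     # abundant, but in a single sample
}
n_samples = 4

# Total reads per sample, used to convert counts to relative abundances.
totals = [sum(row[i] for row in counts.values()) for i in range(n_samples)]


def keep_feature(row, abundance, prevalence):
    """Keep a feature if its relative abundance reaches `abundance`
    in at least a `prevalence` fraction of the samples."""
    hits = sum(1 for c, t in zip(row, totals) if t > 0 and c / t >= abundance)
    return hits / len(row) >= prevalence


lenient = [f for f, row in counts.items() if keep_feature(row, 0.01, 0.1)]
strict = [f for f, row in counts.items() if keep_feature(row, 0.01, 0.5)]

print(lenient)  # featB is dropped: it never reaches 1% relative abundance
print(strict)   # featD is dropped as well: it passes in only 1 of 4 samples
```

In this toy table the 10% prevalence setting removes only the feature that never reaches 1% relative abundance, while the 50% setting also removes a feature that is abundant in a single sample. That is the kind of loss that adds up quickly in a real dataset with high individual variation.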

And yes, it can definitely affect diversity metrics. For diversity metrics, I prefer to filter out features that were counted fewer than 10 times and found in fewer than 2-3 samples.
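A small Python sketch of this count-based filter, on a made-up table. I am assuming here that a feature has to pass both thresholds to be kept (at least 10 reads in total and presence in at least 2 samples), which is one possible reading of the sentence above. The names `MIN_TOTAL_COUNT`, `MIN_SAMPLES` and `keep_feature` are hypothetical. In QIIME 2 itself, this kind of filter is usually run with `qiime feature-table filter-features` and its `--p-min-frequency` and `--p-min-samples` parameters.

```python
# Hypothetical toy table: feature -> read counts per sample (made-up numbers).
counts = {
    "featA": [12, 0, 7, 3],   # total 22, present in 3 samples
    "featB": [9, 0, 0, 0],    # total 9, present in 1 sample
    "featC": [25, 0, 0, 0],   # total 25, present in 1 sample
    "featD": [2, 3, 1, 2],    # total 8, present in 4 samples
}

MIN_TOTAL_COUNT = 10  # features counted fewer than 10 times are filtered out
MIN_SAMPLES = 2       # the reply mentions 2-3 samples; 2 is used here


def keep_feature(row):
    """Assumed reading: a feature must pass both thresholds to be kept."""
    total = sum(row)
    n_present = sum(1 for c in row if c > 0)
    return total >= MIN_TOTAL_COUNT and n_present >= MIN_SAMPLES


kept = [name for name, row in counts.items() if keep_feature(row)]
print(kept)
```

Only featA survives here: featB fails both thresholds, featC has enough reads but occurs in a single sample, and featD is widespread but has too few reads in total.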

Best,
