Can I please get some help and guidance as to how to decide on the feature filtering criteria in qiime2?
qiime feature-table filter-features \
  --i-table table-dada2.qza \
  --p-min-frequency ??? \
  --p-min-samples ??? \
  --o-filtered-table table-filt-min-sample.qza
For the above step, how does one decide the value of --p-min-frequency? Which table should one look at to find that value: “Frequency per sample” or “Frequency per feature” from the table.qza output of DADA2? Also, is it mandatory to perform this step, or can I safely skip such filtering, provided that skipping it does not compromise the downstream analysis?
You don't have to perform this step, so this is a very good question to ask!
I think the goal is to remove rare features, and that goal should motivate the choice of --p-min-frequency.
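To make the two thresholds concrete, here is a minimal plain-Python sketch of the idea behind --p-min-frequency and --p-min-samples on a toy table. It is not QIIME 2 code: the table values and the `filter_features` helper are made up for illustration. It assumes that the frequency threshold applies to a feature's total count across all samples (the "Frequency per feature" view) and that the sample threshold applies to the number of samples in which the feature is observed.

```python
# Toy feature table: feature ID -> {sample ID: read count}. All values are made up.
table = {
    "ASV1": {"S1": 120, "S2": 80, "S3": 95},
    "ASV2": {"S1": 0, "S2": 3, "S3": 0},
    "ASV3": {"S1": 1, "S2": 0, "S3": 0},
    "ASV4": {"S1": 10, "S2": 0, "S3": 12},
}


def filter_features(table, min_frequency=0, min_samples=0):
    """Hypothetical helper: keep features that meet both thresholds."""
    kept = {}
    for feature, counts in table.items():
        total = sum(counts.values())  # total reads for this feature
        prevalence = sum(1 for c in counts.values() if c > 0)  # samples where seen
        if total >= min_frequency and prevalence >= min_samples:
            kept[feature] = counts
    return kept


filtered = filter_features(table, min_frequency=5, min_samples=2)
print(sorted(filtered))  # ['ASV1', 'ASV4']
```

With these toy numbers, ASV2 and ASV3 are dropped because their totals (3 and 1) fall below 5, while ASV4 survives because it has 22 reads spread over two samples.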
Well, in the Gut-to-soil tutorial, they run this before the filter to get feature counts:
Hi Colin @colinbrislawn, thank you very much for your response. I understand that rare ASVs often represent noise, which is why filtering is necessary. However, I am still unsure how to choose an appropriate filtering threshold. I looked at the gut-to-soil tutorial that you kindly pointed me to, but it does not specifically say which threshold to use. The summary tables show the minimum, Q1, median, Q3, mean, and maximum frequencies for the features. Are any of these values actually useful for selecting the feature-filtering threshold? Since QIIME 2 provides a filtering option, it would be helpful to have clearer guidelines on how to perform this step, especially on how to choose appropriate filtering thresholds.
It looks like the pd-mice tutorial uses this command as a pre-filter before Differential Abundance testing. Here's the best info I could find:
Zooming out a little bit, QIIME 2 provides good default settings, so if there were a single 'right' way to do this, we would tell you.
This is a tradeoff between keeping rare features and removing noise, and that depends on the sources of noise for your sample and the biological signal you are trying to identify.
Also, I have found that reviewer three will not like my choice of --p-min-frequency, so later I'll have to change it anyway.
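One way to get a feel for the tradeoff between keeping rare features and removing noise is to try several candidate thresholds and see how many features and what share of reads each one retains. The sketch below does this in plain Python on made-up per-feature totals. The numbers and the `retention` helper are hypothetical, and in practice the totals would come from the "Frequency per feature" summary of your own table.

```python
# Made-up total read counts per feature (the "Frequency per feature" view).
feature_totals = {
    "ASV1": 5200,
    "ASV2": 830,
    "ASV3": 41,
    "ASV4": 9,
    "ASV5": 2,
    "ASV6": 1,
}


def retention(feature_totals, min_frequency):
    """Hypothetical helper: features kept and fraction of reads kept."""
    kept = [n for n in feature_totals.values() if n >= min_frequency]
    total_reads = sum(feature_totals.values())
    return len(kept), sum(kept) / total_reads


# Compare a few candidate thresholds side by side.
for threshold in (1, 2, 10, 50):
    n_features, read_fraction = retention(feature_totals, threshold)
    print(f"min-frequency {threshold:>3}: {n_features} features, "
          f"{read_fraction:.4f} of reads kept")
```

In a table shaped like this one, raising the threshold removes many features but very few reads, which is the usual argument for dropping the rarest features. Whether that holds for a real dataset depends on its noise sources and on the signal of interest.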
If I can jump in @colinbrislawn, the good news from the answer you linked is that I got joint filtering in QIIME 2!
In terms of --p-min-frequency, I tie my minimum sample frequency to my rarefaction depth (if I'm rarefying to 1000 seqs/sample, I want samples with at least 1000 sequences). Someone will absolutely be mad about your rarefaction threshold (mine is reviewer 3 as well!).
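As a rough illustration of tying the sample-level cutoff to the rarefaction depth, the plain-Python sketch below drops samples whose total read count falls under a chosen depth. The sample totals are made up and the depth of 1000 is only the example value from the post. In QIIME 2 itself this would presumably be done with qiime feature-table filter-samples and its --p-min-frequency option rather than with code like this.

```python
# Made-up total read counts per sample (the "Frequency per sample" view).
sample_totals = {"S1": 15230, "S2": 980, "S3": 4410, "S4": 1000}

# Example depth taken from the post; pick your own from your data.
rarefaction_depth = 1000

# Keep only samples that have at least as many reads as the rarefaction depth.
kept_samples = {s: n for s, n in sample_totals.items() if n >= rarefaction_depth}
dropped_samples = sorted(set(sample_totals) - set(kept_samples))

print(sorted(kept_samples))  # ['S1', 'S3', 'S4']
print(dropped_samples)       # ['S2']
```

Note that S4, with exactly 1000 reads, is kept because the comparison is "at least" the depth.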
In terms of feature discards, it depends. I tend to filter my features more heavily on other criteria. For example, I remove features I can’t classify to at least the phylum level, reads that don’t insert into a phylogenetic tree, and reads that are assigned to chloroplasts or mitochondria. I might also filter super rare features (singletons). This is the table I then take into subsequent steps.
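The criteria above can be sketched in plain Python as a set of per-feature checks. Everything here is hypothetical: the taxonomy strings only imitate a SILVA-style label format, the counts are made up, and `keep_feature` is an illustrative helper rather than a QIIME 2 function. The phylogenetic-insertion criterion is left out because it needs a tree.

```python
# Made-up taxonomy labels and total read counts per feature.
taxonomy = {
    "ASV1": "d__Bacteria; p__Firmicutes; c__Bacilli",
    "ASV2": "d__Bacteria",
    "ASV3": "d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Chloroplast",
    "ASV4": "d__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; "
            "o__Rickettsiales; f__Mitochondria",
    "ASV5": "Unassigned",
}
feature_totals = {"ASV1": 400, "ASV2": 12, "ASV3": 90, "ASV4": 35, "ASV5": 1}


def keep_feature(feature):
    """Hypothetical helper applying the three checks to one feature."""
    label = taxonomy[feature].lower()
    has_phylum = "p__" in label  # classified to at least phylum level
    is_organelle = "chloroplast" in label or "mitochondria" in label
    is_singleton = feature_totals[feature] < 2  # observed only once in total
    return has_phylum and not is_organelle and not is_singleton


kept = [f for f in taxonomy if keep_feature(f)]
print(kept)  # ['ASV1']
```

Only ASV1 passes here: ASV2 and ASV5 lack a phylum-level label, ASV3 and ASV4 match the organelle terms, and ASV5 is also a singleton.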
So, I’d rarefy it as a normalization approach for classic diversity metrics. I’d filter it again a little more aggressively for differential abundance/prevalence, Aitchison distance, or DEICODE.
Not sure if that helps or makes this more confusing.