Filtering low abundance OTUs following DADA2


I have a question regarding filtering feature tables. In QIIME 1 you could specify the fraction you wanted to filter on (i.e. to filter at 1% you would specify 0.01) via:

`--min_count_fraction`: Fraction of the total observation (sequence) count to apply as the minimum total observation count of an OTU for that OTU to be retained.

@Nicholas_Bokulich addressed low-abundance filtering in 2013, recommending: "For datasets where a mock community is not included for calibration, we recommend the conservative threshold of (c = 0.005%)", where c is the OTU abundance threshold.

I just read a recent thread where @Nicholas_Bokulich states that “the feature abundance filtering protocols recommended in the 2013 Nature Methods paper are not tested in conjunction with dada2 and are most likely unnecessary”.

I am now using QIIME 2, and after running DADA2 on my samples my feature table contains 3,159 features (total frequency 10,482,132). I then applied contingency-based filtering to remove singletons and ended up with 1,250 features (total frequency 10,304,300). I have read several papers in which people filter further at 0.005%, or even at 0.1%, and end up with feature tables containing 100–500 features.
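To illustrate what that contingency/singleton step does, here is a toy sketch in Python (the dict-of-lists table and ASV names are made up for illustration; the real filtering was done on the QIIME 2 feature table, not a plain dict):

```python
# Toy feature table: feature ID -> per-sample counts (made-up numbers).
table = {
    "ASV1": [10, 0, 5],  # present in 2 samples, total count 15
    "ASV2": [1, 0, 0],   # singleton: total count 1
    "ASV3": [0, 3, 0],   # present in 1 sample, total count 3
}

# Remove singletons, i.e. features whose total count across all samples is 1.
filtered = {fid: counts for fid, counts in table.items() if sum(counts) > 1}
print(sorted(filtered))  # ['ASV1', 'ASV3']
```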

If I filter at 0.005% my feature table contains 536 features (total frequency 10,252,159), and at 0.1% it contains 101 features (total frequency 9,265,493).
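Since QIIME 2's `qiime feature-table filter-features` takes an absolute count (`--p-min-frequency`) rather than a fraction, I converted the percentages into absolute per-feature counts like this (a minimal Python sketch; the total frequency is the one from my table above, and `min_frequency` is just a helper name I made up):

```python
import math

# Total frequency of my post-DADA2 feature table (from above).
total_frequency = 10_482_132

def min_frequency(total: int, fraction: float) -> int:
    """Smallest integer count a feature needs to reach `fraction` of the total."""
    return math.ceil(total * fraction)

print(min_frequency(total_frequency, 0.00005))  # 0.005% threshold -> 525
print(min_frequency(total_frequency, 0.001))    # 0.1% threshold  -> 10483
```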

What is considered to be acceptable? I would be grateful if someone could recommend an approach to use.

Thank you!


Hi @thextramile,
You have already read my advice that this type of filtering is not necessary following DADA2. That said, it is your choice if you still want to filter. I would not recommend filtering at a 0.1% abundance threshold, though: that is extremely stringent and would almost certainly skew your results, especially after running DADA2.
Good luck!


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.