I want to know whether it is necessary to filter data (e.g. filter low-abundance features or taxa) before analysis. What would have happened if I hadn't filtered out the low-abundance sequences? Can I still use this result?
Another puzzle about ANCOM is:
How should I describe ANCOM?
I have seen this description:
“ANCOM calculates pairwise log-ratios between combinations of taxa and considers how many times (W) the null hypothesis (no difference between each pairwise comparison of taxa) is violated”
But as I understand it, W is the number of times the sub-hypothesis is rejected, where the sub-hypothesis is “no difference between each pairwise comparison of taxa”, and the null hypothesis is W = some cutoff threshold? I'm not sure. How can I describe the ANCOM test statistic in one sentence? I have not found a satisfactory description in the article so far. Can anyone provide a reference?
It depends. This is a decision you will need to make based on your familiarity with your data. I will quote the Parkinson’s Mice tutorial:
Filtering can provide better resolution and limit false discovery rate (FDR) penalty on features that are too far below the noise threshold to be applicable to a statistical test. A feature that shows up with 10 counts could be a real feature that is present only in that sample; a feature that’s present in several samples but only got amplified and sequenced in one sample because PCR is a somewhat stochastic process; or it may be noise. It’s not possible to tell, so feature-based analysis may be better after filtering low abundance features. However, filtering also shifts the composition of a sample, further disrupting the relationship. Here, the filtering is performed as a trade off between the model, computational efficiency,
This is a case where a decision was made about filtering after considering the pros and cons.
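To make the trade-off concrete, here is a minimal sketch of a low-abundance filter in plain Python. The feature table, the feature names, and the thresholds (`min_total`, `min_samples`) are all made up for illustration; they are not recommendations. In QIIME 2 itself this kind of filter is normally done with `qiime feature-table filter-features` (its `--p-min-frequency` and `--p-min-samples` options), not with hand-written code.

```python
# Hypothetical feature table: feature ID -> counts per sample (made-up numbers).
table = {
    "feature_A": [120, 98, 143, 110],
    "feature_B": [0, 10, 0, 0],   # 10 counts in a single sample: real, or noise?
    "feature_C": [3, 0, 2, 1],
    "feature_D": [45, 60, 0, 52],
}


def filter_features(table, min_total=20, min_samples=2):
    """Keep features with at least `min_total` counts overall
    that are observed in at least `min_samples` samples."""
    kept = {}
    for feature_id, counts in table.items():
        total = sum(counts)
        n_present = sum(1 for c in counts if c > 0)
        if total >= min_total and n_present >= min_samples:
            kept[feature_id] = counts
    return kept


def relative_abundance(table, feature_id, sample_index):
    """Fraction of one sample's counts that belong to one feature."""
    sample_total = sum(counts[sample_index] for counts in table.values())
    return table[feature_id][sample_index] / sample_total


filtered = filter_features(table)
print(sorted(filtered))

# Filtering removes counts from a sample, so the relative abundance of
# the features that remain goes up: the composition has shifted.
print(relative_abundance(table, "feature_A", 1))
print(relative_abundance(filtered, "feature_A", 1))
```

With these made-up thresholds, `feature_B` and `feature_C` are dropped, and the relative abundance of `feature_A` in the second sample rises even though its count did not change. That is the "filtering also shifts the composition" cost mentioned in the quote.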
ANCOM assumes that few (less than about 25%) of the features are changing between groups. If you expect that more features are changing between your groups, you should not use ANCOM as it will be more error-prone (an increase in both Type I and Type II errors is possible). Because we expect a lot of features to change in abundance across body sites, in this tutorial we’ll filter our full feature table to only contain gut samples.
In this case, the decision to filter based on body-site is perhaps a bit more obvious/necessary.
The W score indicates the number of pairwise comparisons that are deemed significant. For a given feature, W counts how many of its log-ratios with the other features differ significantly between groups, i.e. how many times the sub-hypothesis "this log-ratio does not differ between groups" was rejected. A feature is implicated as differentially abundant when its W is high relative to the other features, meaning its log-ratios with many other features differ between groups.
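If it helps to see the counting spelled out, below is a rough sketch of how a W-like score could be computed. It is a simplification, not the ANCOM implementation: the counts are made up, an exact permutation test on the difference in group means stands in for whatever test ANCOM applies to each log-ratio, there is no multiple-testing correction, and the final step (choosing a cutoff on W from its empirical distribution) is left out. The names `permutation_p_value` and `ancom_like_w` are invented for this example.

```python
import math
from itertools import combinations

# Made-up counts: each row is a feature, the first 4 columns are group A
# samples and the last 4 are group B samples. F1 is built to change a lot.
counts = {
    "F1": [10, 12, 9, 11, 100, 120, 95, 110],
    "F2": [50, 55, 48, 52, 51, 49, 54, 50],
    "F3": [30, 28, 33, 31, 29, 32, 30, 31],
    "F4": [80, 76, 84, 79, 78, 82, 77, 81],
}
n_a = 4  # number of group A samples


def mean_diff(values, first_idx):
    """Mean of the values at `first_idx` minus the mean of the remaining values."""
    chosen = set(first_idx)
    first = [v for i, v in enumerate(values) if i in chosen]
    rest = [v for i, v in enumerate(values) if i not in chosen]
    return sum(first) / len(first) - sum(rest) / len(rest)


def permutation_p_value(values, n_first):
    """Exact two-sided permutation test on the difference in group means
    (a stand-in for the per-log-ratio test, chosen to avoid dependencies)."""
    observed = abs(mean_diff(values, range(n_first)))
    splits = list(combinations(range(len(values)), n_first))
    extreme = sum(1 for s in splits if abs(mean_diff(values, s)) >= observed - 1e-12)
    return extreme / len(splits)


def ancom_like_w(counts, n_first, alpha=0.05, pseudocount=1):
    """For each feature, count how many log-ratios with other features
    differ between the two groups at level `alpha`."""
    logs = {f: [math.log(c + pseudocount) for c in row] for f, row in counts.items()}
    w = {f: 0 for f in counts}
    for f, g in combinations(counts, 2):
        log_ratio = [a - b for a, b in zip(logs[f], logs[g])]
        if permutation_p_value(log_ratio, n_first) < alpha:
            # log(f/g) and log(g/f) give the same test result,
            # so a rejection counts toward both features.
            w[f] += 1
            w[g] += 1
    return w


W = ancom_like_w(counts, n_a)
print(W)
```

In this toy table F1 gets the largest W, because its log-ratio with every other feature differs between the groups, while the stable features only pick up the one rejection that involves F1. This is also why the assumption that only a minority of features change matters: if most features changed, high W values would no longer single out the changing ones.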