ANCOM giving strange W values

adrian · August 14, 2017, 3:24pm

Hi,

I am using the Moving Pictures tutorial as a guideline for analyzing my microbiome data. When I run the ANCOM analysis at the end, I get strange results.

When I look at the results filtered to level 2, all W values are 0 (none significantly different). I don't necessarily expect anything to be different (the two groups have not differed in the alpha/beta analysis prior to this) but I don't think all the W values would be 0.

When I filter at level 6, I get similarly strange results: only one genus is significant, at W=34, and the other 58 genera are not significant, with W values of 0 or 1.

I am a super beginner in all of this (microbiome analysis, the related statistics, and any sort of computer coding), so I am not sure if this is an error or if the W values I am seeing are "correct".

Thanks,
Adrian

thermokarst · August 14, 2017, 3:25pm

Hi @adrian, thanks for writing! I am going to ping @mortonjt on this one, he might have some thoughts on the matter. Thanks!

mortonjt · August 14, 2017, 4:11pm

I'm not exactly sure what is going on - would you like to attach your dataset?

There are many possibilities that could explain this, namely

Low resolution from level taxonomy summarization could obscure the signal, particularly if only one species is changing
Low counts / zeros could cause false positives in ANCOM
Issues with FDR in the statistical test
A really bad fluke with the statistical test. Remember, we are just conducting statistical tests, and every statistical test has some chance of failing (even if it is small).

Having access to the underlying datasets to generate this result could help narrow down these issues.

adrian · August 14, 2017, 4:57pm

Thanks for replying! I attached the relevant .qzv files- is that informative or did you want another form/step of data?

Filtered for only female (results I detailed above):
l6-ancom-Treatment-female.qzv (35.4 KB)
l2-ancom-Treatment-female.qzv (32.1 KB)

Not to make this more complicated, but I also have ANCOM results from my full data set- above, I only looked at female, and below are my results from male+female, looking at the same variable "treatment" (morphine versus saline). The level 2 ANCOM is similarly weird (all W=0) and the level 5 ANCOM is also weird (4, 10, and the rest 0 and 1 for W but all seem to be significant?).
l2-ancom-Treatment.qzv (29.0 KB)
l5-ancom-Treatment.qzv (33.2 KB)

My dataset is also pretty small: 24 samples total, 12 female and 12 male, with 6 per treatment (morphine or saline) in each group.

mortonjt · August 14, 2017, 8:15pm

Thanks @adrian. Could you also send over the metadata and the tables? That way, we can sanity check to see how many zero / low count entries there are.

adrian · August 14, 2017, 9:05pm

Here you go, thank you for looking into this!

metadata.tsv (1.7 KB)
female-filtered-table.qza (91.0 KB)
femaletable.qzv (401.6 KB)
Full Set:
table.qza (134.3 KB)
table.qzv (429.2 KB)

thermokarst · August 17, 2017, 2:35pm

@mortonjt, we are also seeing this "lots-o-zeros" situation crop up on a dataset that we are currently analyzing. Feel free to ping me directly to coordinate, if you wish. Thanks!

mortonjt · August 18, 2017, 5:34pm

Ok, here's a summary of the data in your table.qza:

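In [1]: import pandas as pd
In [2]: import qiime2
In [3]: table = qiime2.Artifact.load('table.qza').view(pd.DataFrame)
In [4]: (table>0).sum(axis=0).sort_values().value_counts()  # left: number of samples an OTU appears in; right: number of such OTUs
Out[4]: 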
1     1343
2      411
3      271
4      175
5      119
6       98
7       61
8       44
24      40
9       34
12      31
10      28
11      23
13      18
14      16
15      14
23      12
18      11
16      11
21      10
22       9
20       7
19       5
17       2

This is the sort of scenario where ANCOM is expected to fail - the vast majority of your OTUs show up in very few samples (1343 OTUs appear in only one sample, while only 40 appear in all 24). And since you need to add pseudocounts to replace the zeros, you are essentially adding a huge bias to your analysis.
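To illustrate why the pseudocount matters so much for rare OTUs, here is a minimal sketch. It is not ANCOM itself; it only computes a mean log-ratio against a reference feature, which is the kind of quantity log-ratio methods build on. The counts, the `mean_log_ratio` helper, and the two pseudocount values are all made up for illustration.

```python
import numpy as np

# Hypothetical counts for 6 samples (not taken from the dataset in this thread).
rare = np.array([0, 0, 0, 0, 0, 7], dtype=float)          # present in one sample only
common = np.array([40, 55, 38, 61, 47, 52], dtype=float)  # present in every sample
reference = np.array([100, 120, 90, 110, 105, 95], dtype=float)


def mean_log_ratio(x, ref, pseudocount):
    """Mean over samples of log((x + p) / (ref + p))."""
    return float(np.mean(np.log((x + pseudocount) / (ref + pseudocount))))


# Compare two arbitrary pseudocount choices.
for p in (1.0, 0.1):
    print(
        f"pseudocount={p}: "
        f"rare={mean_log_ratio(rare, reference, p):.2f}, "
        f"common={mean_log_ratio(common, reference, p):.2f}"
    )
```

In this toy example the value for the rare OTU moves a lot when the pseudocount changes, while the value for the common OTU barely moves. In other words, for OTUs that are mostly zeros, the result is driven largely by an arbitrary choice rather than by the data.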

Here's my suggestion.

Definitely filter out all of the OTUs that appear in only one sample.
Filter out OTUs that fall below some count threshold - we routinely filter out OTUs with fewer than 10 counts across all samples. If we don't have 10 reads for a single OTU, then it provides very little information and is likely garbage.
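The two filters above could be sketched in pandas roughly as follows, assuming the table has samples as rows and OTUs as columns (the layout used in the snippet earlier in this post). The `filter_sparse_features` helper, its default thresholds, and the toy table are hypothetical and only meant to show the idea.

```python
import pandas as pd


def filter_sparse_features(table: pd.DataFrame,
                           min_samples: int = 2,
                           min_total_count: int = 10) -> pd.DataFrame:
    """Keep only OTU columns seen in enough samples and with enough total reads."""
    prevalence = (table > 0).sum(axis=0)   # number of samples each OTU appears in
    total_count = table.sum(axis=0)        # total reads per OTU across all samples
    keep = (prevalence >= min_samples) & (total_count >= min_total_count)
    return table.loc[:, keep]


# Toy table: rows are samples, columns are OTUs (made-up numbers).
toy = pd.DataFrame(
    {
        "otu_a": [12, 0, 0, 0],  # only in one sample -> dropped
        "otu_b": [3, 2, 0, 1],   # fewer than 10 reads in total -> dropped
        "otu_c": [8, 5, 0, 9],   # passes both filters
    },
    index=["s1", "s2", "s3", "s4"],
)

filtered = filter_sparse_features(toy)
print(list(filtered.columns))
```

If you would rather stay on the command line, the same kind of filtering should be possible with `qiime feature-table filter-features` and its `--p-min-samples` and `--p-min-frequency` options; check the plugin help for your QIIME 2 version. Then rerun ANCOM on the filtered table.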
