ANCOM vs Qurro: too many features lost in Ancom

arwqiime · October 2, 2020, 10:57am

Hi,
I analysed a 16S dataset that displayed a clear separation along axis 1 in an RPCA biplot. The autoselection of top- and bottom-ranked features in Qurro let me identify the features contributing most to this axis, which is in fact a seasonal axis of biofilm samples taken in May vs. August.
I then compared it to an ANCOM analysis (pseudocount added) and let ancom run on the same metadata column (Month, which has May and August in it).
I get a nice ANCOM volcano plot, and if I hover over the leftmost and rightmost features, I can manually identify many of the features that were also identified by Qurro's autoselection.

But the list of ANCOM statistical results is extremely short, with a smallest W value of 675. There are many more features with 'high' W values, and a look at the percentile abundances table (inspecting the downloaded table) indicates that many more features are most likely significantly altered between these two conditions.
Also, the full table of the statistical results showed that many features with high W values were marked as rejected for the null hypothesis.

Is there an explanation for why ANCOM does reject the null hypothesis for features with high W values, when an independent analysis (RPCA/Qurro) points to many more features with true differential abundance? (see qzv below)

Best regards,
ancom_Month-L7.qzv (1.1 MB)

jwdebelius · October 2, 2020, 2:18pm

Hi @arwqiime,

ANCOM 1 (as implemented in QIIME) makes a set of assumptions about the distribution of the data to do a bimodal feature-based selection. The authors of the pipeline have since suggested this bimodal approach may not be optimal, and have moved to setting a threshold value (closer to a p-value). I happen to like 0.8 (80% of the ratios are significant), but I think the original authors were at 0.7. You can check out the ANCOM II repo and paper here for more information.
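To make the fixed-threshold idea concrete, here is a minimal Python sketch. It is not the QIIME 2 or ANCOM II code. It assumes that W counts the pairwise log-ratio tests in which a feature was significant, so the largest possible W is the number of features minus one. The function name, feature IDs and W values are all made up for illustration.

```python
def ancom_w_cutoff(w_values, n_features, fraction=0.7):
    """Flag features whose W reaches `fraction` of the maximum possible W.

    Assumption: each feature is tested against every other feature, so the
    maximum possible W is n_features - 1.
    """
    max_w = n_features - 1
    cutoff = fraction * max_w
    return {feature: w >= cutoff for feature, w in w_values.items()}


# Hypothetical W values for a table with 1000 features (not the data from this thread).
w_values = {"feat_a": 950, "feat_b": 720, "feat_c": 640, "feat_d": 120}

flags_07 = ancom_w_cutoff(w_values, n_features=1000, fraction=0.7)
flags_06 = ancom_w_cutoff(w_values, n_features=1000, fraction=0.6)

for feature, w in w_values.items():
    print(feature, w, flags_07[feature], flags_06[feature])
```

In this toy example, lowering the fraction from 0.7 to 0.6 also flags `feat_c`. The same kind of cutoff could be applied by hand to the W column of the downloaded ANCOM table, but the choice of fraction is a judgment call.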

However… ANCOM is a conservative test. It may not pick up all the factors that contribute to the PCA in DEICODE. You hope they’re correlated (and it sounds like they are). But I feel like it’s also harder to make a direct correlation. Others may have other insight; @mortonjt or @cmartino are kind of the experts!

Best,
Justine

mortonjt · October 2, 2020, 2:48pm

I’m going to echo @jwdebelius’s comment – ANCOM2 is a completely different pipeline; it does not compute intermediate p-values and directly estimates fold changes. So there is a good chance that the problems you are currently running into have already been addressed in ANCOM2.

arwqiime · October 2, 2020, 3:16pm

Hi @jwdebelius and @mortonjt,
Is it correct that the ancom version in q2-2020.8 is ANCOM v1?
I saw a comment by @thermokarst on GitHub from about a year ago saying that v1 was in q2 at that time.
There are two other statements from colleagues in Italy (found via Google) stating that the latest q2 release has ancom2. I don’t want to paste these URLs in case they are not right (actually, these URLs are not loading, and I want to avoid spreading incorrect statements).

Best regards,

mortonjt · October 2, 2020, 3:27pm

No, the newer version is not in QIIME 2. It is only available through R via https://github.com/FrederickHuangLin/ANCOMBC

We are currently trying to put together a qiime2 plugin for it.
