Help with statistical testing on most abundant features

Hi All!

I am a research student and really new to the world of bioinformatics. I am working with 40 ovine fecal samples from animals suspected of scouring. I have about 6 metadata variables (collection time, evidence of scouring, stool consistency, sex, etc.). I have successfully followed the “Moving Pictures” tutorial: I have demultiplexed, ‘deblurred’, and then run the various commands. I now need help interpreting the outputs.

Specifically, I need help with two parts. First, I am not sure how to visualise the “Most Abundant Features/ASVs”. I have used qiime taxa collapse down to level 2, and have 24 features in the collapsed-table-summary.qzv. I am unsure how to turn this into a graph/bar plot. Would presenting it as a simple bar plot suffice?

My goal is to get some form of visualisation of the most abundant features, so I can interpret this further in my thesis. I know that QIIME 1 had an output for “Most Abundant OTUs”, and I have seen Spearman’s correlation analysis being used to interpret it. I know this is not possible with QIIME 2. This brings me to my next question:

ANCOM: I have done a quick crash course on ANCOM and have managed to run a few commands to perform ANCOM analysis on the variables I listed above, using tables that I collapsed to level 4 (.qzv files). I am having a really hard time understanding how this contributes to my analysis. I understand it is indicative of the composition of samples, but I am not sure how it can help further with the relative abundance of taxa, as I can only get output per variable, not across all 40 samples.

Sorry if this doesn’t make sense, I am still learning the jargon!


Hi @Sarah_Ratcliffe,


A simple bar plot would definitely be my approach to the most abundant features, either per sample or overall. (I tend to go for overall or an average, but that’s a personal preference.) You could easily do this by exporting the table to Excel, R, Python, or something similar.
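As a rough illustration of the "overall or an average" option, here is a minimal Python sketch. It assumes you have already exported the collapsed table and loaded the counts into a nested dictionary; the function names (`mean_relative_abundance`, `top_taxa`) and the example counts are made up for illustration and are not part of QIIME 2.

```python
from collections import defaultdict


def mean_relative_abundance(counts):
    """Return {taxon: mean relative abundance} for {sample_id: {taxon: read_count}}."""
    totals = defaultdict(float)
    n_samples = 0
    for sample_id, taxa in counts.items():
        depth = sum(taxa.values())
        if depth == 0:
            continue  # skip empty samples rather than divide by zero
        n_samples += 1
        for taxon, count in taxa.items():
            # convert counts to a fraction of the sample before averaging
            totals[taxon] += count / depth
    return {taxon: total / n_samples for taxon, total in totals.items()}


def top_taxa(counts, n=10):
    """Return the n taxa with the highest mean relative abundance, largest first."""
    means = mean_relative_abundance(counts)
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up counts for two samples (hypothetical, not real data)
example = {
    "sample1": {"Firmicutes": 60, "Bacteroidetes": 30, "Proteobacteria": 10},
    "sample2": {"Firmicutes": 40, "Bacteroidetes": 50, "Proteobacteria": 10},
}
ranked = top_taxa(example, n=2)
for taxon, mean in ranked:
    print(f"{taxon}\t{mean:.3f}")
```

The ranked list can then go straight into whatever plotting tool you prefer. Averaging relative abundances (rather than raw counts) keeps deeply sequenced samples from dominating the ranking.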

You only get ANCOM output per variable because ANCOM finds the differentially abundant taxa, not the most abundant ones, and compares them across communities. It’s looking for commonalities, so it scores each of the features (ASV, order, whatever) based on their relationship with those commonalities rather than on individual samples overall. Think about it like a t-test or something similar.
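To make the "differentially abundant vs. most abundant" distinction concrete, here is a toy Python sketch. To be clear, this is not ANCOM (ANCOM works on log-ratios between taxa and has its own test statistic); it is just a simple log2 fold change between two groups, with made-up taxon names, sample IDs, and relative abundances.

```python
import math


def group_mean(rel_abund, samples, taxon):
    """Mean relative abundance of one taxon over a list of sample IDs."""
    return sum(rel_abund[s][taxon] for s in samples) / len(samples)


def log2_fold_change(rel_abund, group_a, group_b, taxon, pseudo=1e-6):
    """log2(mean in group_b / mean in group_a); pseudo avoids log of zero."""
    a = group_mean(rel_abund, group_a, taxon) + pseudo
    b = group_mean(rel_abund, group_b, taxon) + pseudo
    return math.log2(b / a)


# Hypothetical relative abundances: one dominant taxon that never changes,
# one rare taxon that is 8x higher in the scouring animals.
rel_abund = {
    "normal1":   {"abundant_taxon": 0.6, "rare_taxon": 0.001},
    "normal2":   {"abundant_taxon": 0.6, "rare_taxon": 0.001},
    "scouring1": {"abundant_taxon": 0.6, "rare_taxon": 0.008},
    "scouring2": {"abundant_taxon": 0.6, "rare_taxon": 0.008},
}
normal = ["normal1", "normal2"]
scouring = ["scouring1", "scouring2"]

lfc_abundant = log2_fold_change(rel_abund, normal, scouring, "abundant_taxon")
lfc_rare = log2_fold_change(rel_abund, normal, scouring, "rare_taxon")
print(f"abundant_taxon: {lfc_abundant:.2f}, rare_taxon: {lfc_rare:.2f}")
```

In this toy case the most abundant taxon shows no difference between groups, while the rare one does. That is why a differential abundance result is always tied to a specific grouping variable, and why it will not look like a "top taxa across all 40 samples" list.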

I have a bigger philosophical question for you, though: what says your most abundant taxa are the most important ones? Do the organisms changing the community structure have to be super abundant (more than 1% of average relative abundance, as an example), or can they be rarer, where perhaps the presence of a small amount of a theoretical organism (0.01%) is enough to represent a change? (My favorite macroscale example of this is the Grey Wolf in Yellowstone, where a pack of wolves was able to change the flow of a river bed and alter a flood plain. But it’s definitely not the most abundant organism in the system.)

Given your sample size, do you need to identify specific organisms in and of themselves or is it sufficient to say that there’s a change in the overall community structure? If you want to say that, do you want that change to be in the most abundant organisms, or do you want to look across other things?
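If you go the "overall community structure" route, that usually means comparing samples with a beta diversity metric rather than looking at individual taxa. The Moving Pictures tutorial already produces these (Bray-Curtis is among the core metrics outputs), so you don't need to compute them by hand. Purely to show what the number means, here is a small Python sketch of Bray-Curtis dissimilarity between two samples; the function name and the counts are made up for illustration.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples given as {taxon: count}.

    0.0 means identical composition, 1.0 means no shared taxa.
    """
    taxa = set(u) | set(v)
    # sum of absolute differences over sum of all counts
    numerator = sum(abs(u.get(t, 0) - v.get(t, 0)) for t in taxa)
    denominator = sum(u.get(t, 0) + v.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Hypothetical counts (not real data)
sample_a = {"Firmicutes": 6, "Bacteroidetes": 4}
sample_b = {"Firmicutes": 4, "Bacteroidetes": 6}
sample_c = {"Proteobacteria": 10}

d_ab = bray_curtis(sample_a, sample_b)  # similar communities
d_ac = bray_curtis(sample_a, sample_c)  # nothing in common
print(d_ab, d_ac)
```

A group-level test on a matrix of these distances (PERMANOVA, for example) is what lets you say the scouring and non-scouring communities differ overall, without having to name specific organisms.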

Does it matter that by collapsing your data to level 2 (which I think is phylum?), you’re essentially taking everything with a spinal cord (:bird:,:dog2:,:frog:,:shark:) and comparing it to things with exterior shells and jointed legs (arthropods, :bug:, :lobster:)? (Or, if I’m mistaken and it’s class level, the difference between amphibians :frog: and mammals :koala:.) TBH I like collapsing for visualization, because most people can only process about 8-12 colors. No right answer here, but something to consider.



Hi Justine!

Apologies for the late reply, and thank you for your help!

I have now had a better look at what I am trying to communicate with my analysis. Based on my sample size, I think I will be trying to see if there is any overall change in community structure. I can’t really consider rare ASVs, as I only have 1/4 of my desired sample set.

However, there are some genera that are believed to be correlated with ovine scouring, which I am looking into. That is why I wanted to see the most abundant taxa: to see if they confirm what is said in the literature.

I have collapsed it down to phylum, as it is easier for my markers to understand, and partly because a lot of the ASVs could not be resolved past Family or Order level. I have now used R to create a better graph at this level (with the help of my supervisor).

All the best,
