Barplots grouped by feature, how does the math work?


I need some help understanding my barplot.
I have 2 groups, treatment (n=10) and control (n=10), and the barplots of the individual samples look good.
When I group my samples by a metadata category (i.e., intervention) and then plot the results as a barplot,
I see quite a different picture.
It took me some time to realize that samples with 0 reads for a taxon are not included in the averaging, and thus the taxon's abundance in the group is shifted.
What I'm used to is summing up and dividing by the n of the group, which should be 10 in my case, but here it is 7, as 3 out of 10 samples do not have that particular taxon.
Is that correct? Was it always like that and I've just never realized?
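For reference, here is the arithmetic described above as a minimal sketch on hypothetical values (7 of 10 samples at about 25%, the figure mentioned later in this thread). It is plain Python, not QIIME 2 code, and it works on per-sample relative abundances. Note that dropping the zero-count samples would push the group value up toward 25%, not down.

```python
# Hypothetical per-sample relative abundances (%) of one taxon in a group of 10:
# 7 samples carry it at ~25 %, 3 samples have 0 reads for it (illustrative numbers).
abundances = [25.0] * 7 + [0.0] * 3

# Divide by the full group size, n = 10.
mean_over_all = sum(abundances) / len(abundances)

# Divide only by the samples in which the taxon was observed, n = 7.
nonzero = [a for a in abundances if a > 0]
mean_over_nonzero = sum(nonzero) / len(nonzero)

print(f"divide by 10: {mean_over_all:.1f}%")      # 17.5%
print(f"divide by 7:  {mean_over_nonzero:.1f}%")  # 25.0%
```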

Thank you very much for the response!


Welcome to the forum @Kotryna!

I don’t think that’s the case — this may be more of an issue with the grouping mode that you are using. Instead of using the mean or median, maybe sum your counts?
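As a rough illustration of what the grouping modes do, here is a sketch of the per-feature reductions named by the `--p-mode` choices of `qiime feature-table group` (`sum`, `median-ceiling`, `mean-ceiling`). This is an assumption-based re-creation, not the q2-feature-table source: each mode collapses one feature's raw read counts across the samples of a group, and the barplot then shows each group's counts as relative frequencies.

```python
import math
import statistics


def group_feature_counts(counts_per_sample, mode="sum"):
    """Collapse one feature's raw counts across the samples of a group.

    Sketch of the reductions named by the `--p-mode` choices; it operates on
    raw counts, so per-sample sequencing depth is not normalized away.
    """
    if mode == "sum":
        return sum(counts_per_sample)
    if mode == "median-ceiling":
        return math.ceil(statistics.median(counts_per_sample))
    if mode == "mean-ceiling":
        return math.ceil(statistics.mean(counts_per_sample))
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical raw read counts of one feature in a group of 10 samples.
purple = [250] * 7 + [0] * 3
for mode in ("sum", "median-ceiling", "mean-ceiling"):
    print(mode, group_feature_counts(purple, mode))
# sum 1750
# median-ceiling 250
# mean-ceiling 175
```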

But maybe I misunderstand — would you mind posting the commands that you are using and maybe also the QZVs before/after grouping? Thanks!


Thanks! I meant to join right after the workshop in Copenhagen, but you know…

Anyhow, here are the commands I'm using:
```bash
qiime feature-table group \
  --i-table skin_biopsy.qza \
  --p-axis sample \
  --m-metadata-file metadata.tsv \
  --m-metadata-column Group_ID \
  --p-mode sum \
  --o-grouped-table skin_biopsy_grouped.qza

qiime taxa barplot \
  --i-table skin_biopsy_grouped.qza \
  --i-taxonomy skin-taxonomy.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization skin_biopsy_grouped.qzv
```

(I've actually tried all the available modes for `--p-mode`, and they all give the same output visually.)


I'd be really happy if you could help me with this. I have no idea what I'm doing wrong.

Thank you!

Hi again!

I do understand that you are all very busy, but I'm really stuck...
Here are the visualizations before and after grouping; I've chosen some extreme examples to make it clearer.

What concerns me most are the last 10 samples, which have a 'purple' taxon that is quite abundant in 7 of the 10 samples, averaging about 25%. After grouping, it shows 4%!
What am I doing wrong?

Looking forward to your answers, ideas, and tips!

Hi @Kotryna,
Sorry, back from a holiday weekend.

Are you using sum, median, or mean mode? Summing could yield the result you describe if the 3 of 10 samples without that taxon have many more sequences than the others.
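To see why uneven depth matters in sum mode, here is a small worked sketch with hypothetical read depths (not taken from the poster's data). Pooling reads makes the group value a depth-weighted average, $\sum_i p_i d_i / \sum_i d_i$, where $p_i$ is the taxon's relative abundance in sample $i$ and $d_i$ its total reads, so a few deep samples without the taxon can dominate.

```python
# Hypothetical group: 7 samples with 1,000 reads each (25 % "purple"),
# and 3 samples with 20,000 reads each that contain no "purple" reads.
depths = [1_000] * 7 + [20_000] * 3
purple = [250] * 7 + [0] * 3

# sum mode: pool the reads, then the barplot shows purple / total reads.
pooled = sum(purple) / sum(depths)

# Unweighted average of per-sample relative abundances, for comparison.
per_sample = [p / d for p, d in zip(purple, depths)]
unweighted = sum(per_sample) / len(per_sample)

print(f"pooled (sum mode): {pooled:.1%}")     # 2.6%
print(f"unweighted mean:   {unweighted:.1%}") # 17.5%
```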

One easy way to check: rarefy your samples before making the barplot.

Could you give that a try and let us know?
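The idea behind rarefying can be sketched in a few lines: subsample every sample to the same number of reads without replacement, so each sample contributes equally when the group is pooled. This is a stdlib illustration on the same hypothetical numbers as above, not the implementation of `qiime feature-table rarefy`; the `rarefy` helper is hypothetical.

```python
import random


def rarefy(counts, depth, rng):
    """Hypothetical helper: subsample one sample to `depth` reads without replacement.

    rng.sample raises ValueError if the sample has fewer than `depth` reads;
    QIIME 2's rarefy action instead drops such samples.
    """
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    kept = rng.sample(reads, depth)
    out = dict.fromkeys(counts, 0)
    for feature in kept:
        out[feature] += 1
    return out


rng = random.Random(42)
# Hypothetical group: 7 shallow samples with "purple", 3 deep samples without it.
samples = [{"purple": 250, "other": 750}] * 7 + [{"purple": 0, "other": 20_000}] * 3
rarefied = [rarefy(s, 1_000, rng) for s in samples]

purple_total = sum(s["purple"] for s in rarefied)
read_total = sum(sum(s.values()) for s in rarefied)
print(f"pooled after rarefying: {purple_total / read_total:.1%}")  # 17.5%
```

With every sample at 1,000 reads, pooling gives the unweighted 17.5% instead of about 2.6%.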


Hi @Nicholas_Bokulich,
I hope you had a nice holiday weekend!

I've used all the available modes (sum, median, and mean), and all of them give me more or less the same outcome, around 4%, which I find extremely weird…

When I look at the rarefaction curves, I wouldn't say that those 3 samples have many more sequences; all 10 samples form a nice 'gradient'.
But even if those 3 outweigh the others and that explains the result for sum mode, what about median and mean?

Could the metadata be the problem? Or SILVA?
Could the problem be that the taxonomy of that particular OTU does not go deeper than K_Bacteria;;;_?

Eternally grateful!

It certainly could be, but you would probably have noticed any obvious issues by now.

Now that I think about it, sum, mean, and median would all most likely be impacted by uneven sampling. This is the most likely cause of the issue you are seeing.
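A short sketch of why mean mode behaves like sum mode here, using the same hypothetical two-taxon group as above: averaging raw counts per feature and then normalizing gives the same proportion as pooling (up to the ceiling), whereas normalizing each sample first does not. Median mode also reduces raw counts feature by feature, so it is not depth-normalized either, although its exact effect depends on the data.

```python
import math

# Hypothetical group: per-sample (purple, other) raw read counts.
samples = [(250, 750)] * 7 + [(0, 20_000)] * 3
n = len(samples)


def share(purple, other):
    """Relative abundance of purple, as the barplot would display it."""
    return purple / (purple + other)


# mean-ceiling style: average the raw counts per feature, then normalize.
mean_purple = math.ceil(sum(p for p, _ in samples) / n)
mean_other = math.ceil(sum(o for _, o in samples) / n)
print(f"mean of counts:      {share(mean_purple, mean_other):.1%}")  # 2.6%

# Normalizing each sample before averaging removes the depth weighting.
mean_of_shares = sum(share(p, o) for p, o in samples) / n
print(f"mean of proportions: {mean_of_shares:.1%}")  # 17.5%
```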

Don't go by the rarefaction curve; look at the total number of reads per sample.
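The per-sample totals are shown by the `qiime feature-table summarize` visualization. If you prefer to check them by hand, a sketch like the one below works on a table already loaded into Python; the dict layout, sample IDs, and the `flag_deep_samples` threshold are all hypothetical choices for illustration.

```python
import statistics


def sample_depths(table):
    """Total reads per sample; `table` maps sample ID -> {feature ID: count}."""
    return {sample: sum(counts.values()) for sample, counts in table.items()}


def flag_deep_samples(depths, factor=3.0):
    """Hypothetical helper: samples whose depth exceeds `factor` x the median depth."""
    median = statistics.median(depths.values())
    return sorted(s for s, d in depths.items() if d > factor * median)


# Hypothetical table with made-up sample IDs.
table = {f"S{i}": {"purple": 250, "other": 750} for i in range(1, 8)}
table.update({f"S{i}": {"other": 20_000} for i in range(8, 11)})

depths = sample_depths(table)
print(depths)
print(flag_deep_samples(depths))  # ['S10', 'S8', 'S9']
```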

Even sampling depth (e.g., with the `qiime feature-table rarefy` action) would be one way to correct this for the purpose of making a barplot. This issue is pretty similar to this topic:

If you want to confirm this definitively, I'd export the data and sum those features by hand (first collapse by taxonomy to replicate what the barplot is doing).
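A sketch of that manual check, assuming the table has already been exported and loaded as nested dicts: collapse features that share a taxonomy string truncated to a given rank (the idea behind `qiime taxa collapse --p-level`, not its actual code), then sum per group and convert to relative frequencies. The feature IDs, taxonomy strings, and group labels are hypothetical.

```python
from collections import defaultdict


def collapse(table, taxonomy, level):
    """Sum feature counts that share the same taxonomy truncated to `level` ranks.

    `table`: sample ID -> {feature ID: count}; `taxonomy`: feature ID -> "rank1;rank2;...".
    """
    collapsed = {}
    for sample, counts in table.items():
        by_taxon = defaultdict(int)
        for feature, n in counts.items():
            taxon = ";".join(taxonomy[feature].split(";")[:level])
            by_taxon[taxon] += n
        collapsed[sample] = dict(by_taxon)
    return collapsed


def group_sum_relative(table, groups):
    """Sum counts per group (like sum mode), then convert to relative frequencies."""
    totals = defaultdict(lambda: defaultdict(int))
    for sample, counts in table.items():
        for taxon, n in counts.items():
            totals[groups[sample]][taxon] += n
    return {g: {t: n / sum(c.values()) for t, n in c.items()} for g, c in totals.items()}


# Hypothetical features, taxonomy, and samples.
taxonomy = {
    "f1": "k__Bacteria;p__Firmicutes;c__Bacilli",
    "f2": "k__Bacteria;p__Firmicutes;c__Clostridia",
    "f3": "k__Bacteria;p__Proteobacteria",
}
table = {"A1": {"f1": 100, "f2": 50, "f3": 50}, "A2": {"f1": 0, "f2": 0, "f3": 800}}
groups = {"A1": "treatment", "A2": "treatment"}

print(group_sum_relative(collapse(table, taxonomy, 2), groups))
```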

I hope that helps!
