I need some help understanding my barplot.
I have 2 groups, treatment (n=10) and control (n=10), and the individual barplots look good.
When I group my subjects by a metadata feature (i.e. intervention) and then plot the results as a barplot,
I see quite a different picture.
It took me some time to realize that samples with 0 reads are not included in the averaging, and thus the representation of taxa abundance in the group is shifted.
What I'm used to is that one sums up and divides by the n in the group: that should be 10 in my case, but it is 7, as 3 out of 10 do not have that particular taxon.
Is that correct? Has it always been like that and I just never realized?
I don't think that's the case — this may be more of an issue with the grouping mode that you are using. Instead of using the mean or median, maybe sum your counts?
But maybe I misunderstand — would you mind posting the commands that you are using and maybe also the QZVs before/after grouping? Thanks!
Thanks! I've meant to join ever since the workshop in Copenhagen, but you know…
Anyhow, here are the commands I'm using:
qiime feature-table group \
  --i-table skin_biopsy.qza \
  --p-axis sample \
  --m-metadata-file metadata.tsv \
  --m-metadata-column 'Group_ID' \
  --p-mode 'sum' \
  --o-grouped-table skin_biopsy_grouped.qza
(I've actually tried all the available modes and got visually the same output.)
I do understand that you guys are very busy, but I'm really stuck...
Here are the visualizations before and after grouping. I've chosen some extremes to make it clearer.
What I'm most concerned about is the last 10 samples, which have a 'purple' taxon that is quite abundant in 7 of the 10 samples, with an average of about 25%. After grouping it comes out as 4%!
What am I doing wrong?
Looking forward to your answers, ideas, and tips!
thanks
Kotryna
Are you using sum mode or median or mean mode? Summing could yield the result you describe if the other 3/10 samples have more sequences than the others.
One easy way to check: rarefy your samples before making the barplot.
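As a rough illustration of what rarefying does (a conceptual sketch in plain Python, not QIIME 2's implementation): each sample is randomly subsampled without replacement to the same depth, so no sample can dominate a group total just by being sequenced more deeply. The sample names, counts, and depth below are made up.

```python
import random
from collections import Counter

def rarefy(counts, depth, rng):
    """Subsample one sample's {taxon: reads} dict to `depth` reads without replacement."""
    # Expand the counts into one label per read, then draw `depth` of them.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    if len(reads) < depth:
        return None  # too shallow to subsample to this depth
    return dict(Counter(rng.sample(reads, depth)))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = {  # hypothetical samples
    "shallow": {"purple": 250, "other": 750},
    "deep": {"purple": 0, "other": 12250},
}
rarefied = {name: rarefy(c, 1000, rng) for name, c in samples.items()}

for name, counts in rarefied.items():
    print(name, sum(counts.values()), counts)
```

After this step both samples contribute exactly 1000 reads, so the deep sample no longer outweighs the shallow one when the group is summed.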
I've used all the available modes (sum, median, and mean), and all of them give me more or less the same outcome of around 4%, which I find extremely weird…
When I look at the rarefaction curves, I wouldn't say that those 3 samples have many more sequences; all 10 of those samples form a nice 'gradient'.
But let's say those 3 outweigh the others and the result is correct for sum mode; what about median and mean?
Could the metadata be a problem? SILVA?
Could the problem be that the taxonomy of that particular OTU does not go deeper than K_Bacteria;;;_?
Certainly could be, but you'd probably notice any obvious issues by now.
Now that I think about it, sum, mean, and median would all most likely be impacted by uneven sampling. This is the most likely cause of the issue you are seeing.
Don't go by the rarefaction curve; look at the total reads per sample.
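To see how uneven depth alone can produce a drop like this, here is a minimal sketch with made-up numbers (not the data from this thread): seven samples with 1,000 reads each, 25% of them 'purple', and three deeper samples with 12,250 reads each and no purple reads. It assumes the barplot converts grouped counts to relative abundances, and it ignores any rounding the mean mode may apply.

```python
# Hypothetical read counts, chosen only to illustrate the effect.
purple = [250] * 7 + [0] * 3        # reads assigned to the 'purple' taxon
totals = [1000] * 7 + [12250] * 3   # total reads per sample
n = len(totals)

# What one might expect: average the per-sample relative abundances.
mean_of_fractions = sum(p / t for p, t in zip(purple, totals)) / n

# Sum mode: add raw counts across the group, then normalize.
sum_mode_fraction = sum(purple) / sum(totals)

# Mean mode: average raw counts per feature, then normalize.
# The 1/n factor cancels, so this matches sum mode.
mean_mode_fraction = (sum(purple) / n) / (sum(totals) / n)

print(f"mean of per-sample fractions: {mean_of_fractions:.1%}")   # 17.5%
print(f"sum mode, then normalized:    {sum_mode_fraction:.1%}")   # 4.0%
print(f"mean mode, then normalized:   {mean_mode_fraction:.1%}")  # 4.0%
```

In this toy case the three deep samples hold 84% of the group's reads, so they pull the grouped value down to 4% whether the raw counts are summed or averaged.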
But even sampling (with the qiime feature-table rarefy action) would be one way to correct this for the purposes of making a barplot. This issue is pretty similar to this topic:
If you want to positively confirm, I'd say export the data and sum those features by hand (first collapse by taxonomy to replicate what barplot is doing).
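The by-hand check could look roughly like the sketch below. It assumes the table has already been exported to a tab-separated file with one row per feature; the column names, the inline data, and the sample-to-group mapping are all hypothetical stand-ins for your own export and metadata.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: one row per feature, a taxonomy label, and per-sample counts.
TABLE_TSV = """feature\ttaxonomy\tS1\tS2\tS3
f1\tpurple\t200\t0\t100
f2\tpurple\t50\t0\t0
f3\tother\t750\t12250\t900
"""
GROUPS = {"S1": "treatment", "S2": "treatment", "S3": "control"}  # hypothetical metadata

rows = list(csv.DictReader(io.StringIO(TABLE_TSV), delimiter="\t"))

# Collapse features sharing a taxonomy label and sum them within each group, in one pass.
grouped = defaultdict(lambda: defaultdict(int))
for row in rows:
    for sample, group in GROUPS.items():
        grouped[group][row["taxonomy"]] += int(row[sample])

# Convert each group's summed counts to relative abundances, as a barplot would show them.
for group, counts in grouped.items():
    total = sum(counts.values())
    fractions = {taxon: count / total for taxon, count in counts.items()}
    print(group, fractions)
```

If the fractions computed this way match the grouped barplot, the plot is faithfully showing the summed counts and the surprise comes from the read depths rather than from the grouping step.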