I) Is it possible to collapse samples by a category of the mapfile using the new function taxa_barplot()? (i.e. control vs antibiotic treatment), then showing mean relative abundances instead per_sample abundances?
II) can we collapse “minor phyla” (say all phyla<1% total abundance), to avoid >10 colours in the same figure?
Otherwise, a good R code to get I) an II)? It is not so straightforward, apparently: https://github.com/joey711/phyloseq/issues/901
I guess these two functionalities are very demanded. Thanks!!
In answer to 1- I am not a huge fan of collapsing the samples into a single bar as it masks the error and can lead to misleading conclusions. If you want to do this, you can find the average of each feature abundance within a group and then make a barplot from this aggregated sample. 2- I usually limit the plot to the 10 most abundant features (SVs/OTUs/Genera/Phyla etc) and group everything else into a remainder. The taxa_barplot function will do this by default. You could make a custom version of this function at line 39 to instead of pulling out the 10 most abundant, pulling out everything over 1% in average abundance.
Hi Mr Bisanz,
Thank you for your great tutorial.
I am wondering if it is possible to diagnose a change step-by-step, or line-by-line, to our SVs data above. I can do it manually by declare SVs1 <- SVs instead of SVs above, and then delete pipe operator one at a time, but this is difficult. I realise that maybe there is a better way. Maybe debugging or something? Pardon me asking this basic R programming, because the context are already set I think it will be easier here.
I am using Rstudio.
I got no particular reason why choosing heatmap part for this example. It just happened that this part is what I am currently working on.
You could also just add head() or View() after the pipe on the line of interesting to send the result to the console or a new tab respectively.
If you want to customize the y axis, you slice and dice the text however you want. For example, you could paste together just the phylum and genus with something like mutate(Feature=paste(Phylum, Genus)). In my line above with the gsub command, I am just removing the extra k__Bacteria;p__Bacteroides… to get Bacteria;Bacteroides… to save on space when plotting.
Ah yes this is what I wanted all this time! Thank you! Also good tips, I didn’t know that we can run selected code. I ended up assigning new SVs upto SVs15 lol. Thank you!
Also, my heatmap did not show similar y label as yours. I think it is because of this part of the code?
mutate(Feature=paste(Feature.ID, Taxon)) %>%
mutate(Feature=gsub("[kpcofgs]__", “”, Feature)) %>% # trim out leading text from taxonomy string
ggplot(aes(x=SampleID, y=Feature, fill=NormAbundance)) +
The y from aes() asking for Feature column, but other than k__Bacteria;p__Bacteroides…etc line, there are also some random character of Feature.ID from mutate(paste()) function above it. That, or is it just me making mistake? What do you think, sir?
Also, what is that Feature.ID? They are from QIIME2 table.qza but, what algorithms do they use to generate the Feature.ID?
I believe the Feature.ID you would be seeing is the sequence’s hash which is a unique identifier for the SV (see here). You could just as easily replace them with arbitrary names like “SV_1” if you wanted. To do that, add an extra column to your taxonomy object with something like: