Making graphs with qiime2R

Good morning @Anahita_Bharadwaj,

I made a new thread for your posts!

Now on to the questions.

ps %>%
  tax_glom("Phylum") %>%
  plot_bar()
# You can save the phylum level object for later use
ps.phylum <- ps %>%
  tax_glom("Phylum")
ps.phylum %>% plot_bar()

Like, graph taxa just within one phyla?


Yes!

ps %>%
  tax_glom("Phylum") %>%
  merge_samples("ExperimentalCondition", fun=mean) %>%
  plot_heatmap()

Instead, I would suggest keeping your samples separate, but then grouping them using a facet.

ps %>%
  tax_glom("Phylum") %>%
  plot_heatmap() + facet_grid(~ExperimentalCondition)

So this will visually group your samples, while still showing within group variation!

Colin

P.S. You probably found these already, but here are some great examples of phyloseq graphs.

1 Like

Thanks @colinbrislawn.

Yes, that's right! Like the top 10 Firmicutes, and so on...

I will try these out. In the example above, I don't think phyloseq library was used to draw the heatmaps (I could be wrong) and also, the taxonomy.qza file is in combined form (see picture). I guess I will need to use something like
parse_taxonomy(taxonomy, tax_sep, trim_extra)
to first separate the taxa and then use the code you recommended?

But if I did this, it would impact the Y-axis of the original heatmap I've already developed using the tutorial. I don't know if I am explaining it correctly... basically the way that the data needs to be arranged for phyloseq library is different from what's on this tutorial (or maybe I am wrong?). And so, I am trying to figure out if I can use one or the other to create these additional heatmaps. I would really appreciate your advice on this! Thanks!

Screen Shot 2020-04-28 at 12.20.06 PM

OK! Here's what I came up with based on these examples

ps.just_firmicutes.family <- ps %>%
  subset_taxa(Phylum == "Firmicutes") %>%
  tax_glom("Family")

ps.just_firmicutes.family.topnames <- names(sort(taxa_sums(ps.just_firmicutes.family), TRUE)[1:10])
ps.just_firmicutes.family.top <- prune_species(ps.phylum.top10.names, ps.just_firmicutes.family)

ps.just_firmicutes.family.top %>%
  plot_bar() + labs(title = "Top 10 Firmicutes at the Family level")

There are lots of different ways to plot heatmaps, and none of them are perfect. You are correct: you will need to reorganize your data to make it match the input for the program.

Because you have your data in a phyloseq object, it would be good to at least try the plot_heatmap() function because you don't have to reformat your data at all!
https://joey711.github.io/phyloseq/plot_heatmap-examples.html

Colin

1 Like

Thank you! This is really helpful.

That's true, I really like the tutorial as it was a very intuitive way for me to understand how the heatmaps were drawn (going through it one step at a time). I will try to figure out how to tweak it for my requirements, but if someone else gets to it before I do, I would love to see it! (Enthusiastic learner :sweat_smile: :grinning:)

I understand. I don't actually have it as a phyloseq object but I will check it out and try to learn it. Thanks again!

1 Like

Oh OK!

All of my examples start with data in a phyloseq object, which I called ps. Once you do have your data in a phyloseq object, all of the plot_heatmap() plot_ord() plot_alpha() function that come with phyloseq are quite easy to use.

Colin

2 Likes

Thank you! I think I am getting the hang of it. I have just one more question... I see that the abundances in the charts are plotted as is. Is there any way that I can go in and change the abundances to relative abundance (on a percentage basis) and convert it into a log plot similar to what's in the tutorial?

1 Like

Good afternoon,

Sure!

ps.just_firmicutes.family.fraction <- ps.just_firmicutes.family %>%
  transform_sample_counts(function(x) x/sum(x))

# or

ps.just_firmicutes.family.percent <- ps.just_firmicutes.family %>%
  transform_sample_counts(function(x) 100*x/sum(x))

There is several different ways to make a plot have a log scale. My favorite is
ggplot(... ) + scale_y_log10()
but that might not work well for stacked bar plots…

What does your current plot look like? Maybe once I see it, I can make a suggestion!

1 Like

Oh awesome, thanks so much!
So, I made the heatmap with the following,

physeq %>%
tax_glom("Phylum") %>%
plot_heatmap(taxa.label = "Phylum", sample.label = "Description2", sample.order = "Type", high = "#00FFFF") +
facet_grid(~Description1, scales = "free")+
theme_q2r() +
theme(
axis.text.x=element_text(size = 6, colour = "black", angle=90, hjust=1),
axis.text.y=element_text(size = 6, colour = "black", hjust=1),
strip.text.x = element_text(size = 8, colour = "black"),
panel.spacing= unit (0, "cm")
)

and it looks something like this (I blocked some of the sensitive info),

I am looking to do two things (circled in red),
(1) Report the data is %abundance (and on a log scale... like in the tutorial)
(2) Figure out how to change the label from being just phylum to Kingdom_Phylum. Someone in another forum recommended this

newrank <- paste(physeq@tax_table [ ,"Kingdom"], physeq@tax_table [ ,"Phylum"], sep = "_")
physeq@[email protected] <- cbind(physeq@[email protected], newrank)

and then drawing the heatmap with the newrank taxa level, but that didn't work for some reason since it wasn't combining/collapsing the abundances within each Kingdom_Phylum.

This is pretty easy!

physeq %>%
  tax_glom("Phylum") %>%
  transform_sample_counts(function(x) 100*x/sum(x)) %>% # counts into %
  plot_heatmap(
    taxa.label = "Phylum",
    sample.label = "Description2",
    sample.order = "Type", high = "#00FFFF") +
  facet_grid(~Description1, scales = "free")+
  theme_q2r() +
  theme(
    axis.text.x=element_text(size = 6, colour = "black", angle=90, hjust=1),
    axis.text.y=element_text(size = 6, colour = "black", hjust=1),
    strip.text.x = element_text(size = 8, colour = "black"),
    panel.spacing= unit (0, "cm")
  )

I think that plot_heatmap() in phyloseq is log by default... but let me know what scale you see when running that command.


This is more difficult, as you can see by that complicated command!

Those two lines will make a new taxonomy level called newrank so you will have to update the script to use this new name

physeq %>%
  tax_glom("newrank") %>%
  plot_heatmap(taxa.label = "newrank", ...

You have made a ton of progress over these last two days. Enthusiastic learner indeed! :clap: :100:

Colin

1 Like

haha, I am a week in and loving it, thanks to this forum and all the help. I really do appreciate it! :grinning:

I changed the code to this but it looks like it doesn't sum up all the abundances within each phyla of the newrank. Instead, it is plotting them all separately.
I think I need to try something like

summarize(Abundance=sum(Abundance))

Would that work in this case?

I feel that way, everyday. We all have a lot to learn. :tea:


Would you be willing to post your full code and your output so I can take a look?

ya sure, it's exactly the same as what you had posted...

newrank <- paste(physeq@tax_table [ ,"Kingdom"], physeq@tax_table [ ,"Phylum"], sep = "_")

physeq@[email protected] <- cbind(physeq@[email protected], newrank)

physeq %>%
tax_glom("newrank") %>%
plot_heatmap(taxa.label = "newrank", sample.label = "Description2", sample.order = "Type", high = "#00FFFF") +
facet_grid(~facetorder, scales = "free")+ #I created a manual facet order for plotting
theme_q2r() +
theme(
axis.text.x=element_text(size = 6, colour = "black", angle=90, hjust=1),
axis.text.y=element_text(size = 6, colour = "black", hjust=1),
strip.text.x = element_text(size = 8, colour = "black"),
panel.spacing= unit (0, "cm")
)

But when I do that, it isn't collapsing the phyla. I am not sure why... I think I need something additional code to ask it to combine the phyla together.

Apart from this, I am looking to sort the data from largest to smallest abundance and plot only the top 10 abundant phyla.

I’m not 100% sure why the merge is not working. :thinking:

When you run this command:

physeq %>% tax_glom("newrank")

How many tax levels are in the tax table? (It should be 8 now that we have added newrank)
Is the total number of feature smaller than those in the physeq object?

Yes, there are 8 taxa after merging.

I am not quite sure I understand this question... but nothing had changed except the addition of the 8th taxa and the taxonomy within that it.

OK! So it looks like the rank was added, but the tax_glom() function did not work… Does tax_glom("Phylum") still work?

Yes, it does. I was thinking that perhaps some of the phyloseq functions are designed to recognize only specific elements, in this case, 7 taxa? I am guessing this because it also didn’t let me play around with the color scheme too much (I was hoping to use scale_fill_viridis_c() but I couldn’t disable the preexisting fill parameters). It’s possible I went about it wrongly though.

Strange! In my own work with Phyloseq it’s been very flexible with both data structures and plot color schemes.

Another option is to use psmelt() first to turn your phyloseq object in a traditional data frame, then modify and graph using tidyverse and ggplot2. We can explore this option more, because we know psmelt() and the tidyverse packages seem to be working.

Want to try that?

ahh okay… I can try that! It’s probably likely I am doing something wrong as I am only getting started :sweat_smile:

I did something similar to what you have suggested :smiley: but by downloading the .tsv file directly off of the relative abundance bar plots (the .qzv file) from QIIME2 . Bit of a cheat but it gives me the data in the arrangement I need. As you recommended, I then used tidyverse and ggplot2 from the tutorial. I appreciate all your advice and help with this!

1 Like

2 posts were split to a new topic: how to get a phyloseq object with qiime2r?