Making graphs with qiime2R

Anahita_Bharadwaj · April 24, 2020, 11:27pm

Hello there,

First of all, thank you so much for this tutorial @jbisanz. It’s so so helpful.

So, I am trying to generate some PCoA plots for my data and have adapted the given code as follows,

uwunifrac$data$Vectors %>%
  select(SampleID, PC1, PC2) %>%
  left_join(metadata) %>%
  left_join(shannon) %>%
  ggplot(aes(x=PC1, y=PC2, color=`Type`, shape=`Day`, size=shannon)) +
  geom_point(alpha=0.5) + 
  theme_q2r() +
  scale_size_continuous(name="Shannon Diversity") +
scale_color_viridis_d(name="Milled Conditions")

So, essentially very similar, except I changed the axes. However, I am getting the following warnings and about half my data isn’t getting plotted,

Warning messages:
1: Column `SampleID` joining factors with different levels, coercing to character vector 
2: Removed 48 rows containing missing values
(geom_point).

I checked to confirm that the order of my SampleIDs match up in both the metadata and the Unweighted UniFrac PCoA files. When I visualize them on QIIME2, they look fine to me. I was hoping you could help me figure this out. Thanks for your help!

colinbrislawn · April 26, 2020, 6:59pm

Hello Anahita,

Errors show that something has gone wrong and needs to be fixed, but warning are just friendly notes letting you know how things are going. I often get Warnings like this, and this is OK!

If your R output graphs look ok, you should be good to go.

Here is what these two warnings mean:
1: Column SampleID joining factors with different levels, coercing to character vector
Now you have a ‘character vector!’

2: Removed 48 rows containing missing values
Rows with missing values are removed because you can’t graph them anyway. As long as you see the expected number of samples in the plot, this is fine.

Colin

Anahita_Bharadwaj · April 27, 2020, 4:07pm

Hello,

Thanks for your response! I don't have any NA values in my data and that's the reason this is confusing. I checked what's happening with the combined file and this is how it looks. It seems like between the two files, although the #SampleID seems to be the same name, it isn't matching up. Is there something I am missing?

EDIT: Oh gosh! I figured it out. Some of the metadata sample names had an invisible "space" after the names, so R was picking them up as different samples. I had to manually go through and remove all the spaces. Gotta be careful while making that metadata file... eek! Thanks for your input through!

Anahita_Bharadwaj · April 28, 2020, 1:13am

Hi again,

I have been using this tutorial to draw heatmaps for my data. It works well, thanks so much @jbisanz !

Now, I am trying to figure out few small things,
(1) Is there a way I can collapse the taxonomy data to the phylum level?
(2) Similarly, if there is a way to plot, abundances within specific phyla separately? (I tried to sort it in alphabetical order instead of mean abundance but couldn’t figure out the secondary sorting within that).
(3) I have 12 replicate data from each experimental condition. Would it be possible to collapse across these replicates and draw the heatmap for the ‘average’ relative abundance across the different phyla? (the bigger question also being if this is an advisable/correct thing to do?)

Also, can you please explain what the line summarize(Abundance=sum(Abundance))
does?

I have been trying to do this on my own but am fairly new to R. So, I would really appreciate your help with this or if you could point be to other sources as I couldn’t find any. Thank you for your help!

colinbrislawn · April 28, 2020, 2:45pm

Good morning @Anahita_Bharadwaj,

I made a new thread for your posts!

Now on to the questions.

ps %>%
  tax_glom("Phylum") %>%
  plot_bar()
# You can save the phylum level object for later use
ps.phylum <- ps %>%
  tax_glom("Phylum")
ps.phylum %>% plot_bar()

Like, graph taxa just within one phyla?

Yes!

ps %>%
  tax_glom("Phylum") %>%
  merge_samples("ExperimentalCondition", fun=mean) %>%
  plot_heatmap()

Instead, I would suggest keeping your samples separate, but then grouping them using a facet.

ps %>%
  tax_glom("Phylum") %>%
  plot_heatmap() + facet_grid(~ExperimentalCondition)

So this will visually group your samples, while still showing within group variation!

Colin

P.S. You probably found these already, but here are some great examples of phyloseq graphs.

Anahita_Bharadwaj · April 28, 2020, 4:45pm

Thanks @colinbrislawn.

Yes, that's right! Like the top 10 Firmicutes, and so on...

I will try these out. In the example above, I don't think phyloseq library was used to draw the heatmaps (I could be wrong) and also, the taxonomy.qza file is in combined form (see picture). I guess I will need to use something like
parse_taxonomy(taxonomy, tax_sep, trim_extra)
to first separate the taxa and then use the code you recommended?

But if I did this, it would impact the Y-axis of the original heatmap I've already developed using the tutorial. I don't know if I am explaining it correctly... basically the way that the data needs to be arranged for phyloseq library is different from what's on this tutorial (or maybe I am wrong?). And so, I am trying to figure out if I can use one or the other to create these additional heatmaps. I would really appreciate your advice on this! Thanks!

Screen Shot 2020-04-28 at 12.20.06 PM

colinbrislawn · April 28, 2020, 5:14pm

OK! Here's what I came up with based on these examples

ps.just_firmicutes.family <- ps %>%
  subset_taxa(Phylum == "Firmicutes") %>%
  tax_glom("Family")

ps.just_firmicutes.family.topnames <- names(sort(taxa_sums(ps.just_firmicutes.family), TRUE)[1:10])
ps.just_firmicutes.family.top <- prune_species(ps.phylum.top10.names, ps.just_firmicutes.family)

ps.just_firmicutes.family.top %>%
  plot_bar() + labs(title = "Top 10 Firmicutes at the Family level")

There are lots of different ways to plot heatmaps, and none of them are perfect. You are correct: you will need to reorganize your data to make it match the input for the program.

Because you have your data in a phyloseq object, it would be good to at least try the plot_heatmap() function because you don't have to reformat your data at all!
https://joey711.github.io/phyloseq/plot_heatmap-examples.html

Colin

Anahita_Bharadwaj · April 28, 2020, 5:46pm

Thank you! This is really helpful.

That's true, I really like the tutorial as it was a very intuitive way for me to understand how the heatmaps were drawn (going through it one step at a time). I will try to figure out how to tweak it for my requirements, but if someone else gets to it before I do, I would love to see it! (Enthusiastic learner )

I understand. I don't actually have it as a phyloseq object but I will check it out and try to learn it. Thanks again!

colinbrislawn · April 28, 2020, 5:55pm

Oh OK!

All of my examples start with data in a phyloseq object, which I called ps. Once you do have your data in a phyloseq object, all of the plot_heatmap() plot_ord() plot_alpha() function that come with phyloseq are quite easy to use.

Colin

Anahita_Bharadwaj · April 30, 2020, 6:50pm

Thank you! I think I am getting the hang of it. I have just one more question... I see that the abundances in the charts are plotted as is. Is there any way that I can go in and change the abundances to relative abundance (on a percentage basis) and convert it into a log plot similar to what's in the tutorial?

colinbrislawn · April 30, 2020, 7:06pm

Good afternoon,

Sure!

ps.just_firmicutes.family.fraction <- ps.just_firmicutes.family %>%
  transform_sample_counts(function(x) x/sum(x))

# or

ps.just_firmicutes.family.percent <- ps.just_firmicutes.family %>%
  transform_sample_counts(function(x) 100*x/sum(x))

There is several different ways to make a plot have a log scale. My favorite is
ggplot(... ) + scale_y_log10()
but that might not work well for stacked bar plots…

What does your current plot look like? Maybe once I see it, I can make a suggestion!

Anahita_Bharadwaj · April 30, 2020, 7:49pm

Oh awesome, thanks so much!
So, I made the heatmap with the following,

physeq %>%
tax_glom("Phylum") %>%
plot_heatmap(taxa.label = "Phylum", sample.label = "Description2", sample.order = "Type", high = "#00FFFF") +
facet_grid(~Description1, scales = "free")+
theme_q2r() +
theme(
axis.text.x=element_text(size = 6, colour = "black", angle=90, hjust=1),
axis.text.y=element_text(size = 6, colour = "black", hjust=1),
strip.text.x = element_text(size = 8, colour = "black"),
panel.spacing= unit (0, "cm")
)

and it looks something like this (I blocked some of the sensitive info),

I am looking to do two things (circled in red),
(1) Report the data is %abundance (and on a log scale... like in the tutorial)
(2) Figure out how to change the label from being just phylum to Kingdom_Phylum. Someone in another forum recommended this

newrank <- paste(physeq@tax_table [ ,"Kingdom"], physeq@tax_table [ ,"Phylum"], sep = "_")
physeq@tax_table@.Data <- cbind(physeq@tax_table@.Data, newrank)

and then drawing the heatmap with the newrank taxa level, but that didn't work for some reason since it wasn't combining/collapsing the abundances within each Kingdom_Phylum.

colinbrislawn · April 30, 2020, 8:34pm

This is pretty easy!

physeq %>%
  tax_glom("Phylum") %>%
  transform_sample_counts(function(x) 100*x/sum(x)) %>% # counts into %
  plot_heatmap(
    taxa.label = "Phylum",
    sample.label = "Description2",
    sample.order = "Type", high = "#00FFFF") +
  facet_grid(~Description1, scales = "free")+
  theme_q2r() +
  theme(
    axis.text.x=element_text(size = 6, colour = "black", angle=90, hjust=1),
    axis.text.y=element_text(size = 6, colour = "black", hjust=1),
    strip.text.x = element_text(size = 8, colour = "black"),
    panel.spacing= unit (0, "cm")
  )

I think that plot_heatmap() in phyloseq is log by default... but let me know what scale you see when running that command.

This is more difficult, as you can see by that complicated command!

Those two lines will make a new taxonomy level called newrank so you will have to update the script to use this new name

physeq %>%
  tax_glom("newrank") %>%
  plot_heatmap(taxa.label = "newrank", ...

You have made a ton of progress over these last two days. Enthusiastic learner indeed!

Colin

Anahita_Bharadwaj · May 1, 2020, 1:14am

haha, I am a week in and loving it, thanks to this forum and all the help. I really do appreciate it!

I changed the code to this but it looks like it doesn't sum up all the abundances within each phyla of the newrank. Instead, it is plotting them all separately.
I think I need to try something like

summarize(Abundance=sum(Abundance))

Would that work in this case?

colinbrislawn · May 1, 2020, 2:48am

I feel that way, everyday. We all have a lot to learn.

Would you be willing to post your full code and your output so I can take a look?

Anahita_Bharadwaj · May 1, 2020, 4:30pm

ya sure, it's exactly the same as what you had posted...

newrank <- paste(physeq@tax_table [ ,"Kingdom"], physeq@tax_table [ ,"Phylum"], sep = "_")

physeq@tax_table@.Data <- cbind(physeq@tax_table@.Data, newrank)

physeq %>%
tax_glom("newrank") %>%
plot_heatmap(taxa.label = "newrank", sample.label = "Description2", sample.order = "Type", high = "#00FFFF") +
facet_grid(~facetorder, scales = "free")+ #I created a manual facet order for plotting
theme_q2r() +
theme(
axis.text.x=element_text(size = 6, colour = "black", angle=90, hjust=1),
axis.text.y=element_text(size = 6, colour = "black", hjust=1),
strip.text.x = element_text(size = 8, colour = "black"),
panel.spacing= unit (0, "cm")
)

But when I do that, it isn't collapsing the phyla. I am not sure why... I think I need something additional code to ask it to combine the phyla together.

Apart from this, I am looking to sort the data from largest to smallest abundance and plot only the top 10 abundant phyla.

colinbrislawn · May 1, 2020, 9:35pm

I’m not 100% sure why the merge is not working.

When you run this command:

physeq %>% tax_glom("newrank")

How many tax levels are in the tax table? (It should be 8 now that we have added newrank)
Is the total number of feature smaller than those in the physeq object?

Anahita_Bharadwaj · May 2, 2020, 9:42pm

Yes, there are 8 taxa after merging.

I am not quite sure I understand this question... but nothing had changed except the addition of the 8th taxa and the taxonomy within that it.

colinbrislawn · May 3, 2020, 12:00am

OK! So it looks like the rank was added, but the tax_glom() function did not work… Does tax_glom("Phylum") still work?

Anahita_Bharadwaj · May 3, 2020, 12:18am

Yes, it does. I was thinking that perhaps some of the phyloseq functions are designed to recognize only specific elements, in this case, 7 taxa? I am guessing this because it also didn’t let me play around with the color scheme too much (I was hoping to use scale_fill_viridis_c() but I couldn’t disable the preexisting fill parameters). It’s possible I went about it wrongly though.