So, essentially very similar, except I changed the axes. However, I am getting the following warnings and about half my data isn’t getting plotted,
Warning messages:
1: Column `SampleID` joining factors with different levels, coercing to character vector
2: Removed 48 rows containing missing values
(geom_point).
I checked to confirm that the order of my SampleIDs match up in both the metadata and the Unweighted UniFrac PCoA files. When I visualize them on QIIME2, they look fine to me. I was hoping you could help me figure this out. Thanks for your help!
Errors show that something has gone wrong and needs to be fixed, but warning are just friendly notes letting you know how things are going. I often get Warnings like this, and this is OK!
If your R output graphs look ok, you should be good to go.
Here is what these two warnings mean:
1: Column SampleID joining factors with different levels, coercing to character vector
Now you have a ‘character vector!’
2: Removed 48 rows containing missing values
Rows with missing values are removed because you can’t graph them anyway. As long as you see the expected number of samples in the plot, this is fine.
Thanks for your response! I don't have any NA values in my data and that's the reason this is confusing. I checked what's happening with the combined file and this is how it looks. It seems like between the two files, although the #SampleID seems to be the same name, it isn't matching up. Is there something I am missing?
EDIT: Oh gosh! I figured it out. Some of the metadata sample names had an invisible "space" after the names, so R was picking them up as different samples. I had to manually go through and remove all the spaces. Gotta be careful while making that metadata file... eek! Thanks for your input through!
I have been using this tutorial to draw heatmaps for my data. It works well, thanks so much @jbisanz !
Now, I am trying to figure out few small things,
(1) Is there a way I can collapse the taxonomy data to the phylum level?
(2) Similarly, if there is a way to plot, abundances within specific phyla separately? (I tried to sort it in alphabetical order instead of mean abundance but couldn’t figure out the secondary sorting within that).
(3) I have 12 replicate data from each experimental condition. Would it be possible to collapse across these replicates and draw the heatmap for the ‘average’ relative abundance across the different phyla? (the bigger question also being if this is an advisable/correct thing to do?)
Also, can you please explain what the line summarize(Abundance=sum(Abundance))
does?
I have been trying to do this on my own but am fairly new to R. So, I would really appreciate your help with this or if you could point be to other sources as I couldn’t find any. Thank you for your help!
ps %>%
tax_glom("Phylum") %>%
plot_bar()
# You can save the phylum level object for later use
ps.phylum <- ps %>%
tax_glom("Phylum")
ps.phylum %>% plot_bar()
Yes, that's right! Like the top 10 Firmicutes, and so on...
I will try these out. In the example above, I don't think phyloseq library was used to draw the heatmaps (I could be wrong) and also, the taxonomy.qza file is in combined form (see picture). I guess I will need to use something like parse_taxonomy(taxonomy, tax_sep, trim_extra)
to first separate the taxa and then use the code you recommended?
But if I did this, it would impact the Y-axis of the original heatmap I've already developed using the tutorial. I don't know if I am explaining it correctly... basically the way that the data needs to be arranged for phyloseq library is different from what's on this tutorial (or maybe I am wrong?). And so, I am trying to figure out if I can use one or the other to create these additional heatmaps. I would really appreciate your advice on this! Thanks!
ps.just_firmicutes.family <- ps %>%
subset_taxa(Phylum == "Firmicutes") %>%
tax_glom("Family")
ps.just_firmicutes.family.topnames <- names(sort(taxa_sums(ps.just_firmicutes.family), TRUE)[1:10])
ps.just_firmicutes.family.top <- prune_species(ps.phylum.top10.names, ps.just_firmicutes.family)
ps.just_firmicutes.family.top %>%
plot_bar() + labs(title = "Top 10 Firmicutes at the Family level")
There are lots of different ways to plot heatmaps, and none of them are perfect. You are correct: you will need to reorganize your data to make it match the input for the program.
That's true, I really like the tutorial as it was a very intuitive way for me to understand how the heatmaps were drawn (going through it one step at a time). I will try to figure out how to tweak it for my requirements, but if someone else gets to it before I do, I would love to see it! (Enthusiastic learner )
I understand. I don't actually have it as a phyloseq object but I will check it out and try to learn it. Thanks again!
All of my examples start with data in a phyloseq object, which I called ps. Once you do have your data in a phyloseq object, all of the plot_heatmap() plot_ord() plot_alpha() function that come with phyloseq are quite easy to use.
Thank you! I think I am getting the hang of it. I have just one more question... I see that the abundances in the charts are plotted as is. Is there any way that I can go in and change the abundances to relative abundance (on a percentage basis) and convert it into a log plot similar to what's in the tutorial?
There is several different ways to make a plot have a log scale. My favorite is ggplot(... ) + scale_y_log10()
but that might not work well for stacked bar plots…
What does your current plot look like? Maybe once I see it, I can make a suggestion!
I am looking to do two things (circled in red),
(1) Report the data is %abundance (and on a log scale... like in the tutorial)
(2) Figure out how to change the label from being just phylum to Kingdom_Phylum. Someone in another forum recommended this
and then drawing the heatmap with the newrank taxa level, but that didn't work for some reason since it wasn't combining/collapsing the abundances within each Kingdom_Phylum.
haha, I am a week in and loving it, thanks to this forum and all the help. I really do appreciate it!
I changed the code to this but it looks like it doesn't sum up all the abundances within each phyla of the newrank. Instead, it is plotting them all separately.
I think I need to try something like
physeq %>%
tax_glom("newrank") %>%
plot_heatmap(taxa.label = "newrank", sample.label = "Description2", sample.order = "Type", high = "#00FFFF") +
facet_grid(~facetorder, scales = "free")+ #I created a manual facet order for plotting
theme_q2r() +
theme(
axis.text.x=element_text(size = 6, colour = "black", angle=90, hjust=1),
axis.text.y=element_text(size = 6, colour = "black", hjust=1),
strip.text.x = element_text(size = 8, colour = "black"),
panel.spacing= unit (0, "cm")
)
But when I do that, it isn't collapsing the phyla. I am not sure why... I think I need something additional code to ask it to combine the phyla together.
Apart from this, I am looking to sort the data from largest to smallest abundance and plot only the top 10 abundant phyla.
How many tax levels are in the tax table? (It should be 8 now that we have added newrank)
Is the total number of feature smaller than those in the physeq object?
Yes, it does. I was thinking that perhaps some of the phyloseq functions are designed to recognize only specific elements, in this case, 7 taxa? I am guessing this because it also didn’t let me play around with the color scheme too much (I was hoping to use scale_fill_viridis_c() but I couldn’t disable the preexisting fill parameters). It’s possible I went about it wrongly though.