Phyloseq graph using Q2 ASV code : how to get the real Phylum or Genus name?

Hello!

I have used Q2 a lot lately and now want to use some of Phyloseq's plotting features in R, but I keep running into a problem with the feature IDs used by Q2. They look like "2ec862esddlr88965a8855d665er2245", which does not help with the analysis.
I am wondering whether I can do something about this when exporting the Q2 object to a Phyloseq object, so that I don't get these ugly names ;-)!

This code, for instance:

clean_physeq2essai_phylum = phyloseq::tax_glom(clean_physeq2essai, "Phylum")
phyloseq::psmelt(clean_physeq2essai_phylum) %>%
  ggplot(data = ., aes(x = Group_Timepoint, y = Abundance)) +
  geom_boxplot(outlier.shape  = NA) +
  geom_jitter(aes(color = OTU), height = 0, width = .2) +
  labs(x = "", y = "Abundance\n") +
  facet_wrap(~ OTU, scales = "free")

Gives me this:
[plot]


Hello Mathias! Welcome back,

Can you post the output of this line?

phyloseq::psmelt(clean_physeq2essai_phylum) %>% head()

That will tell us all the columns of the R data.frame(), and one of those columns should be an informative taxonomy name like Phylum, and we can pass that column name to ggplot.
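
As a minimal sketch of what to look for, the toy data frame below imitates the shape of a psmelt() result. The values are made up for illustration, and the column set is an assumption based on the code in this thread; the real table will have more columns.

```r
# Toy stand-in for the output of phyloseq::psmelt(); values are illustrative only.
melted <- data.frame(
  OTU       = c("62d492efc804b7f173b5286cffca8603", "9c79cbb243d988ebe749b6d2753696ed"),
  Sample    = c("MS35", "MS1"),
  Abundance = c(0.87, 0.85),
  Phylum    = c("D_1__Proteobacteria", "D_1__Bacteroidetes"),
  stringsAsFactors = FALSE
)

# Every name printed here can be used inside aes() or facet_wrap().
colnames(melted)

# The hash in OTU and the readable label in Phylum describe the same rows.
melted[, c("OTU", "Phylum")]
```

If a taxonomy column such as Phylum shows up in that list, it can stand in for OTU anywhere in the plotting code.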

I think we are really close to solving it… :female_detective:

Colin

P.S. I’ve put your code inside of a code block. They’re cool

Hello Colin,

Thanks for your reply… I know I am not at the perfect forum for this type of question… I was afraid I would be blocked :wink:
So here is the output :slight_smile: :
                                 OTU Sample Abundance ATB_VEHICULE Group_ATB Timepoint Num_mice SameMice2Timepoints  Group_mice
71  62d492efc804b7f173b5286cffca8603   MS35 0.8745698          ATB      Long       D10       70                 YES  ABLong_D10
59  62d492efc804b7f173b5286cffca8603   MS32 0.8688360          ATB     Short       D10       40                 YES ABShort_D10
94  9c79cbb243d988ebe749b6d2753696ed    MS1 0.8471313     VEHICULE      None   D_Bef24        A                  NO Ctrl_Dbef24
55  62d492efc804b7f173b5286cffca8603   MS31 0.8471070          ATB     Short       D10       38                 YES ABShort_D10
130 9c79cbb243d988ebe749b6d2753696ed    MS7 0.8447736     VEHICULE      None   D_Bef24        G                  NO Ctrl_Dbef24
120 9c79cbb243d988ebe749b6d2753696ed    MS3 0.8400406     VEHICULE      None   D_Bef24        C                  NO Ctrl_Dbef24
    Group_Timepoint accurate_group       Kingdom              Phylum
71          D10_Abx      5_D10_Abx D_0__Bacteria D_1__Proteobacteria
59          D10_Abx      5_D10_Abx D_0__Bacteria D_1__Proteobacteria
94        D0_no_Abx  1_D-24_no_abx D_0__Bacteria  D_1__Bacteroidetes
55          D10_Abx      5_D10_Abx D_0__Bacteria D_1__Proteobacteria
130       D0_no_Abx  1_D-24_no_abx D_0__Bacteria  D_1__Bacteroidetes
120       D0_no_Abx  1_D-24_no_abx D_0__Bacteria  D_1__Bacteroidetes

So yes, there is indeed a “Phylum” column… but sorry, I don’t know what to write to replace the “ugly” names with the phylum in the figure!!! :frowning:
Thank you for your help.

Best,
Mathias

Ok ok ! I am back… your suggestion was enough to put me on the right track!
I feel ashamed that I did not think about that :roll_eyes:
I just had to swap OTU for Phylum… duh!
Many thanks Colin !

Here is the code modified:

clean_physeq2essai_phylum = phyloseq::tax_glom(ps_rel_abund, "Phylum")
phyloseq::psmelt(clean_physeq2essai_phylum) %>%
  ggplot(data = ., aes(x = accurate_group, y = Abundance)) +
  geom_boxplot(outlier.shape  = NA) +
  geom_jitter(aes(color = Phylum), height = 0, width = .2) +
  labs(x = "", y = "Abundance\n") +
  facet_wrap(~ Phylum, scales = "free")
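
The Phylum labels still carry a rank prefix such as D_1__. If that is unwanted in the figure, one possible approach is to strip it with base R's sub() before plotting. This is only a sketch: it assumes every label follows the D_<number>__Name pattern seen in the head() output above, and phylum_labels is a hypothetical vector standing in for the Phylum column.

```r
# Labels in the style shown in this thread (rank prefix, two underscores, name).
# phylum_labels is a hypothetical stand-in for the Phylum column.
phylum_labels <- c("D_1__Proteobacteria", "D_1__Bacteroidetes", "D_0__Bacteria")

# Drop a leading "D_<digit(s)>__" so that only the taxon name remains.
clean_labels <- sub("^D_[0-9]+__", "", phylum_labels)

clean_labels
```

The same sub() call could be applied to the Phylum column of the melted data frame before it is passed to ggplot.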

Hello Mathias,

You got this!

Here’s how ggplot works:
ggplot maps columns of your data frame to aesthetics using the aes() function.
Note that Group_Timepoint, Abundance, and OTU are all columns in your data frame.

x = Group_Timepoint: position on the x axis
y = Abundance: position on the y axis
color = OTU: color of each point
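
A minimal sketch of that mapping, using a small made-up data frame rather than the real melted table (the toy object and its values are assumptions for illustration; only the column names come from this thread):

```r
library(ggplot2)

# Toy data in the same long format as a melted phyloseq table (made-up values).
toy <- data.frame(
  Group_Timepoint = rep(c("D0_no_Abx", "D10_Abx"), each = 4),
  Abundance       = c(0.10, 0.12, 0.84, 0.85, 0.87, 0.86, 0.05, 0.04),
  Phylum          = rep(c("D_1__Proteobacteria", "D_1__Bacteroidetes"),
                        each = 2, times = 2)
)

# x and y set the position; color is mapped to whichever column names the taxon.
p <- ggplot(toy, aes(x = Group_Timepoint, y = Abundance)) +
  geom_jitter(aes(color = Phylum), height = 0, width = 0.2) +
  facet_wrap(~ Phylum, scales = "free")

print(p)
```

Swapping the column named in aes(color = ...) and facet_wrap() is all it takes to change what the colors and panels represent.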

But you can map these out any way that you want! Try this :point_down:

phyloseq::psmelt(clean_physeq2essai_phylum) %>%
  ggplot(aes(x = Group_Timepoint, y = Abundance, size = Abundance)) +
  geom_boxplot(outlier.shape  = NA) +
  geom_jitter(aes(color = Phylum), height = 0, width = .2) +
  labs(x = "", y = "Abundance\n") +
  facet_wrap(~ Phylum, scales = "free")

So basically I’ve replaced OTU with Phylum, so that it uses the data in the other column.

Why are the sizes different? :scream_cat: Did you notice I snuck in size = Abundance to the aes() function? You should probably take that out… but you can also edit the other settings too!

Colin


Great Colin!
“size = Abundance” is a nice touch! I’ll keep it.
Many thanks for your efficiency!
:+1:


Hey Colin,

I have a tiny little question… what would I have to do if I want to show only one Phylum, like “D_1__Proteobacteria”?

Well, I found the answer myself… (it was too late yesterday :slight_smile: ) It is a simple “subset”, like:

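# q is assumed to be the melted table, e.g. q <- phyloseq::psmelt(clean_physeq2essai_phylum)
# The label must match the Phylum column exactly, including the double underscore.
ggplot(subset(q, Phylum == "D_1__Proteobacteria"), aes(x = Group_Timepoint, y = Abundance, size = Abundance)) +
  geom_boxplot(outlier.shape  = NA) +
  geom_jitter(aes(color = Phylum), height = 0, width = .2) +
  labs(x = "", y = "Abundance\n") +
  theme(axis.text.x = element_text(angle = 40, hjust = 1))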

Cheers! :slight_smile:
Mat


Hello Mat,

Glad you found the answer!

All the packages in the Tidyverse have a ‘Right Way’ :point_left: to do stuff like this, and it uses filter() instead of subset():

phyloseq::psmelt(clean_physeq2essai_phylum) %>%
  dplyr::filter(Phylum == "D_1__Proteobacteria") %>%
  ggplot(aes(x = Group_Timepoint, y = Abundance, size = Abundance)) + ...

But you can do what you want! :stuck_out_tongue:
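
For comparison, here is a small sketch showing that subset() and dplyr::filter() select the same rows. The data frame toy_q is made up for illustration; the labels follow the Phylum values in the head() output earlier in the thread, which use a double underscore.

```r
library(dplyr)

# Hypothetical toy table; only the label format is taken from the thread.
toy_q <- data.frame(
  Phylum    = c("D_1__Proteobacteria", "D_1__Bacteroidetes", "D_1__Proteobacteria"),
  Abundance = c(0.87, 0.85, 0.86)
)

with_subset <- subset(toy_q, Phylum == "D_1__Proteobacteria")
with_filter <- toy_q %>% dplyr::filter(Phylum == "D_1__Proteobacteria")

# Both approaches keep the same rows.
nrow(with_subset)
nrow(with_filter)

# The match is exact: a label with a single underscore selects nothing.
nrow(dplyr::filter(toy_q, Phylum == "D_1_Proteobacteria"))
```

Writing dplyr::filter() explicitly avoids accidentally calling stats::filter(), which is what a bare filter() resolves to when dplyr is not attached.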

Colin
