general discussion of qiime2r tutorial

bmillerlab · April 12, 2019, 9:27pm

Thank you. This is amazing for getting my qiime2 output into phyloseq.

After a lot of struggles with other suggestions this worked perfectly. I have gotten my data imported before but I never got the taxonomy and the tree to both work at the same time!!

Just a note to others, I could not use the first suggested method for creating a phyloseq object, "
phy<-qza_to_phyloseq("table.qza", "rooted-tree.qza", "taxonomy.qza","sample-metadata.tsv")
phy"

However, I had no trouble using the second suggested commands for "if you want to have more control" after the data import steps.

physeq<-phyloseq(
  otu_table(SVs$data, taxa_are_rows = T), 
  phy_tree(tree$data), 
  tax_table(as.data.frame(taxtable) %>% select(-Confidence) %>% column_to_rownames("Feature.ID") %>% as.matrix()), #moving the taxonomy to the way phyloseq wants it
  sample_data(metadata %>% as.data.frame() %>% column_to_rownames("#SampleID"))
  )

physeq

Cotissima · March 10, 2020, 4:02pm

Thanks so much @jbisanz, this was super helpful. Not sure this is the right place to ask but, how should I cite this tutorial??

tinabui04 · March 31, 2020, 11:36pm

Hi, I was hoping someone could help me with an error I'm getting when trying to parse the taxonomy table.

taxonomy<-parse_taxonomy(taxonomy$data)
Error in if_else(x == "", NA_character_, x) :
could not find function "if_else"

Thanks in advance!

jbisanz · March 31, 2020, 11:49pm

Oops, if you load the dplyr or tidyverse libraries it should work. I will make this fix now.

tinabui04 · April 1, 2020, 12:28am

Thank you so much! I figured it out just as you responded. Stay safe.

Adbeel · April 19, 2020, 3:15pm

Hello !!

How do I cite this package ??

Greetings !!

jbisanz · April 20, 2020, 4:28pm

Citing the QIIME2 manuscript is great and you could mention the version of qiime2R in the methods. Happy you found it helpful, Jordan

Zach_Burcham · May 18, 2020, 8:26pm

Hello,

I am getting the following error when running the taxonomic heatmap tutorial:

Error: by required, because the data sources have no common variables

Additionally, do you have a tutorial for producing taxonomic barplots through qiime2r?

anamarija · June 2, 2020, 10:18am

Thank you so much for the updated contribution to this tutorial!

jbisanz · June 16, 2020, 12:37am

I added a function that you could use directly for this purpose called taxa_barplot().

Irshad · June 17, 2020, 12:44am

qiime2R (could not find "summarize_taxa" function)

I am trying to use the following code:

taxasums<-summarize_taxa(svtable, taxonomy); taxasums$Genus

However, I get the error saying 'could not find "summarize_taxa" function'. My libraries are up to date. I am using R 3.6.3 and RStudio Version 1.3.959.

Please let me know how to get around this.
Thank you

ja.morillo · June 17, 2020, 11:56am

Hi! thanks, very interesting!

I) Is it possible to collapse samples by a category of the mapfile using the new function taxa_barplot()? (i.e. control vs antibiotic treatment), then showing mean relative abundances instead per_sample abundances?
II) can we collapse "minor phyla" (say all phyla<1% total abundance), to avoid >10 colours in the same figure?
Otherwise, a good R code to get I) an II)? It is not so straightforward, apparently: Creating 100% stacked bar plots with less abundant taxa in a sub-category · Issue #901 · joey711/phyloseq · GitHub
I guess these two functionalities are very demanded. Thanks!!

jbisanz · June 17, 2020, 7:00pm

In answer to 1- I am not a huge fan of collapsing the samples into a single bar as it masks the error and can lead to misleading conclusions. If you want to do this, you can find the average of each feature abundance within a group and then make a barplot from this aggregated sample. 2- I usually limit the plot to the 10 most abundant features (SVs/OTUs/Genera/Phyla etc) and group everything else into a remainder. The taxa_barplot function will do this by default. You could make a custom version of this function at line 39 to instead of pulling out the 10 most abundant, pulling out everything over 1% in average abundance.

jbisanz · June 17, 2020, 7:13pm

It should be there if you have the current version of qiime2R.

ja.morillo · June 18, 2020, 8:14am

I) I agree. II) Great, very useful. Thanks!!

Diki · July 14, 2020, 3:09pm

Hi also wanted to add, how do we customize the feature side of the heatmap? I am having a hard time to figuring out the regex code, or am I missing some handy functions all along?

Diki · July 14, 2020, 3:09pm

Tutorial: Integrating QIIME2 and R for data visualization and analysis using qiime2R

library(tidyverse)
library(qiime2R)

metadata<-read_q2metadata("sample-metadata.tsv")
SVs<-read_qza("table.qza")$data
taxonomy<-read_qza("taxonomy.qza")$data

SVs<-apply(SVs, 2, function(x) x/sum(x)*100) #convert to percent

SVsToPlot<-  
  data.frame(MeanAbundance=rowMeans(SVs)) %>% #find the average abundance of a SV
  rownames_to_column("Feature.ID") %>%
  arrange(desc(MeanAbundance)) %>%
  top_n(30, MeanAbundance) %>%
  pull(Feature.ID) #extract only the names from the table
  
SVs %>%
  as.data.frame() %>%
  rownames_to_column("Feature.ID") %>%
  gather(-Feature.ID, key="SampleID", value="Abundance") %>%
  mutate(Feature.ID=if_else(Feature.ID %in% SVsToPlot,  Feature.ID, "Remainder")) %>% #flag features to be collapsed
  group_by(SampleID, Feature.ID) %>%
  summarize(Abundance=sum(Abundance)) %>%
  left_join(metadata) %>%
  mutate(NormAbundance=log10(Abundance+0.01)) %>% # do a log10 transformation after adding a 0.01% pseudocount. Could also add 1 read before transformation to percent
  left_join(taxonomy) %>%
  mutate(Feature=paste(Feature.ID, Taxon)) %>%
  mutate(Feature=gsub("[kpcofgs]__", "", Feature)) %>% # trim out leading text from taxonomy string
  ggplot(aes(x=SampleID, y=Feature, fill=NormAbundance)) +
  geom_tile() +
  facet_grid(~`body-site`, scales="free_x") +
  theme_q2r() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_viridis_c(name="log10(% Abundance)")
  ggsave("heatmap.pdf", height=4, width=11, device="pdf") # save a PDF 3 inches by 4 inches

Hi Mr Bisanz,
Thank you for your great tutorial.
I am wondering if it is possible to diagnose a change step-by-step, or line-by-line, to our SVs data above. I can do it manually by declare SVs1 <- SVs instead of SVs above, and then delete pipe operator one at a time, but this is difficult. I realise that maybe there is a better way. Maybe debugging or something? Pardon me asking this basic R programming, because the context are already set I think it will be easier here.
I am using Rstudio.

I got no particular reason why choosing heatmap part for this example. It just happened that this part is what I am currently working on.

Thank you.

jbisanz · July 14, 2020, 10:22pm

What I typically do in R studio is highlight down to the row I want to troubleshoot excluding the last pipe character and hit command enter to run. Ex:

SVs %>%
  as.data.frame() %>%
  rownames_to_column("Feature.ID")

You could also just add head() or View() after the pipe on the line of interesting to send the result to the console or a new tab respectively.

If you want to customize the y axis, you slice and dice the text however you want. For example, you could paste together just the phylum and genus with something like mutate(Feature=paste(Phylum, Genus)). In my line above with the gsub command, I am just removing the extra k__Bacteria;p__Bacteroides... to get Bacteria;Bacteroides... to save on space when plotting.

Diki · July 15, 2020, 2:08am

Tutorial: Integrating QIIME2 and R for data visualization and analysis using qiime2R

 mutate(Feature=paste(Feature.ID, Taxon)) %>%
  mutate(Feature=gsub("[kpcofgs]__", "", Feature)) %>% # trim out leading text from taxonomy string
  ggplot(aes(x=SampleID, y=Feature, fill=NormAbundance)) +

Ah yes this is what I wanted all this time! Thank you! Also good tips, I didn't know that we can run selected code. I ended up assigning new SVs upto SVs15 lol. Thank you!

Also, my heatmap did not show similar y label as yours. I think it is because of this part of the code?

....
mutate(Feature=paste(Feature.ID, Taxon)) %>%
mutate(Feature=gsub("[kpcofgs]__", "", Feature)) %>% # trim out leading text from taxonomy string
ggplot(aes(x=SampleID, y=Feature, fill=NormAbundance)) +
....

The y from aes() asking for Feature column, but other than k__Bacteria;p__Bacteroides…etc line, there are also some random character of Feature.ID from mutate(paste()) function above it. That, or is it just me making mistake? What do you think, sir?
Also, what is that Feature.ID? They are from QIIME2 table.qza but, what algorithms do they use to generate the Feature.ID?

Thank you.

jbisanz · July 15, 2020, 9:29pm

I believe the Feature.ID you would be seeing is the sequence's hash which is a unique identifier for the SV (see here). You could just as easily replace them with arbitrary names like "SV_1" if you wanted. To do that, add an extra column to your taxonomy object with something like:

taxonomy<-taxonomy %>% mutate(SV_ID=paste0("SV_", 1:nrow(.)))

Then paste together SV_ID and Taxon to make Feature.