Tutorial: Integrating QIIME2 and R for data visualization and analysis using qiime2R


(Jordan Bisanz) #1

Tutorial: Integrating QIIME2 and R for data visualization and analysis using qiime2R

Background

This tutorial is demonstrating how to integrate QIIME2 (2018.4) with R for data visualization and analysis using elements from the tidyverse with the package qiime2R (v0.12). This allows for customized plotting and mapping of metadata to graph aesthetics. You could also use this tool to create a phyloseq object as per the readme for qiime2R or the last section of this tutorial. It is drawing on the moving pictures tutorial.


Libraries

library(tidyverse)
devtools::install_github("jbisanz/qiime2R")
library(qiime2R)

Data import

This tutorial takes metadata and the QIIME2 artifacts (.qza) for:

When read_qza() is used, an object is returned (a named list) containing many peices of information with the actual data stored in object$data. We are also checking the unique identifer for every file by examining the $uuid variable.

metadata<-read_tsv("sample-metadata.tsv")
metadata
## # A tibble: 35 x 11
##    `#SampleID` BarcodeSequence LinkerPrimerSeq… BodySite Year  Month Day  
##    <chr>       <chr>           <chr>            <chr>    <chr> <chr> <chr>
##  1 #q2:types   categorical     categorical      categor… nume… nume… nume…
##  2 L1S8        AGCTGACTAGTC    GTGCCAGCMGCCGCG… gut      2008  10    28   
##  3 L1S57       ACACACTATGGC    GTGCCAGCMGCCGCG… gut      2009  1     20   
##  4 L1S76       ACTACGTGTGGT    GTGCCAGCMGCCGCG… gut      2009  2     17   
##  5 L1S105      AGTGCGATGCGT    GTGCCAGCMGCCGCG… gut      2009  3     17   
##  6 L2S155      ACGATGCGACCA    GTGCCAGCMGCCGCG… left pa… 2009  1     20   
##  7 L2S175      AGCTATCCACGA    GTGCCAGCMGCCGCG… left pa… 2009  2     17   
##  8 L2S204      ATGCAGCTCAGT    GTGCCAGCMGCCGCG… left pa… 2009  3     17   
##  9 L2S222      CACGTGACATGT    GTGCCAGCMGCCGCG… left pa… 2009  4     14   
## 10 L3S242      ACAGTTGCGCGA    GTGCCAGCMGCCGCG… right p… 2008  10    28   
## # ... with 25 more rows, and 4 more variables: Subject <chr>,
## #   ReportedAntibioticUsage <chr>, DaysSinceExperimentStart <chr>,
## #   Description <chr>
SVs<-read_qza("table.qza")
names(SVs) #information about the object
## [1] "uuid"       "type"       "format"     "contents"   "version"   
## [6] "data"       "provenance"
SVs$uuid
## [1] "6a560288-898e-4c1d-92ac-dd8d7822dcc9"
SVs$data[1:5,1:5] #show the first 5 samples and features
##                                  L1S105 L1S140 L1S208 L1S257 L1S281
## 4b5eeb300368260019c1fbc7a3c718fc   2222      0      0      0      0
## fe30ff0f71a38a39cf1717ec2be3a2fc      5      0      0      0      0
## d29fe3c70564fc0f69f2c03e0d1e5561      0      0      0      0      0
## 868528ca947bc57b69ffdf83e6b73bae      0   2276   2156   1205   1772
## 154709e160e8cada6bfb21115acc80f5    812   1176    713    407    242
taxonomy<-read_qza("taxonomy.qza")
taxonomy$uuid
## [1] "3d74d83c-acfe-48c6-9d8d-a621e47db4c3"
taxtable<-taxonomy$data %>% as.tibble() %>% separate(Taxon, sep="; ", c("Kingdom","Phylum","Class","Order","Family","Genus","Species")) #convert the table into a tabular split version
taxtable
## # A tibble: 759 x 9
##    Feature.ID  Kingdom  Phylum Class Order Family Genus Species Confidence
##    <fct>       <chr>    <chr>  <chr> <chr> <chr>  <chr> <chr>        <dbl>
##  1 48a6cea7ea… k__Bact… p__Fi… c__C… o__C… f__La… g__C… s__          0.909
##  2 1f50ac2506… k__Bact… p__Pr… c__E… o__C… f__Ca… g__C… s__          1.000
##  3 1ea96d5955… k__Bact… p__Fi… c__C… o__C… f__[T… g__P… s__          1.000
##  4 7f4909ae8d… k__Bact… p__Pr… c__A… o__C… f__Ca… g__   s__          0.899
##  5 a841dd7e2f… k__Bact… p__Fi… c__C… o__C… f__Ru… g__R… <NA>         0.738
##  6 c5b61a414b… k__Bact… p__Pr… c__G… o__P… f__Pa… g__   s__          0.988
##  7 53dd9f9a89… k__Bact… p__Ac… c__A… o__A… f__No… g__N… s__          0.750
##  8 1e9fc37a11… k__Bact… p__Fu… c__F… o__F… f__Le… g__L… s__          1.000
##  9 26583b57be… k__Bact… p__Cy… c__C… o__C… f__Tr… g__   s__          0.719
## 10 1830c14ead… k__Bact… p__TM7 c__T… o__   f__    g__   s__          0.999
## # ... with 749 more rows
tree<-read_qza("rooted-tree.qza")
tree$uuid
## [1] "df4a8d84-2386-421c-86e7-eebe6412a982"
tree$data
## 
## Phylogenetic tree with 759 tips and 757 internal nodes.
## 
## Tip labels:
##  e9d3af4175420ffab49c29d1a6bb2030, 1830c14ead81ad012f1db0e12f8ab6a4, ca08eabd09756731f095632656d45b01, bfdc8d2e7693336b4f3781920d3fa253, 54a5af5112600a1faf06be268f95616e, 01173073677a287321be3484fbed0007, ...
## Node labels:
##  root, , 0.765, 0.689, 0.455, 0.954, ...
## 
## Rooted; includes branch lengths.
shannon<-read_qza("shannon_vector.qza")
shannon$uuid
## [1] "cdf105ef-f98b-4939-b9e9-7fdb683ae359"
head(shannon$data)
##         shannon
## L1S105 3.767582
## L1S140 3.869605
## L1S208 4.385106
## L1S257 4.644614
## L1S281 4.551270
## L1S57  4.115363
pco<-read_qza("unweighted_unifrac_pcoa_results.qza")
pco$uuid
## [1] "b440de1c-72e8-4bcc-bbce-a10c2f7f21c3"
head(pco$data$ProportionExplained) #this returns the variance explained
##        PC1        PC2        PC3        PC4        PC5        PC6 
## 0.34629933 0.27244901 0.05629367 0.04934895 0.03273142 0.03083458
pco$data$Vectors[1:5, 1:3] #this returns the coordinates, here we look at the first 5 samples and 2 axes.
##   SampleID        PC1         PC2
## 1   L1S105 -0.3749660 -0.05224791
## 2   L1S140 -0.3821963 -0.04271688
## 3   L1S208 -0.4566534 -0.09740696
## 4   L1S257 -0.4974592 -0.18476858
## 5   L1S281 -0.4976634 -0.16765894

Plotting alpha diversity

Now that we have the data we can create a figure using any combination of mappings of aesthetics to metadata. Below is an example using the alpha diversity. This example is over the top, but shows ways that metadata categories can be mapped.

shannon$data %>%
  as.data.frame() %>%
  rownames_to_column("#SampleID") %>%
  left_join(metadata) %>%
  mutate(DaysSinceExperimentStart=as.numeric(DaysSinceExperimentStart)) %>% #coerce this to be stored as number
    ggplot(aes(x=DaysSinceExperimentStart, y=shannon, group=Subject, color=BodySite, shape=ReportedAntibioticUsage)) +
    geom_point(size=4) +
    geom_line() +
    facet_grid(BodySite~Subject) + #plot body sites across rows and subjects across columns
    theme_bw() +
    xlab("Time (days)") +
    ylab("Shannon Diversity") +
    ggtitle("Shannon diversity across time")

Plotting beta diversity

Now we can do something similar to above using the information from the PCoA of unweighted UniFrac distances. For bonus points I am adding in the shannon diversity and mapping this to size.

pco$data$Vectors %>%
  rename("#SampleID"=SampleID) %>% #rename to match the metadata table
  left_join(metadata) %>%
  left_join(shannon$data %>% rownames_to_column("#SampleID")) %>%
  ggplot(aes(x=PC1, y=PC2, shape=Subject, color=BodySite, size=shannon)) +
  geom_point() +
  xlab(paste("PC1: ", round(100*pco$data$ProportionExplained[1]), "%")) +
  ylab(paste("PC2: ", round(100*pco$data$ProportionExplained[2]), "%")) +
  theme_bw() +
  ggtitle("Unweighted UniFrac")


Creating a phyloseq object from QIIME artifacts

The phyloseq() function is used to build a phyloseq object as per the explaination here. To create a ~standard phyloseq object you can use the qza_to_phyloseq() function as below:

phy<-qza_to_phyloseq("table.qza", "rooted-tree.qza", "taxonomy.qza","sample-metadata.tsv")
phy
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 759 taxa and 34 samples ]
## sample_data() Sample Data:       [ 34 samples by 10 sample variables ]
## tax_table()   Taxonomy Table:    [ 759 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 759 tips and 757 internal nodes ]

Or if you want to have more control over the object adding more or less data, you can make it yourself as below:

library(phyloseq)
physeq<-phyloseq(
  otu_table(SVs$data, taxa_are_rows = T), 
  phy_tree(tree$data), 
  tax_table(as.data.frame(taxtable) %>% select(-Confidence) %>% column_to_rownames("Feature.ID") %>% as.matrix()), #moving the taxonomy to the way phyloseq wants it
  sample_data(metadata %>% as.data.frame() %>% column_to_rownames("#SampleID"))
  )

physeq
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 759 taxa and 34 samples ]
## sample_data() Sample Data:       [ 34 samples by 10 sample variables ]
## tax_table()   Taxonomy Table:    [ 759 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 759 tips and 757 internal nodes ]

Now you can use R-only tools and all those snazzy phyloseq functions or check out my package MicrobeR.

Please send feedback, suggestions or questions to [email protected].


Import table of taxonomic composition with Phyloseq
CSS normalization of sequence counts with metagenomeSeq in R
Importing dada2 and Phyloseq objects to QIIME 2
Issues using qiime2R
Filtering Blank Taxonomic Ranks for Import into Phyloseq