Having trouble with the qiime2R package: after import to a phyloseq object, the taxa table only shows Kingdom

akalichen · April 1, 2020, 6:33pm

Hello, first I would like to thank @jbisanz for the wonderful package. The ability to transfer data between qiime2 and phyloseq is amazing.

While following the tutorial (Tutorial: Integrating QIIME2 and R for data visualization and analysis using qiime2R - #26), I encountered two problems. The first seems to be fixed, but I was wondering whether there is a better way to handle it.

The second problem is that when I import qiime2 output into a phyloseq object as the tutorial suggests, my tax table only has a kingdom name; phylum through genus are missing.

Attached is my R code, which is pretty much the same as the tutorial's, along with some of the output files I used.

Trying to connect my qiime2 workflow with my R workflow, following (Tutorial: Integrating QIIME2 and R for data visualization and analysis using qiime2R)

# get package qiime2R
if (!requireNamespace("devtools", quietly = TRUE)){install.packages("devtools")}
devtools::install_github("jbisanz/qiime2R") # current version is 0.99.20

library(qiime2R)
library(ggplot2)

We will start by reading in a table of sequence variants (SVs):

SVs <- read_qza("/Users/Cheng.Clay.Li/EAGCB/SBdata/Qiime2SBCB/SBCB-table.qza")
names(SVs)

#To access the actual data stored within the object, access the data as below:


SVs$data[1:5,1:5] #show first 5 samples and first 5 taxa

We can also look at the unique identifier for this object:

SVs$uuid

We can see the type of artifact:

SVs$type

We can also print the provenance; however, it is probably easier to use the q2view tool for a graphical aid in its interpretation.

print_provenance(SVs)

Reading Metadata

If you are using a qiime2 metadata file (outlined here), you can use the supplied function read_q2metadata() as below. If using a standard tsv or csv file, you can use read.table(), readr::read_tsv(), or readr::read_csv().

metadata<-read_q2metadata("/Users/Cheng.Clay.Li/EAGCB/SBdata/Qiime2SBCB/SBCB_METADATA_alt3.tsv")
head(metadata) # show top lines of metadata

Reading Taxonomy

Note, when taxonomy is imported, a single string is returned along with a confidence score. For many analyses we will want to break up this string, and for that purpose the parse_taxonomy() function is provided:

taxonomy<-read_qza("/Users/Cheng.Clay.Li/EAGCB/SBdata/Qiime2SBCB/taxonomy.qza")
head(taxonomy$data)
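
To illustrate what "breaking up" the taxonomy string means, here is a minimal base R sketch. It is not the qiime2R implementation of parse_taxonomy(); the helper name split_taxon and the assumption of SILVA-style "D_n__" prefixes separated by ";" are mine, chosen to match the strings shown in this thread.

```r
# Hypothetical helper (not part of qiime2R): split one taxonomy string into ranks.
# Assumes ranks are separated by ";" and prefixed with "D_<n>__" as in this thread.
split_taxon <- function(taxon,
                        ranks = c("Kingdom", "Phylum", "Class", "Order",
                                  "Family", "Genus", "Species")) {
  parts <- strsplit(taxon, ";", fixed = TRUE)[[1]]
  parts <- sub("^D_[0-9]+__", "", trimws(parts))  # drop the rank prefix
  length(parts) <- length(ranks)                  # pad missing ranks with NA
  setNames(parts, ranks)
}

example <- split_taxon("D_0__Bacteria;D_1__Cyanobacteria")
example
```

A string that stops at a shallow rank yields NA for every deeper rank, which is how a feature can end up with only Kingdom filled in.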

Creating a Phyloseq Object

A wrapper function called qza_to_phyloseq() is provided which links multiple read_qza() calls together to create a phyloseq object for subsequent analysis as per the phyloseq tutorials. An example usage is shown below:

# the issue you have has nothing to do with your taxonomy table, but a mismatch between your metadata and feature table (SVs or your favourite term here for denoised sequences). (https://forum.qiime2.org/t/qiime2r-missing-sample/8681) 
# it seems that the phyloseq was not right, because my Sample ID does not match with the colnames in SVs$data
table<-SVs$data
metadata$SampleID <- colnames(table)

all(metadata$SampleID %in% colnames(table)) # has to be true. 

write.csv(metadata, file = "metadata_try.csv")
# and then I copied and pasted the SampleID in this csv to googlesheet my metadata, and downloaded as a new metadata for the qza_to_phyloseq to read. 

physeq<-qza_to_phyloseq(
    features="/Users/Cheng.Clay.Li/EAGCB/SBdata/Qiime2SBCB/SBCB-table.qza",
    tree="/Users/Cheng.Clay.Li/EAGCB/SBdata/Qiime2SBCB/SBCB-rooted_tree.qza",
    taxonomy="/Users/Cheng.Clay.Li/EAGCB/SBdata/Qiime2SBCB/taxonomy.qza",
    metadata = "/Users/Cheng.Clay.Li/EAGCB/SBdata/Qiime2SBCB/SBCB_METADATA_rename.tsv"
    )
physeq

# the taxa table looks so weird.
library(phyloseq)
Taxtab.q2 <- as(tax_table(physeq), "matrix")
OTU.q2 = as(otu_table(physeq), "matrix")

write.csv(Taxtab.q2, file = "taxtab_q2.csv")
write.csv(OTU.q2, file = "OTU_q2.csv")

SBCB_METADATA.tsv (1.7 KB) SBCB-rooted_tree.qza (272.5 KB) SBCB-table.qza (241.3 KB) taxonomy.qza (353.2 KB) OTU_q2.csv (516.0 KB) taxtab_q2.csv (1.1 MB)

jbisanz · April 1, 2020, 7:12pm

One problem is that it appears there is a trailing space in all of your sample names in the metadata:
[1] "lane1-s081-indexN728-D-S513-D-TGCAGCTA-TCGACTAG-A11 "
[2] "lane1-s082-indexN728-D-S515-D-TGCAGCTA-TTCTAGCT-B11 "
[3] "lane1-s083-indexN728-D-S516-D-TGCAGCTA-CCTAGAGT-C11 "
[4] "lane1-s084-indexN728-D-S517-D-TGCAGCTA-GCGTAAGA-D11 "
[5] "lane1-s085-indexN728-D-S518-D-TGCAGCTA-CTATTAAG-E11 "
[6] "lane1-s086-indexN728-D-S520-D-TGCAGCTA-AAGGCTAT-F11 "
[7] "lane1-s087-indexN728-D-S521-D-TGCAGCTA-GAGCCTTA-G11 "
[8] "lane1-s088-indexN728-D-S522-D-TGCAGCTA-TTATGCGA-H11 "
[9] "lane1-s089-indexN729-D-S513-D-TCGACGTC-TCGACTAG-A12 "
[10] "lane1-s090-indexN729-D-S515-D-TCGACGTC-TTCTAGCT-B12 "
[11] "lane1-s091-indexN729-D-S516-D-TCGACGTC-CCTAGAGT-C12 "
[12] "lane1-s092-indexN729-D-S517-D-TCGACGTC-GCGTAAGA-D12 "
[13] "lane1-s093-indexN729-D-S518-D-TCGACGTC-CTATTAAG-E12 "
[14] "lane1-s094-indexN729-D-S520-D-TCGACGTC-AAGGCTAT-F12 "
[15] "lane1-s095-indexN729-D-S521-D-TCGACGTC-GAGCCTTA-G12 "
[16] "lane1-s096-indexN729-D-S522-D-TCGACGTC-TTATGCGA-H12 "
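
A minimal sketch of how the trailing space could be detected and removed in base R. The data frame and IDs below are shortened, made-up stand-ins for the real metadata and feature-table column names; only trimws(), setdiff(), and %in% are used.

```r
# Toy stand-ins (hypothetical values) for the metadata and the feature-table sample names
metadata <- data.frame(
  SampleID = c("s081-A11 ", "s082-B11 "),  # note the trailing spaces
  Site = c("a", "b"),
  stringsAsFactors = FALSE
)
table_ids <- c("s081-A11", "s082-B11")

# Before cleaning: the IDs do not match, and setdiff() shows which ones are off
matches_before <- all(metadata$SampleID %in% table_ids)
setdiff(metadata$SampleID, table_ids)

# Strip leading/trailing whitespace, then check again
metadata$SampleID <- trimws(metadata$SampleID)
matches_after <- all(metadata$SampleID %in% table_ids)
```

Trimming the IDs keeps the metadata's own sample order, unlike overwriting the column with the table's column names, which makes the check pass regardless of whether rows and samples actually correspond.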

As per the taxonomy I am not sure what the problem is because it appears to be importing correctly for me:

library(tidyverse)
library(qiime2R)

taxonomy<-read_qza("taxonomy.qza")$data
head(taxonomy)
                        Feature.ID
1 06f3d27a300d91e7f2348d603ba03c22
2 3973ff630131fdfce4705ab78e5e8cae
3 9483a61718f86945d916374d6e48facb
4 3ed1f800788930b3948659c758374f6e
5 ae267a819f7e238fb82dd09907d00690
6 eca194442c3a0521c03e1be60da986dd
                                                                                                                                                     Taxon
1 D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyphotobacteria;D_3__Chloroplast;D_4__Virgulinella fragilis;D_5__Virgulinella fragilis;D_6__Virgulinella fragilis
2                                     D_0__Bacteria;D_1__Epsilonbacteraeota;D_2__Campylobacteria;D_3__Campylobacterales;D_4__Sulfurovaceae;D_5__Sulfurovum
3                                                               D_0__Bacteria;D_1__Actinobacteria;D_2__Acidimicrobiia;D_3__Actinomarinales;D_4__uncultured
4                                                               D_0__Bacteria;D_1__Actinobacteria;D_2__Acidimicrobiia;D_3__Actinomarinales;D_4__uncultured
5                                 D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Chromatiales;D_4__Chromatiaceae;D_5__Candidatus Thiobios
6                                                                                  D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyphotobacteria;D_3__Chloroplast
  Confidence
1  0.8558629
2  0.9999990
3  0.9987362
4  0.9992131
5  0.9999855
6  1.0000000

And parsing correctly:

parse_taxonomy(taxonomy) %>% head()

                                  Kingdom             Phylum               Class
06f3d27a300d91e7f2348d603ba03c22 Bacteria      Cyanobacteria    Oxyphotobacteria
3973ff630131fdfce4705ab78e5e8cae Bacteria Epsilonbacteraeota     Campylobacteria
9483a61718f86945d916374d6e48facb Bacteria     Actinobacteria      Acidimicrobiia
3ed1f800788930b3948659c758374f6e Bacteria     Actinobacteria      Acidimicrobiia
ae267a819f7e238fb82dd09907d00690 Bacteria     Proteobacteria Gammaproteobacteria
eca194442c3a0521c03e1be60da986dd Bacteria      Cyanobacteria    Oxyphotobacteria
                                             Order                Family
06f3d27a300d91e7f2348d603ba03c22       Chloroplast Virgulinella fragilis
3973ff630131fdfce4705ab78e5e8cae Campylobacterales         Sulfurovaceae
9483a61718f86945d916374d6e48facb   Actinomarinales            uncultured
3ed1f800788930b3948659c758374f6e   Actinomarinales            uncultured
ae267a819f7e238fb82dd09907d00690      Chromatiales         Chromatiaceae
eca194442c3a0521c03e1be60da986dd       Chloroplast                  <NA>
                                                 Genus               Species
06f3d27a300d91e7f2348d603ba03c22 Virgulinella fragilis Virgulinella fragilis
3973ff630131fdfce4705ab78e5e8cae            Sulfurovum                  <NA>
9483a61718f86945d916374d6e48facb                  <NA>                  <NA>
3ed1f800788930b3948659c758374f6e                  <NA>                  <NA>
ae267a819f7e238fb82dd09907d00690   Candidatus Thiobios                  <NA>
eca194442c3a0521c03e1be60da986dd                  <NA>                  <NA>

When I fix your metadata table, the phyloseq object also appears to be generated correctly.

test<-qza_to_phyloseq(
  features="SBCB-table.qza",
  tree="SBCB-rooted_tree.qza",
  metadata="SBCB_METADATA_mod.tsv",
  taxonomy="taxonomy.qza" # named argument; `<-` here would assign a variable in the calling environment
)

In regards to things only being assigned a phylum, that appears to be the nature of your data. Below I sampled 10 features and compared their imported taxonomy against what was inside the original file before parsing:

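library(tidyverse) # provides %>%, filter(), left_join(), rownames_to_column()
library(qiime2R)
library(phyloseq)

taxonomy<-read_qza("taxonomy.qza")$data

test<-qza_to_phyloseq(
  features="SBCB-table.qza",
  tree="SBCB-rooted_tree.qza",
  metadata="SBCB_METADATA_mod.tsv",
  taxonomy="taxonomy.qza" # use `=`; `<-` would overwrite the taxonomy data frame read above
)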

featstopull<-sample(rownames(otu_table(test)), 10)

taxonomy %>% 
  filter(Feature.ID %in% featstopull) %>%
  left_join(
    tax_table(test)[featstopull,] %>%
      as.data.frame() %>%
      rownames_to_column("Feature.ID")
  )
                     
Feature.ID	Taxon	Confidence	Kingdom	Phylum	Class	Order	Family	Genus	Species
89e910e2c6d49e65367e9b3f5c71713b	D_0__Bacteria;D_1__Actinobacteria;D_2__Acidimicrobiia;D_3__Microtrichales;D_4__Microtrichaceae;D_5__Sva0996 marine group;D_6__uncultured organism	0.901916034	Bacteria	Actinobacteria	Acidimicrobiia	Microtrichales	Microtrichaceae	Sva0996 marine group	uncultured organism
879a5bfd7c260c5ecd54e1265766b3e8	D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rhizobiales;D_4__Beijerinckiaceae	0.989189013	Bacteria	Proteobacteria	Alphaproteobacteria	Rhizobiales	Beijerinckiaceae	NA	NA
c45c7dc3bf25bd6742c6bd86e3ae59bc	D_0__Bacteria;D_1__Chloroflexi;D_2__Anaerolineae;D_3__Anaerolineales;D_4__Anaerolineaceae;D_5__uncultured	0.999949695	Bacteria	Chloroflexi	Anaerolineae	Anaerolineales	Anaerolineaceae	uncultured	NA
0194334deb98eccb7137e86ae2d9ad64	D_0__Bacteria;D_1__Acidobacteria;D_2__Subgroup 21;D_3__uncultured bacterium;D_4__uncultured bacterium;D_5__uncultured bacterium;D_6__uncultured bacterium	0.884177923	Bacteria	Acidobacteria	Subgroup 21	uncultured bacterium	uncultured bacterium	uncultured bacterium	uncultured bacterium
1cb3ab429f280c306a186478e42291aa	D_0__Bacteria;D_1__Chloroflexi;D_2__Dehalococcoidia;D_3__Napoli-4B-65;D_4__uncultured bacterium;D_5__uncultured bacterium;D_6__uncultured bacterium	0.967457623	Bacteria	Chloroflexi	Dehalococcoidia	Napoli-4B-65	uncultured bacterium	uncultured bacterium	uncultured bacterium
b34b34fbab24e827710de17fbf33fe99	D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyphotobacteria;D_3__Chloroplast;D_4__Ulva sp. UNA00071828;D_5__Ulva sp. UNA00071828;D_6__Ulva sp. UNA00071828	1	Bacteria	Cyanobacteria	Oxyphotobacteria	Chloroplast	Ulva sp. UNA00071828	Ulva sp. UNA00071828	Ulva sp. UNA00071828
fc00320f784df1a55ba8ed5e2713d91f	D_0__Bacteria	0.97711373	Bacteria	NA	NA	NA	NA	NA	NA
f6c00484a38a41513dc22d9cecdbca70	D_0__Bacteria;D_1__Patescibacteria;D_2__Gracilibacteria;D_3__Candidatus Peribacteria;D_4__uncultured organism;D_5__uncultured organism;D_6__uncultured organism	0.780988775	Bacteria	Patescibacteria	Gracilibacteria	Candidatus Peribacteria	uncultured organism	uncultured organism	uncultured organism
ead8e196ef9f526ea57e8f1b0bc6ef1d	D_0__Bacteria;D_1__Patescibacteria;D_2__Microgenomatia;D_3__Candidatus Woesebacteria;D_4__uncultured bacterium;D_5__uncultured bacterium;D_6__uncultured bacterium	0.99608629	Bacteria	Patescibacteria	Microgenomatia	Candidatus Woesebacteria	uncultured bacterium	uncultured bacterium	uncultured bacterium
52db5f0b44c88459c3b2e36f481a3a0b	D_0__Bacteria;D_1__Proteobacteria;D_2__Deltaproteobacteria;D_3__Bdellovibrionales;D_4__Bdellovibrionaceae;D_5__Bdellovibrio;D_6__uncultured bacterium	0.99262081	Bacteria	Proteobacteria	Deltaproteobacteria	Bdellovibrionales	Bdellovibrionaceae	Bdellovibrio	uncultured bacterium

I think you could have been thrown off because the top of your imported phyloseq object's taxonomy table consists of features that weren't assigned past phylum.
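
One way to check this across the whole table rather than just the first rows is to count non-missing entries per rank. The sketch below uses a small made-up matrix in place of as(tax_table(physeq), "matrix"); the feature names f1 to f3 and their assignments are hypothetical.

```r
# Toy taxonomy matrix (hypothetical); in practice this would come from
# as(tax_table(physeq), "matrix")
taxmat <- matrix(
  c("Bacteria", NA,               NA,
    "Bacteria", "Proteobacteria", NA,
    "Bacteria", "Cyanobacteria",  "Oxyphotobacteria"),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"), c("Kingdom", "Phylum", "Class"))
)

# Number of features with an assignment at each rank
assigned_per_rank <- colSums(!is.na(taxmat))

# Deepest assigned rank for each feature
deepest_rank <- colnames(taxmat)[apply(taxmat, 1, function(r) max(which(!is.na(r))))]
table(deepest_rank)
```

If the per-rank counts drop off gradually rather than falling to zero after Kingdom, the import is working and the sparse rows simply reflect features the classifier could not resolve further.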

akalichen · April 1, 2020, 8:25pm

Thanks a lot! I'll give it another shot!