Importing dada2 and Phyloseq objects to QIIME 2


(Christian Edwardson) #1

Importing dada2 and/or Phyloseq objects to QIIME 2

Background

This tutorial describes how to take feature/OTU tables, taxonomy tables, and sample data (metadata) from R and import into QIIME 2. This might be useful if you have already completed analyses in R using (but probably not limited to) the dada2 and phyloseq packages and you want to add or compare to data analyzed in QIIME 2. You may also want to use or share data with the nice interactive visualization features of QIIME 2. This will help you get your data in the correct format to import into QIIME2.

This tutorial was motivated by discussions in the forum here and here.

If you’d like to do the reverse steps (QIIME2 --> phyloseq) check out this tutorial.

Update (6/22/18): I’ve generalized the phyloseq exporting steps into an R function, available here.

Example Data

In R

library(dada2); packageVersion("dada2")
## [1] ‘1.6.0’
library(phyloseq); packageVersion("phyloseq")
## [1] ‘1.23.1’

Use the package examples in the dada2 package (?dada, ?makeSequenceTable, ?assignTaxonomy) and dada2 tutorial to make an example dada2 seqtab, assign taxonomy, create a “dummy” metadata data.frame and create a phyloseq object.

derep1 <- derepFastq(system.file("extdata", "sam1F.fastq.gz", package="dada2"))
derep2 <- derepFastq(system.file("extdata", "sam2F.fastq.gz", package="dada2"))
dada1 <- dada(derep1, tperr1)
dada2 <- dada(derep2, tperr1)
seqtab<-makeSequenceTable(list(sample1=dada1, sample2=dada2))

training_fasta <- system.file("extdata", "example_train_set.fa.gz", package="dada2")
taxa <- assignTaxonomy(seqtab, training_fasta)

samples.out <- rownames(seqtab)
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
gender <- substr(subject,1,1)
subject <- substr(subject,2,999)
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
samdf <- data.frame(Subject=subject, Gender=gender, Day=day)
rownames(samdf) <- samples.out

ps <- phyloseq(otu_table(seqtab, taxa_are_rows=FALSE), 
               sample_data(samdf), 
               tax_table(taxa))

Alternatively, if you just want to start with a phyloseq object you could use any of the example data included with the package.

data(GlobalPatterns)
data(esophagus)
data(enterotype)
data(soilrep)

Note that in the phyloseq example data, taxa_are_rows=TRUE, whereas in the dada2 seqtab, taxa_are_rows=FALSE. This will be important later.

Prepare and Export Taxonomy, OTU Table, and Metadata

# Export taxonomy table as "tax.txt"

tax<-as(tax_table(ps),"matrix")
tax_cols <- colnames(tax)
tax<-as.data.frame(tax)
tax$taxonomy<-do.call(paste, c(tax[tax_cols], sep=";"))
for(co in tax_cols) tax[co]<-NULL
write.table(tax, "tax.txt", quote=FALSE, col.names=FALSE, sep="\t")

# Export feature/OTU table

# As a biom file

library(biomformat);packageVersion("biomformat")
## [1] ‘1.6.0’

otu<-t(as(otu_table(ps),"matrix")) # 't' to transform if taxa_are_rows=FALSE
#if taxa_are_rows=TRUE
#otu<-as(otu_table(GlobalPatterns),"matrix"))
otu_biom<-make_biom(data=otu)
write_biom(otu_biom,"otu_biom.biom")

# As a text file

write.table(t(seqtab), "seqtab.txt", sep="\t", row.names=TRUE, col.names=NA, quote=FALSE)
#or from the phyloseq object, 't' to transform if taxa_are_rows=FALSE, no 't' if taxa_are_rows=TRUE
#write.table(t(otu_table(ps), "seqtab.txt",sep="\t", row.names=TRUE, col.names=NA, quote=FALSE)

# Export metadata (if you have a properly formatted metadata file that you imported in your phyloseq pipeline, you can skip this step and just use that text file directly in QIIME 2)

write.table(sample_data(ps),"sample-metadata.txt", sep="\t", row.names=FALSE, col.names=TRUE, quote=FALSE)

In QIIME 2 (qiime2-2018.4)

Import feature table from exported biom

qiime tools import \
--input-path otu_biom.biom \
--type 'FeatureTable[Frequency]' \
--source-format BIOMV100Format \
--output-path feature-table.qza

Import feature table from text file:

echo -n "#OTU Table" | cat - seqtab.txt > seqtab-biom-table.txt

biom convert -i seqtab-biom-table.txt -o  seqtab-biom-table.biom --table-type="OTU table" --to-hdf5

qiime tools import \
--input-path  seqtab-biom-table.biom \
--type 'FeatureTable[Frequency]' \
--source-format BIOMV210Format \
--output-path feature-table2.qza

Import the taxonomy table:

qiime tools import \
--type 'FeatureData[Taxonomy]' \
--source-format HeaderlessTSVTaxonomyFormat \
--input-path tax.txt \
--output-path taxonomy.qza

Bonus: Export and Representative Sequences from dada2:

In R:

uniquesToFasta(seqtab, fout='rep-seqs.fna', ids=colnames(seqtab))

In QIIME 2:

qiime tools import \
--input-path rep-seqs.fna \
--type 'FeatureData[Sequence]' \
--output-path rep-seqs.qza


Subsampling issue
Phyloseq to QIIME2
Phyloseq to QIIME2
(Christian Edwardson) #2

As of 6/21/2018, in Ubuntu 16.04, the latest R that I can install (without adding a PPA) is 3.4.4 and the version included with qiime2 is 3.4.1. Therefore, the newest version of Bioconductor and thus the newest version of dada2, phyloseq and biomformat aren’t installed.

At this time, with biomformat 1.6, the following error occurs:

qiime tools import \
--input-path otu_biom.biom \
--type 'FeatureTable[Frequency]' \
--source-format BIOMV100Format \
--output-path feature-table.qza

Traceback (most recent call last):
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/q2cli/tools.py", line 116, in import_data
    view_type=source_format)
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/sdk/result.py", line 218, in import_data
    return cls._from_view(type_, view, view_type, provenance_capture)
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/sdk/result.py", line 243, in _from_view
    result = transformation(view)
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/core/transform.py", line 220, in wrapped
    file_view = transformer(view)
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_types/feature_table/_transformer.py", line 128, in _8
    data = _parse_biom_table_v100(ff)
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_types/feature_table/_transformer.py", line 46, in _parse_biom_table_v100
    table = biom.Table.from_json(json.load(fh))
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/biom_format-2.1.6-py3.5-linux-x86_64.egg/biom/table.py", line 4190, in from_json
    dtype = MATRIX_ELEMENT_TYPE[json_table['matrix_element_type']]
TypeError: unhashable type: 'list'

An unexpected error has occurred:

  unhashable type: 'list'

See above for debug info. 

I have verified that this does not occur in Windows with R 3.5 and up-to-date Bioconductor packages (i.e. biomformat 1.7). See here.


(Jordan Bisanz) #3

What do you think about implementing this functionality into qime2R before it goes into bioconductor? The function could be something like write_qza and take the input and a desired format, perhaps with some ability to guess the desired qza type based on class, dimensions, and column/row sums. Using a system call to an existing qiime2 install, the qza could be produced behind the scenes using qiime tools import. While the qza directory/file structure could be replicated without requiring qiime2, it would be easier and avoid the creation of artifacts with a sketchy/unexpected provenance for subsequent plugins.

The usage could be something like:

write_qza(seqtab, qza="~/MyProject/seqtab.qza", type="FeatureTable")

(Christian Edwardson) #4

This sounds great, and I was actually hoping to extend my phyloseq2qiime2.R script to include the qiime2 import steps but don’t currently have the time or resources to work on this.


(Christian Edwardson) #5

Updated format for newer versions of QIIME 2 (2018.8 and later): changed --source-format to --input-format

Import feature table from exported biom:

qiime tools import \
--input-path otu_biom.biom \
--type 'FeatureTable[Frequency]' \
--input-format BIOMV100Format \
--output-path feature-table.qza

Import feature table from text file

echo -n "#OTU Table" | cat - seqtab.txt > seqtab-biom-table.txt 

biom convert -i seqtab-biom-table.txt -o seqtab-biom-table.biom --table-type="OTU table" --to-hdf5

qiime tools import \
        --input-path  seqtab-biom-table.biom \
        --type 'FeatureTable[Frequency]' \
        --input-format BIOMV210Format \
        --output-path feature-table2.qza

Import the taxonomy table:

qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path tax.txt \
--output-path taxonomy.qza