Importing dada2 and/or Phyloseq objects to QIIME 2
Background
This tutorial describes how to take feature/OTU tables, taxonomy tables, and sample data (metadata) from R and import them into QIIME 2. This might be useful if you have already completed analyses in R using (but probably not limited to) the dada2 and phyloseq packages and you want to add them to, or compare them with, data analyzed in QIIME 2. You may also want to use or share your data with the nice interactive visualization features of QIIME 2. The steps below will help you get your data into the correct format to import into QIIME 2.
This tutorial was motivated by discussions in the forum here and here.
If you'd like to do the reverse steps (QIIME 2 --> phyloseq), check out this tutorial.
Update (6/22/18): I've generalized the phyloseq exporting steps into an R function, available here.
Example Data
In R
library(dada2); packageVersion("dada2")
## [1] ‘1.6.0’
library(phyloseq); packageVersion("phyloseq")
## [1] ‘1.23.1’
Use the package examples in the dada2 package (?dada, ?makeSequenceTable, ?assignTaxonomy) and dada2 tutorial to make an example dada2 seqtab, assign taxonomy, create a "dummy" metadata data.frame and create a phyloseq object.
derep1 <- derepFastq(system.file("extdata", "sam1F.fastq.gz", package="dada2"))
derep2 <- derepFastq(system.file("extdata", "sam2F.fastq.gz", package="dada2"))
# tperr1 is the example error-rate matrix that ships with dada2
dada1 <- dada(derep1, err=tperr1)
dada2 <- dada(derep2, err=tperr1)
seqtab <- makeSequenceTable(list(sample1=dada1, sample2=dada2))
training_fasta <- system.file("extdata", "example_train_set.fa.gz", package="dada2")
taxa <- assignTaxonomy(seqtab, training_fasta)
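# Dummy metadata parsed from the sample names; with names like "sample1" there is no "D" to split on, so Day will be NA
samples.out <- rownames(seqtab)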
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
gender <- substr(subject,1,1)
subject <- substr(subject,2,999)
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
samdf <- data.frame(Subject=subject, Gender=gender, Day=day)
rownames(samdf) <- samples.out
ps <- phyloseq(otu_table(seqtab, taxa_are_rows=FALSE),
               sample_data(samdf),
               tax_table(taxa))
Alternatively, if you just want to start with a phyloseq object you could use any of the example data included with the package.
data(GlobalPatterns)
data(esophagus)
data(enterotype)
data(soilrep)
Note that in the phyloseq example data, taxa_are_rows=TRUE, whereas in the dada2 seqtab, taxa_are_rows=FALSE. This will be important later.
Prepare and Export Taxonomy, OTU Table, and Metadata
# Export taxonomy table as "tax.txt"
tax<-as(tax_table(ps),"matrix")
tax_cols <- colnames(tax)
tax<-as.data.frame(tax)
tax$taxonomy<-do.call(paste, c(tax[tax_cols], sep=";"))
for(co in tax_cols) tax[co]<-NULL
write.table(tax, "tax.txt", quote=FALSE, col.names=FALSE, sep="\t")
# Export feature/OTU table
# As a biom file
library(biomformat);packageVersion("biomformat")
## [1] ‘1.6.0’
otu<-t(as(otu_table(ps),"matrix")) # 't' to transpose if taxa_are_rows=FALSE
#if taxa_are_rows=TRUE, no transpose is needed
#otu<-as(otu_table(GlobalPatterns),"matrix")
otu_biom<-make_biom(data=otu)
write_biom(otu_biom,"otu_biom.biom")
# As a text file
write.table(t(seqtab), "seqtab.txt", sep="\t", row.names=TRUE, col.names=NA, quote=FALSE)
#or from the phyloseq object: 't' to transpose if taxa_are_rows=FALSE, no 't' if taxa_are_rows=TRUE
#write.table(t(as(otu_table(ps),"matrix")), "seqtab.txt", sep="\t", row.names=TRUE, col.names=NA, quote=FALSE)
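# Export metadata (if you have a properly formatted metadata file that you imported in your phyloseq pipeline, you can skip this step and just use that text file directly in QIIME 2)
# Note: row.names=FALSE omits the row names, so sample IDs that exist only as row names (as in the dummy samdf above) are not written to the file
write.table(sample_data(ps),"sample-metadata.txt", sep="\t", row.names=FALSE, col.names=TRUE, quote=FALSE)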
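Before moving on to QIIME 2, it can help to look at the exported metadata file from a terminal. The following is only a minimal sketch: it builds a made-up stand-in file in a hypothetical `meta-demo/` directory (the `sample-id` column name and all values are placeholders, not output from the R code above), prints the first header cell so you can see whether the file starts with an ID column, and flags rows whose field count differs from the header's.

```shell
mkdir -p meta-demo
# Toy stand-in for sample-metadata.txt (made-up values, hypothetical path).
# Here the first column holds the sample IDs.
printf 'sample-id\tSubject\tDay\nsample1\tA\t1\nsample2\tB\t2\n' > meta-demo/sample-metadata.txt

# Print the first header cell so the ID column can be checked by eye.
head -n 1 meta-demo/sample-metadata.txt | cut -f 1

# Flag rows whose number of tab-separated fields differs from the header's.
awk -F '\t' 'NR == 1 { n = NF; next }
             NF != n { printf "line %d has %d fields, expected %d\n", NR, NF, n; bad = 1 }
             END { exit (bad ? 1 : 0) }' meta-demo/sample-metadata.txt
```

To check your own export, point the two commands at your `sample-metadata.txt` instead of the demo file.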
In QIIME 2 (qiime2-2018.4)
Import feature table from exported biom:
qiime tools import \
--input-path otu_biom.biom \
--type 'FeatureTable[Frequency]' \
--source-format BIOMV100Format \
--output-path feature-table.qza
Import feature table from text file:
echo -n "#OTU Table" | cat - seqtab.txt > seqtab-biom-table.txt
biom convert -i seqtab-biom-table.txt -o seqtab-biom-table.biom --table-type="OTU table" --to-hdf5
qiime tools import \
--input-path seqtab-biom-table.biom \
--type 'FeatureTable[Frequency]' \
--source-format BIOMV210Format \
--output-path feature-table2.qza
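The header step in the text-file route is easy to misread, so here is a small sketch of what it does, assuming a toy table with made-up sequences and counts in a hypothetical `header-demo/` directory. Because `write.table(..., col.names=NA)` leaves the first header cell empty, the header line starts with a tab; prepending `#OTU Table` without a trailing newline fills that empty cell. The sketch uses `printf` rather than `echo -n`, since `echo -n` does not behave the same way in every shell.

```shell
mkdir -p header-demo
# Toy stand-in for seqtab.txt as written with col.names=NA:
# the header's first cell is empty, so the line begins with a tab.
# (Path, sequences, and counts are made up for illustration.)
printf '\tsample1\tsample2\nACGT\t10\t0\nTTGA\t3\t7\n' > header-demo/seqtab.txt

# Prepend the label with no trailing newline so it lands in the empty cell.
printf '#OTU Table' | cat - header-demo/seqtab.txt > header-demo/seqtab-biom-table.txt

# Show the resulting header line.
head -n 1 header-demo/seqtab-biom-table.txt
```

The number of lines in the file is unchanged; only the first cell of the header differs.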
Import the taxonomy table:
qiime tools import \
--type 'FeatureData[Taxonomy]' \
--source-format HeaderlessTSVTaxonomyFormat \
--input-path tax.txt \
--output-path taxonomy.qza
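If the taxonomy import complains, a quick look at the shape of `tax.txt` may help. The R export above writes one line per feature with no header: the feature ID, a tab, then the ranks joined by `;`. A minimal sketch of a check for that two-field layout, run here on a made-up stand-in file in a hypothetical `tax-demo/` directory:

```shell
mkdir -p tax-demo
# Toy stand-in for tax.txt: feature ID, a tab, then the ;-joined ranks.
# (Path and contents are made up for illustration.)
printf 'ACGT\tBacteria;Firmicutes\nTTGA\tBacteria;Proteobacteria\n' > tax-demo/tax.txt

# Report any line that does not have exactly two tab-separated fields.
awk -F '\t' 'NF != 2 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
             END { if (bad) exit 1; print "all lines have 2 fields" }' tax-demo/tax.txt
```

Running the same `awk` command on your real `tax.txt` should point to any line where a stray tab crept into a taxon name.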
Bonus: Export Representative Sequences from dada2 and Import into QIIME 2:
In R:
uniquesToFasta(seqtab, fout='rep-seqs.fna', ids=colnames(seqtab))
In QIIME 2:
qiime tools import \
--input-path rep-seqs.fna \
--type 'FeatureData[Sequence]' \
--output-path rep-seqs.qza
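Since `ids=colnames(seqtab)` is what ties the FASTA headers to the feature IDs in the exported table, it may be worth confirming that the two sets of IDs agree before combining the artifacts downstream. A rough sketch, assuming toy files with made-up sequences in a hypothetical `ids-demo/` directory and a table laid out like `seqtab.txt` (one header line, feature IDs in the first column):

```shell
mkdir -p ids-demo
# Toy stand-ins (made-up sequences and counts, hypothetical paths).
printf '>ACGT\nACGT\n>TTGA\nTTGA\n' > ids-demo/rep-seqs.fna
printf '\tsample1\tsample2\nACGT\t10\t0\nTTGA\t3\t7\n' > ids-demo/seqtab.txt

# FASTA IDs: header lines with the leading ">" removed.
grep '^>' ids-demo/rep-seqs.fna | sed 's/^>//' | sort > ids-demo/fasta-ids.txt
# Table IDs: first column, skipping the header line.
tail -n +2 ids-demo/seqtab.txt | cut -f 1 | sort > ids-demo/table-ids.txt

# comm -3 keeps IDs that appear in only one list; an empty result means they match.
comm -3 ids-demo/fasta-ids.txt ids-demo/table-ids.txt > ids-demo/mismatches.txt
if [ -s ids-demo/mismatches.txt ]; then
  echo "IDs differ:"
  cat ids-demo/mismatches.txt
else
  echo "IDs match"
fi
```

Any ID listed in `mismatches.txt` is present in one file but not the other.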