Importing dada2 and Phyloseq objects to QIIME 2

Importing dada2 and/or Phyloseq objects to QIIME 2

Background

This tutorial describes how to take feature/OTU tables, taxonomy tables, and sample data (metadata) from R and import into QIIME 2. This might be useful if you have already completed analyses in R using (but probably not limited to) the dada2 and phyloseq packages and you want to add or compare to data analyzed in QIIME 2. You may also want to use or share data with the nice interactive visualization features of QIIME 2. This will help you get your data in the correct format to import into QIIME2.

This tutorial was motivated by discussions in the forum here and here.

If you'd like to do the reverse steps (QIIME2 --> phyloseq) check out this tutorial.

Update (6/22/18): I've generalized the phyloseq exporting steps into an R function, available here.

Example Data

In R

library(dada2); packageVersion("dada2")
## [1] β€˜1.6.0’
library(phyloseq); packageVersion("phyloseq")
## [1] β€˜1.23.1’

Use the package examples in the dada2 package (?dada, ?makeSequenceTable, ?assignTaxonomy) and dada2 tutorial to make an example dada2 seqtab, assign taxonomy, create a "dummy" metadata data.frame and create a phyloseq object.

derep1 <- derepFastq(system.file("extdata", "sam1F.fastq.gz", package="dada2"))
derep2 <- derepFastq(system.file("extdata", "sam2F.fastq.gz", package="dada2"))
dada1 <- dada(derep1, tperr1)
dada2 <- dada(derep2, tperr1)
seqtab<-makeSequenceTable(list(sample1=dada1, sample2=dada2))

training_fasta <- system.file("extdata", "example_train_set.fa.gz", package="dada2")
taxa <- assignTaxonomy(seqtab, training_fasta)

samples.out <- rownames(seqtab)
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
gender <- substr(subject,1,1)
subject <- substr(subject,2,999)
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
samdf <- data.frame(Subject=subject, Gender=gender, Day=day)
rownames(samdf) <- samples.out

ps <- phyloseq(otu_table(seqtab, taxa_are_rows=FALSE), 
               sample_data(samdf), 
               tax_table(taxa))

Alternatively, if you just want to start with a phyloseq object you could use any of the example data included with the package.

data(GlobalPatterns)
data(esophagus)
data(enterotype)
data(soilrep)

Note that in the phyloseq example data, taxa_are_rows=TRUE, whereas in the dada2 seqtab, taxa_are_rows=FALSE. This will be important later.

Prepare and Export Taxonomy, OTU Table, and Metadata

# Export taxonomy table as "tax.txt"

tax<-as(tax_table(ps),"matrix")
tax_cols <- colnames(tax)
tax<-as.data.frame(tax)
tax$taxonomy<-do.call(paste, c(tax[tax_cols], sep=";"))
for(co in tax_cols) tax[co]<-NULL
write.table(tax, "tax.txt", quote=FALSE, col.names=FALSE, sep="\t")

# Export feature/OTU table

# As a biom file

library(biomformat);packageVersion("biomformat")
## [1] β€˜1.6.0’

otu<-t(as(otu_table(ps),"matrix")) # 't' to transform if taxa_are_rows=FALSE
#if taxa_are_rows=TRUE
#otu<-as(otu_table(GlobalPatterns),"matrix"))
otu_biom<-make_biom(data=otu)
write_biom(otu_biom,"otu_biom.biom")

# As a text file

write.table(t(seqtab), "seqtab.txt", sep="\t", row.names=TRUE, col.names=NA, quote=FALSE)
#or from the phyloseq object, 't' to transform if taxa_are_rows=FALSE, no 't' if taxa_are_rows=TRUE
#write.table(t(otu_table(ps), "seqtab.txt",sep="\t", row.names=TRUE, col.names=NA, quote=FALSE)

# Export metadata (if you have a properly formatted metadata file that you imported in your phyloseq pipeline, you can skip this step and just use that text file directly in QIIME 2)

write.table(sample_data(ps),"sample-metadata.txt", sep="\t", row.names=FALSE, col.names=TRUE, quote=FALSE)

In QIIME 2 (qiime2-2018.4)

Import feature table from exported biom

qiime tools import \
--input-path otu_biom.biom \
--type 'FeatureTable[Frequency]' \
--source-format BIOMV100Format \
--output-path feature-table.qza

Import feature table from text file:

echo -n "#OTU Table" | cat - seqtab.txt > seqtab-biom-table.txt

biom convert -i seqtab-biom-table.txt -o  seqtab-biom-table.biom --table-type="OTU table" --to-hdf5

qiime tools import \
--input-path  seqtab-biom-table.biom \
--type 'FeatureTable[Frequency]' \
--source-format BIOMV210Format \
--output-path feature-table2.qza

Import the taxonomy table:

qiime tools import \
--type 'FeatureData[Taxonomy]' \
--source-format HeaderlessTSVTaxonomyFormat \
--input-path tax.txt \
--output-path taxonomy.qza

Bonus: Export and Representative Sequences from dada2:

In R:

uniquesToFasta(seqtab, fout='rep-seqs.fna', ids=colnames(seqtab))

In QIIME 2:

qiime tools import \
--input-path rep-seqs.fna \
--type 'FeatureData[Sequence]' \
--output-path rep-seqs.qza

6 Likes

As of 6/21/2018, in Ubuntu 16.04, the latest R that I can install (without adding a PPA) is 3.4.4 and the version included with qiime2 is 3.4.1. Therefore, the newest version of Bioconductor and thus the newest version of dada2, phyloseq and biomformat aren’t installed.

At this time, with biomformat 1.6, the following error occurs:

qiime tools import \
--input-path otu_biom.biom \
--type 'FeatureTable[Frequency]' \
--source-format BIOMV100Format \
--output-path feature-table.qza

Traceback (most recent call last):
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/q2cli/tools.py", line 116, in import_data
    view_type=source_format)
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/sdk/result.py", line 218, in import_data
    return cls._from_view(type_, view, view_type, provenance_capture)
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/sdk/result.py", line 243, in _from_view
    result = transformation(view)
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/core/transform.py", line 220, in wrapped
    file_view = transformer(view)
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_types/feature_table/_transformer.py", line 128, in _8
    data = _parse_biom_table_v100(ff)
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_types/feature_table/_transformer.py", line 46, in _parse_biom_table_v100
    table = biom.Table.from_json(json.load(fh))
  File "/home/christian/miniconda3/envs/qiime2-2018.4/lib/python3.5/site-packages/biom_format-2.1.6-py3.5-linux-x86_64.egg/biom/table.py", line 4190, in from_json
    dtype = MATRIX_ELEMENT_TYPE[json_table['matrix_element_type']]
TypeError: unhashable type: 'list'

An unexpected error has occurred:

  unhashable type: 'list'

See above for debug info. 

I have verified that this does not occur in Windows with R 3.5 and up-to-date Bioconductor packages (i.e. biomformat 1.7). See here.

What do you think about implementing this functionality into qime2R before it goes into bioconductor? The function could be something like write_qza and take the input and a desired format, perhaps with some ability to guess the desired qza type based on class, dimensions, and column/row sums. Using a system call to an existing qiime2 install, the qza could be produced behind the scenes using qiime tools import. While the qza directory/file structure could be replicated without requiring qiime2, it would be easier and avoid the creation of artifacts with a sketchy/unexpected provenance for subsequent plugins.

The usage could be something like:

write_qza(seqtab, qza="~/MyProject/seqtab.qza", type="FeatureTable")
1 Like

This sounds great, and I was actually hoping to extend my phyloseq2qiime2.R script to include the qiime2 import steps but don’t currently have the time or resources to work on this.

Updated format for newer versions of QIIME 2 (2018.8 and later): changed --source-format to --input-format

Import feature table from exported biom:

qiime tools import \
--input-path otu_biom.biom \
--type 'FeatureTable[Frequency]' \
--input-format BIOMV100Format \
--output-path feature-table.qza

Import feature table from text file

echo -n "#OTU Table" | cat - seqtab.txt > seqtab-biom-table.txt 

biom convert -i seqtab-biom-table.txt -o seqtab-biom-table.biom --table-type="OTU table" --to-hdf5

qiime tools import \
        --input-path  seqtab-biom-table.biom \
        --type 'FeatureTable[Frequency]' \
        --input-format BIOMV210Format \
        --output-path feature-table2.qza

Import the taxonomy table:

qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path tax.txt \
--output-path taxonomy.qza
4 Likes

Hello! Beginner here, very grateful for the guidance! I followed this guide step-by-step and everything worked perfectly! I am just a bit confused on when/how to upload my sample metadata to Qiime2? I may have missed something but this guide does not appear to cover that. Just as described here, I now have a feature table and a taxonomy qza loaded into qiime2 and have a sample-metadata.txt tab delimited file with no rownames (so no sample names) ready to go.

Hi @Jadelys_Miosotis_Ton,
Welcome to the QIIME 2 Forum!

In QIIME 2, you load your metadata when you're using it - i.e., when using a command that requires metadata. QIIME 2 takes metadata as .tsv, so unlike the feature table (for example) there isn't an import required. This page provides information on the expected format of the sample metadata.

qiime feature-table summarize is an example of a command that optionally accepts sample metadata - you can review the linked help text for that command to see how that works, and you should be able to call that with your metadata and the feature table that you imported, once the metadata is formatted according to the description in that link.

Thank you! I tested my formatting using qiime feature-table summarize and it all seemed to work well!

1 Like