Importing Taxonomy Tables from dada2

ChristianEdwardson · March 29, 2018, 11:16pm

Is it possible to import a dada2 (and/or phyloseq) format taxonomy table into qiime2 directly?

In this case, each level of the taxonomy is in its own tab-separated column, rather than all levels sharing one column for the taxonomic assignment.

colinbrislawn · March 29, 2018, 11:37pm

Hello Christian,

Directly? Nope. Taxonomy is never even mentioned on the importing page, presumably because the taxonomy is already inside the .biom file being imported.

But importing from an otu_table_w_tax.biom file should work great. The main idea is to add the taxonomy to a .biom file like this one, then import that .biom file into Qiime 2.

Sounds like your taxonomy table is not quite in the format that Qiime wants (too many tabs!)

If you could post the first few lines of it, I can build a sed command to convert it into the right format.

Colin

thermokarst · April 2, 2018, 6:48pm

Thanks @colinbrislawn! Just to clarify, the Importing guide you link to is not an exhaustive guide to importing; there are many (dozens?) of formats not covered anywhere in the docs just yet. There are, however, many examples of importing taxonomy floating around here on the forum!

With that said, @ChristianEdwardson, the taxonomy format in QIIME 2 expects the entire taxon value to be encoded in one single column, with taxonomic levels delimited by semicolons. Additional columns are supported, but aren't parsed as taxonomic information (a good example is feature classification confidence). Also, @colinbrislawn is discussing a feature table with taxonomic annotations, but I understood from your initial post that your file isn't a feature table (samples by features); if I misunderstood, please let us know! To second @colinbrislawn's request: a few lines of the file wouldn't hurt! Thanks!
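To make the single-column encoding concrete, here is a minimal shell sketch that reads a taxonomy TSV with the feature ID in column 1 and the semicolon-delimited lineage in column 2, ignores any extra columns (such as a confidence value), and prints one level per feature. The function name `taxon_at_level` is hypothetical and not part of QIIME 2; it assumes tab-separated input and skips a `Feature ID` header line if one is present.

```shell
# Hypothetical helper (not a QIIME 2 command): print "<feature ID><TAB><level N>"
# for each row of a taxonomy TSV whose column 2 holds the full lineage, with
# levels separated by ";". Columns after the second are ignored.
taxon_at_level() {
  local file=$1 level=$2
  awk -F'\t' -v n="$level" '
    BEGIN { OFS = "\t" }
    $1 == "Feature ID" { next }          # skip the header of the headered format
    {
      k = split($2, ranks, ";")          # lineage levels live in column 2
      t = (n <= k) ? ranks[n] : ""       # empty if the lineage is shorter than n
      sub(/^ +/, "", t)                  # drop the space left by "; " separators
      print $1, t
    }' "$file"
}
```

For example, `taxon_at_level taxonomy.tsv 2` would list each feature next to its second level (the phylum, when lineages start at kingdom).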

ChristianEdwardson · April 6, 2018, 9:08pm

This may be of limited interest or value to others, as I'm assuming most people will run the dada2 pipeline within QIIME2. But say I have already processed my data in R and want to use the QIIME 2 downstream analyses: then I need a taxonomy table (FeatureData[Taxonomy]). The taxonomy table produced by dada2 in R is a matrix with rownames = exact sequence (feature) and one column for each taxonomy level, Kingdom to Genus. I can export the matrix however I want; for example, I can create a semicolon-separated file or a tab-separated file, but then I would have to change the first semicolon on each line to a tab (or, for the tab-separated file, every tab after the first to a semicolon) with sed or awk. My question was more along the lines of whether QIIME2 would be able to parse taxonomy data in multiple formats.
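A minimal sketch of both conversions using awk; the function names are hypothetical. `levels_to_qiime` assumes a headerless file with the feature ID in column 1 followed by one tab-separated column per level. `sc_to_qiime` assumes the same table written with ";" between every field (including after the feature ID) and replaces only the first ";" on each line with a tab. With GNU sed, `sed 's/;/\t/'` does the same thing; BSD/macOS sed does not interpret `\t` in the replacement, which is why awk is used here.

```shell
# Both helpers write QIIME 2's headerless taxonomy layout:
#   <feature ID><TAB><level1;level2;...>

# Input: feature ID, then one tab-separated column per taxonomic level (no header).
levels_to_qiime() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    {
      lineage = $2
      for (i = 3; i <= NF; i++) lineage = lineage ";" $i   # join levels with ";"
      print $1, lineage
    }' "$1"
}

# Input: every field separated by ";"; only the first ";" becomes a tab.
sc_to_qiime() {
  awk '{ sub(/;/, "\t"); print }' "$1"
}
```

Either output can then be imported with `--source-format HeaderlessTSVTaxonomyFormat`, as in the commands later in this thread.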

taxa_example.txt (2.1 KB)

thermokarst · April 10, 2018, 3:31pm

Ah, sorry for misunderstanding! While not specific to this particular format, we do support importing multiple formats for any particular Semantic Type of data. For example, check out the Importing Feature Table Data tutorial: QIIME 2 defines one format for BIOM v1.0.0 and another for BIOM v2.1.0, and both can be imported as the same type. There is just some additional machinery defined in the formats and/or transformers to do the conversions along the way (stay tuned for developer docs; I would've linked to them here if I had some to share!). Hope that answers your question. Thanks!
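As a small illustration of the two BIOM formats mentioned above, the sketch below guesses which `--source-format` matches a given `.biom` file: BIOM v1.0 tables are JSON text, while BIOM v2.1 tables are HDF5 files, which begin with the HDF5 signature bytes. The function name is hypothetical, and the check is only a heuristic: it assumes the JSON starts with `{` and that the HDF5 signature sits at byte 0 (HDF5 also permits it at later offsets).

```shell
# Hypothetical helper: guess the QIIME 2 source format for a .biom file.
# BIOM v1.0 = JSON text (assumed to start with "{").
# BIOM v2.1 = HDF5, signature \211 H D F \r \n \032 \n (assumed at offset 0).
biom_source_format() {
  if [ "$(head -c 1 "$1")" = "{" ]; then
    echo BIOMV100Format
  elif [ "$(head -c 4 "$1" | tail -c 3)" = "HDF" ]; then   # bytes 2-4 of the signature
    echo BIOMV210Format
  else
    echo unknown
  fi
}
```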

ChristianEdwardson · April 17, 2018, 11:56pm

Thanks @colinbrislawn and @thermokarst for the suggestions, as well as this post.

I have figured out how to get the taxonomy from dada2/phyloseq into the correct format, both for adding it to a biom file and for importing it into qiime2. Posting the code here for anyone else who might be having trouble.
One note: I needed to install the latest GitHub version of biomformat for R (version 1.7.0).

Starting in R with a phyloseq object (ps):

Format the taxonomy table:

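library(phyloseq)
# Taxonomy matrix: rownames = exact sequences (feature IDs), one column per rank
tax<-as(tax_table(ps),"matrix")
tax_cols <- c("Kingdom", "Phylum", "Class","Order","Family","Genus")
tax<-as.data.frame(tax)
# Collapse the rank columns into one semicolon-delimited string
tax$taxonomy<-do.call(paste, c(tax[tax_cols], sep=";"))
# Drop the individual rank columns, keeping only the combined one
for(co in tax_cols) tax[co]<-NULL
# Row names (feature IDs) become column 1; no header line
write.table(tax, "tax.txt", quote=FALSE, col.names=FALSE, sep="\t")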
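Before importing `tax.txt`, a quick sanity check can catch common export problems. A minimal sketch, assuming the layout written by the R code above (feature ID, tab, semicolon-delimited taxonomy, no header); the function name is hypothetical. It reports lines missing a feature ID or taxon and repeated feature IDs, and returns a non-zero status if it finds any.

```shell
# Hypothetical pre-import check for a headerless taxonomy TSV such as tax.txt.
# Problems go to stderr; the exit status is 1 if any were found, 0 otherwise.
check_taxonomy_tsv() {
  awk -F'\t' '
    NF < 2 || $1 == "" || $2 == "" {
      printf "line %d: expected <ID><TAB><taxon>\n", NR > "/dev/stderr"
      bad = 1
    }
    seen[$1]++ == 1 {                    # true on the second occurrence of an ID
      printf "line %d: duplicate feature ID %s\n", NR, $1 > "/dev/stderr"
      bad = 1
    }
    END { exit bad }' "$1"
}
```

For example: `check_taxonomy_tsv tax.txt && echo OK`.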

Make a biomformat OTU table:

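library(biomformat)
# make_biom() wants features (observations) as rows; dada2 tables have samples as rows
otu<-as(otu_table(ps),"matrix")
if(!taxa_are_rows(ps)){otu<-t(otu)}
otu_biom<-make_biom(data=otu)
write_biom(otu_biom,"otu_biom.biom")  # JSON (BIOM v1.0) output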

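Optional: in the qiime2 environment (which provides the biom command), convert the table to HDF5 (BIOM v2.1) and add the taxonomy to it as observation metadata.
(Adding the taxonomy is not necessary for qiime2, as the taxonomy table is not imported from the biom file. The import command below does, however, use the HDF5 file written here; if you skip this step, import otu_biom.biom with --source-format BIOMV100Format instead.)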

biom convert -i otu_biom.biom -o otu_biom_HDF5.biom --to-hdf5
biom add-metadata -i otu_biom_HDF5.biom -o otu_wTax.biom --observation-metadata-fp tax.txt --observation-header OTUID,taxonomy --sc-separated taxonomy

Import the OTU table (feature table) to qiime2:

qiime tools import \
--input-path otu_wTax.biom \
--type 'FeatureTable[Frequency]' \
--source-format BIOMV210Format \
--output-path feature-table.qza

Finally, import the taxonomy table:

qiime tools import \
--type 'FeatureData[Taxonomy]' \
--source-format HeaderlessTSVTaxonomyFormat \
--input-path tax.txt \
--output-path taxonomy.qza
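Downstream steps such as taxa bar plots typically need every feature ID in the feature table to have an entry in the taxonomy. A minimal sketch for comparing the two before import, assuming the table has first been dumped to TSV with something like `biom convert -i otu_wTax.biom -o otu_table.tsv --to-tsv` (the output file name is illustrative) and that lines starting with `#` in that dump are comments or the header. The function name is hypothetical.

```shell
# Hypothetical helper: count feature IDs found in only one of the two files.
# $1: headerless taxonomy TSV (e.g. tax.txt); $2: feature table dumped to TSV.
compare_feature_ids() {
  local tax_ids table_ids
  tax_ids=$(mktemp)
  table_ids=$(mktemp)
  cut -f1 "$1" | LC_ALL=C sort -u > "$tax_ids"
  grep -v '^#' "$2" | cut -f1 | LC_ALL=C sort -u > "$table_ids"   # skip "#" lines
  echo "only in taxonomy: $(LC_ALL=C comm -23 "$tax_ids" "$table_ids" | wc -l | tr -d ' ')"
  echo "only in table: $(LC_ALL=C comm -13 "$tax_ids" "$table_ids" | wc -l | tr -d ' ')"
  rm -f "$tax_ids" "$table_ids"
}
```

A non-zero "only in table" count points to features that would have no taxonomy assigned.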

thermokarst · April 18, 2018, 12:24am

This is great @ChristianEdwardson - would you be up for moving this over to Community Tutorials? I suspect this will come in handy for many others! Thanks!

ChristianEdwardson · April 27, 2018, 10:15pm

This could definitely go over to the Community Tutorials. Do I just need to copy this to a new topic over there?

colinbrislawn · April 28, 2018, 3:50am

Good afternoon,

I would love to see this as a community tutorial. Then we could point to your rock-solid example, without having folks track down all the pages you had to find.

While you could just copy/paste these commands, a little more legwork will make this tutorial much more useful for future users. Here's what is suggested:

So combining the strengths of your post with @Jaroslaw_Grzadziel's post is a perfect start. Adding a toy data set (or a small but real data set!) with an intro paragraph just to set the scene would make for a very good tutorial.

Colin
P.S. The canonical biom-format package should have an R package that just works. Installing github software in R requires lots of dependencies, so maybe while you build this tutorial, we can get that up and running.

PPS. Thanks.

PPPS. #openscience

system · May 29, 2018, 9:50am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.