Converting biom files (with taxonomic info) for import in R with Phyloseq

Hello,

I would like to use the Phyloseq package in R to analyse my 16S data, and therefore need to import my .biom table into R.

As the taxonomic information is not included in my filtered .biom table, I converted the feature-table.biom and taxonomy.qza files obtained in QIIME2 into .txt files, and merged them by OTU ID. This was suggested in this forum: https://github.com/joey711/phyloseq/issues/821

Then, I tried to convert the merged file (“otu_table.txt”) back into a new .biom file with the following command:

biom convert -i otu_table.txt -o new_otu_table.biom --to-hdf5 --table-type="OTU table" --process-obs-metadata taxonomy

However, I obtain many error messages, including:

"ValueError: could not convert string to float: 'D_0__Bacteria;D_1__Proteobacteria'"

"TypeError: Invalid value on line 2, column 1, value D_0__Bacteria;D_1__Proteobacteria"

"otu_table.txt does not appear to be a BIOM file!"

May I please ask you to help me understand and solve this problem? I would be extremely grateful for some advice!

Ultimately, I would like to get a phyloseq object that contains the OTU table, the sample metadata, the taxonomic information and the phylogenetic tree. If I understood correctly, I can achieve that by separately importing the .biom file (to which the taxonomy has been appended), the sample metadata and a tree.nwk generated in QIIME2. These objects could then be merged in phyloseq.

Thank you very much!

Kat

Hi @Kat,
It sounds like your merging protocol is disrupting the format expected by biom. I do not really know the answer to this specific error, since biom support questions are not covered by this forum (biom format support is on the qiime1 forum) but I may be able to offer an alternative.

See this post. Specifically, using the biom add-metadata command should do the taxonomy merging without the need to convert to tsv and round-trip back to biom format. In general, biom format files are much happier when the biom commands are used to modify files.

That would give you a biom with taxonomy merged.
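As a rough sketch of what that could look like: the taxonomy.tsv exported from QIIME2 usually needs its header line changed before biom will accept it as observation metadata. The snippet below builds a tiny stand-in taxonomy file (hypothetical content), rewrites the header, and then calls biom add-metadata only if biom and a feature-table.biom are actually present. The output file names and the exact header biom expects are assumptions on my part, so please check them against your biom version.

```shell
# Hypothetical stand-in for the taxonomy.tsv exported from taxonomy.qza.
workdir=$(mktemp -d)
printf 'Feature ID\tTaxon\tConfidence\n' > "$workdir/taxonomy.tsv"
printf 'ae3f\tk__Bacteria; p__Bacteroidetes\t0.97\n' >> "$workdir/taxonomy.tsv"

# Replace the header line with the column names assumed to be expected by biom.
{
    printf '#OTUID\ttaxonomy\tconfidence\n'
    tail -n +2 "$workdir/taxonomy.tsv"
} > "$workdir/biom-taxonomy.tsv"

# Attach the taxonomy to the table (skipped when biom or the table is missing).
# table-with-taxonomy.biom is a hypothetical output name.
if command -v biom >/dev/null 2>&1 && [ -f feature-table.biom ]; then
    biom add-metadata -i feature-table.biom -o table-with-taxonomy.biom \
        --observation-metadata-fp "$workdir/biom-taxonomy.tsv" \
        --sc-separated taxonomy
fi
```

The point of doing it this way is that the count table never leaves biom format, so there is no text table to get wrong.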

I hope that helps!

Hi @Kat,
Maybe I could suggest doing this without the .biom format:

# Load phyloseq
library(phyloseq)

# Read files

# OTU (feature) counts
otu_table = read.csv("otu_table.csv", sep=",", row.names=1)

# Taxonomy matrix
otu_matrix = read.csv("otu_matrix.csv", sep=",", row.names=1)
otu_matrix = as.matrix(otu_matrix)

# Sample metadata
metadata = read.csv("sample-metadata.csv", sep=",", row.names=1)

# Import into phyloseq objects
OTU = otu_table(otu_table, taxa_are_rows = TRUE)
TAX = tax_table(otu_matrix)
meta = sample_data(metadata)
phy_tree = read_tree("tree.nwk")

# Merge (pass the sample_data object "meta", not the raw data frame)
phyloseq_merged = phyloseq(OTU, TAX, meta, phy_tree)

This always worked for me. Good luck!

Edit: of course, instead of OTU names the row names will be, for example, feature hash IDs.

Hi @Nicholas_Bokulich,

Thank you very much for your answer and your advice! It is very useful information, I will try to use the biom add-metadata command.

Meanwhile, I actually figured out what my mistake was when I merged the files: I included the taxonomy as the second column of the OTU table, while the program expected it to be counts. When I positioned the taxonomy column at the end of the document, the conversion worked. Nevertheless, the method you suggest seems more elegant and safer than merging and converting back and forth.
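For anyone who hits the same error, here is a minimal sketch of that reordering in the shell. It assumes a tab-separated table with the taxonomy in column 2; the sample names and counts are made up for illustration.

```shell
# Hypothetical merged table with the taxonomy wrongly placed in column 2.
workdir=$(mktemp -d)
printf '#OTU ID\ttaxonomy\tS1\tS2\n' > "$workdir/otu_table.txt"
printf 'otu1\tD_0__Bacteria;D_1__Proteobacteria\t10\t0\n' >> "$workdir/otu_table.txt"

# Move column 2 to the end, so every column between the ID and the
# taxonomy holds counts.
awk 'BEGIN { FS = OFS = "\t" }
     { tax = $2
       for (i = 2; i < NF; i++) $i = $(i + 1)
       $NF = tax
       print }' "$workdir/otu_table.txt" > "$workdir/otu_table_reordered.txt"

cat "$workdir/otu_table_reordered.txt"

# The conversion from the first post would then be run on the reordered file:
# biom convert -i otu_table_reordered.txt -o new_otu_table.biom --to-hdf5 \
#     --table-type="OTU table" --process-obs-metadata taxonomy
```

This also explains the "could not convert string to float" message: the taxonomy string sat in a column that was being parsed as counts.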

Thank you very much again!

Hi @Jaroslaw_Grzadziel,

Thank you very much for your answer and for taking the time to explain to me how to proceed! Importing the OTU table, metadata and taxonomy files as .csv in R and merging them there seems to work well.

May I ask if there is a way to obtain a taxonomy table like the one you showed above, with the different levels (phylum, class, order, family, genus) separated into different columns? The taxonomy file output by QIIME2 merges all names together, and it would be nice to have them separated as well.

Thank you very much again for your help!

Kat

Hi @Kat,
Glad to hear I was able to help with biom format!

After exporting your taxonomy from QIIME2, you will have a file in the format:

Feature ID	Taxon	Confidence
ae3fd1a9083ed263b3d55bf6c6572a66	k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__	0.967327777
cf1b2275183b5e241f602660c32084a0	k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus; s__ruminis	0.940705832
e3338a6d212a227ebed5fef419540db4	k__Bacteria; p__Firmicutes; c__Erysipelotrichi; o__Erysipelotrichales; f__Erysipelotrichaceae; g__[Eubacterium]; s__biforme	0.999998186

I assume you don't want the third column (confidence) and have 7 taxonomic levels (listed in first line below), so to transform to the format shown by @Jaroslaw_Grzadziel, you can type a command like this in your terminal:

echo '#Kingdom#Phylum#Class#Order#Family#Genus#Species' | tr '#' '\t' > taxonomy-table.tsv
cut -f 1-2 path-to-your-taxonomy-file.tsv | tr ';' '\t' | tail -n +2 >> taxonomy-table.tsv

It sounds like this should achieve what you need...
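Two caveats with the two-line version: tr leaves the space that follows each semicolon at the start of every name, and features classified to fewer than 7 levels end up with fewer columns than the header. A variant with awk that handles both might look like the sketch below. The input rows are shortened, made-up examples, and the "Feature ID" header for the first column is my own choice.

```shell
# Hypothetical stand-in for the taxonomy file exported from QIIME2.
workdir=$(mktemp -d)
printf 'Feature ID\tTaxon\tConfidence\n' > "$workdir/taxonomy.tsv"
printf 'ae3f\tk__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__\t0.967\n' >> "$workdir/taxonomy.tsv"
printf 'cf1b\tk__Bacteria; p__Firmicutes\t0.94\n' >> "$workdir/taxonomy.tsv"

# Split the Taxon column on ";" plus any following spaces, drop the
# Confidence column, and pad short lineages with empty cells up to 7 levels.
awk 'BEGIN { FS = OFS = "\t"
             print "Feature ID", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species" }
     NR > 1 { n = split($2, levels, /; */)
              row = $1
              for (i = 1; i <= 7; i++) row = row OFS (i <= n ? levels[i] : "")
              print row }' "$workdir/taxonomy.tsv" > "$workdir/taxonomy-table.tsv"

cat "$workdir/taxonomy-table.tsv"
```

Every row then has the same 8 tab-separated columns, which makes the file easier to read into R as a taxonomy matrix.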

I hope that helps!

Hi @Kat,
Happy to hear that you managed to deal with the problem :)

Personally I'm not a big fan of processing text files in the terminal, so my way to separate the taxonomy is to open the file in MS Excel (Windows) or LibreOffice Calc (Ubuntu) and split it manually, just like on this help site --> LINK.

Step by step:
1) Open the file (here, in LibreOffice Calc)
Separator Options: Separated by: Tab (check) -> OK
2) Now you can just remove the whole Confidence column
3) Select the whole Taxon column
4) Go to: Data -> Text to Columns -> Separated by: Semicolon -> OK
5) Save the file :)

Finally, you can add a header for each taxonomic level manually: Kingdom, Phylum, etc.

Hi Nicholas,

Thank you very much for your suggestion with the code, this is a great help!

Cheers,

Kat
