Creating feature table from taxonomical data in Excel format

Hi,

I have some taxonomical data in the format of Excel files (attached image).


I don't know how to convert them to Feature Table in order to do further analysis, I have about 20 Excel files, each file contain data for one sample. I want to generate the taxa data in each level and do PCoA and other ordination plots. I would be grateful if you guys can help.

Many thanks,
Allie.!

Hello Allie,

Great question! There are a couple of ways to do this, and some are easier than others. I think this would be the best option:

  • Start from a previous data file.
    Most data analysis programs make a single file that has all the taxonomy from all of the samples, so I’m guessing that someone took that combined file, separated it into 20 files for all 20 samples, then sent these 20 files to you. If we can find that single file, we can figure out how to import it into Qiime 2, and it will already include all taxonomy and all your samples. One (maybe easy!) step.
    This file could be an .biom file or a OTU.txt file.

If you can find this .biom file, we can import it into Qiime 2 as Feature Table Data.

Let me know what you can find!

Colin

3 Likes

Hi Colin,

Unfortunately I’m not able to find the previous data file, these single files are all I have.

Allie.

Thank you for the update Allie!

We have some other options, but these take some more work.

Can you tell me a little bit more about the excel files you have? Do they have “summary” in their name? Do they end in .xlsx or .csv or .tsv? I’m asking because I think these may be coming from the summarize_taxa.py script from Qiime 1, and that scripts merges OTUs together if they share a taxonomy assignment. Usually PCoA plots of beta diversity use the original OTUs, but it should be possible to do the same thing with the merged taxonomic feature.


Maybe you could try this: merge all your excel files together, so they look like this:

kingdom phylum class order family genus species Sample1 Sample2
Archaea Euryarchaeota Methanobacteria Methanobacteriales Methanobacteriaceae Methanobrevibacter Methanobrevibacter smithii 123 456

Basically, I’ve taken the numb_hits column from each sample, and made that into a new column called Sample1, Sample2, Sample3… for each of your samples. Then all your samples are in one file!

Does that make sense? Do you know how to match up the rows so that the same taxonomy is on each line?

Colin

P.S. Maybe @thermokarst or @Nicholas_Bokulich have a good suggestion about how to do this. Let’s see if they have any suggestions about how to merge these files!

1 Like

Hi Colin,

These files names are in this format “SampleID.summary.csv”. They were not generated in QIIME 1, but from another software ([email protected]). They are from two different sequencing batchs (1 year gap between) and I think the database used for taxonomical classification is slightly different (for example same phylum but different names: Deinococcus-Thermus in the 1st batch and Thermi in the 2nd batch). I was thinking about putting all files into one file, but I don’t know how to match up the rows so that the same taxonomy is on each line, plus the abovementioned problem of difference. I would be grateful if you can give me some instructions.

Allie.

To solve the merging issue, I think import each file separately and then merge inside QIIME 2.

However, the format is really goofy and you will need to do some re-formatting prior to importing. This would be easiest with a bash script, but you can also concatenate cells in excel (in a new cell you would do something like “=concatenate(A$1, ‘;’, B$1, ‘:’, C$1, ‘;’, D$1, ‘;’, E$1, ‘:’, F$1, ‘;’, G$1)” and copy down the column). This is because at the end of the day you will want to have a text file in a format that you can convert to biom-format.

  1. save as a text file
  2. drop either the num_hits or %_hits column so that you are merging consistent data types.
  3. taxon names should be in a single string, not separate columns
    (steps 1-3 could be in any order if you prefer to format in excel or as a text file)
  4. convert to biom
  5. import as a FeatureTable[Frequency] in QIIME 2
  6. use feature-table merge to merge all tables together.

I hope that helps! Good luck with reformatting that file…

2 Likes

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.