How do I import a taxonomy table? What format is required?

Zak_Song · February 12, 2022, 10:01am

Hi all,

I currently have a taxonomy pandas DataFrame whose first column holds the column names of my FeatureTable[Frequency] (species), and each subsequent column holds the names of higher taxa.

So how can I import this table as FeatureData[Taxonomy]?

I have been searching for several hours and can find no documentation at all describing the format qiime2 expects for FeatureData[Taxonomy].

Also, is there any Python package that can convert the dataframe mentioned above into the required format?

timanix · February 12, 2022, 11:42am

Hello!
Your pandas DataFrame should contain two columns, Feature ID and Taxon. In the Taxon column, all taxonomy levels should be merged into one string, separated by ";". When your table is ready, save it as a TSV file: your_table.to_csv("taxonomy.tsv", sep="\t"). If the feature IDs column in your dataframe is not the index (that is, your rows are numbered), you will need to drop the index when writing the TSV file by passing index=False. Then you can import it into QIIME 2. In this thread you can find example commands and the format.
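The reshaping described above can be sketched in pandas. This is only an illustration: the column names (`species`, `genus`, `family`), the values, and the rank order are hypothetical, and only the `Feature ID` / `Taxon` headers and the `to_csv` call come from the reply.

```python
import pandas as pd

# Hypothetical wide table: the first column is the feature ID (species),
# the remaining columns hold higher taxa. Names and values are made up.
wide = pd.DataFrame(
    {
        "species": ["Escherichia coli", "Bacillus subtilis"],
        "genus": ["Escherichia", "Bacillus"],
        "family": ["Enterobacteriaceae", "Bacillaceae"],
    }
)

# Order the ranks from highest to lowest before joining them.
rank_columns = ["family", "genus", "species"]

taxonomy = pd.DataFrame(
    {
        "Feature ID": wide["species"],
        # Merge all levels into a single ";"-separated string per row.
        "Taxon": wide[rank_columns].astype(str).apply(";".join, axis=1),
    }
)

# The feature IDs are a regular column here, so the numbered index is dropped.
taxonomy.to_csv("taxonomy.tsv", sep="\t", index=False)
```

If the feature IDs were the DataFrame index instead, `index=False` would be left out so that the IDs are written as the first column.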

Zak_Song · February 12, 2022, 1:57pm

Hi, thank you so much for the help!!!

When I created my FeatureTable[Frequency], I did not specify feature IDs. (In fact, how do I do this, and how can I check the feature IDs in the FeatureTable[Frequency]?)

So in this case, does it mean that the feature IDs are just the column names (species) in my FeatureTable[Frequency]?

If so, should I include species in the Taxon column of the taxonomy.tsv?

Also, I found some examples online using a format like the one below to label the taxon level. Can you please confirm whether this is correct?

k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Escherichia;s__ coli

If so, are the 'flags' (k, p, c, o, etc.) special characters that the program recognizes, or are they just arbitrary?

Thank you so much for taking the time to answer the questions!!! I really appreciate it!

timanix · February 12, 2022, 2:49pm

I am a little bit confused by your table. Can you post a slice of it as an example?

I guess that you are working with data already collapsed to the species level. In that case, species should be the feature IDs (if you do not have other IDs in your table). If I am right, you can create a taxonomy file with species as IDs and also include species in the Taxon column.

Looks ok.

You can use them in this format to indicate the taxonomy level (kingdom, phylum, etc.), but they are not necessary.
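For anyone who does want the prefixed style from the question, a small helper can build it. This is a sketch only: `add_rank_prefixes` is a hypothetical function, and the seven-prefix list simply mirrors the example string above; as noted, the prefixes are optional.

```python
# Prefixes in rank order: kingdom, phylum, class, order, family, genus, species.
RANK_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]


def add_rank_prefixes(levels):
    """Join taxonomy levels (highest rank first) into one prefixed string."""
    if len(levels) > len(RANK_PREFIXES):
        raise ValueError("more levels than known rank prefixes")
    return ";".join(prefix + name for prefix, name in zip(RANK_PREFIXES, levels))


lineage = [
    "Bacteria",
    "Proteobacteria",
    "Gammaproteobacteria",
    "Enterobacteriales",
    "Enterobacteriaceae",
    "Escherichia",
    "coli",
]
prefixed = add_rank_prefixes(lineage)
print(prefixed)
```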

Zak_Song · February 14, 2022, 7:50am

Hi, thank you so much for your prompt reply!!!

Here is the dataframe I used to create the FeatureTable[Frequency] artifact using the Artifact.import_data() method:

I also wonder whether the Artifact.import_data() method automatically uses the index column (named 'Subject') of the dataframe as the sample IDs?

timanix · February 14, 2022, 8:09am

Here is an example of how the df should be formatted to import as a feature table:

So your table should be transposed (df = df.T).
Since what you want to import is a frequency table, not relative abundances, I wonder why you have float values instead of integers. Are they percentages? Or averages? If they are averages, you can round them to integers before import. If they are percentages, you should divide them by 100 to get ratios and import them as FeatureTable[RelativeFrequency].

I have never worked with the Artifact API, but you can try both variants (feature IDs as the index, and feature IDs as a column with a numbered index) and check which one gives you the right table at the end.
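The rounding, scaling, and transposing steps mentioned above could look like this in pandas. The table is made up, and whether the real values are averages or percentages is an assumption that has to be checked against the actual data before choosing either option.

```python
import pandas as pd

# Hypothetical table: samples as rows ('Subject' index), species as columns.
df = pd.DataFrame(
    {"Escherichia coli": [12.6, 80.0], "Bacillus subtilis": [87.4, 20.0]},
    index=pd.Index(["subject_1", "subject_2"], name="Subject"),
)

# Option 1: the values are averaged counts, so round them to integers.
counts = df.round().astype(int)

# Option 2: the values are percentages, so divide by 100 to get ratios.
ratios = df / 100

# Swap rows and columns if the target layout needs the other orientation.
transposed = df.T
```

Viewing the imported artifact as a DataFrame again is one way to check which orientation ends up with the subjects as sample IDs.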
