Defining taxonomy for import into FeatureData[Taxonomy]

danwiththeplan · June 14, 2018, 11:48pm

Hi all. I am working on a QIIME2 analysis involving COI and therefore require a file that can be imported into a FeatureData[Taxonomy] artefact as described here:

My starting point is basically a bunch of genus names that I can parse out of the headers of the associated fasta file. That is no problem. However, I was wondering whether there is a simple and quick way of defining the full taxonomy and producing a file that can be imported as a FeatureData[Taxonomy] artefact, for example using Entrez Direct.

I can do this myself given time, but I was thinking this might be a problem that's already been solved. Also, Perl gives me a splitting headache.

Nicholas_Bokulich · June 15, 2018, 2:24pm

Hi @danwiththeplan,

There might be a way to query Entrez directly, as you suggest, but I am not sure.

Maybe this repository would help. The autoannotate.py script in there does what you are asking for (converting genus names to full taxonomy strings), provided you already have a file containing the full taxonomy strings in the correct format.

I put that code together to automatically format expected taxonomy files (to be imported as FeatureData[Taxonomy]) in mockrobiota from a list of genus names, but I already had SILVA/Greengenes databases from which to pull the full taxonomy strings.

So if you already have a separate taxonomy map to use as a template, this will help. If not, maybe someone else knows of a way to do this with Entrez Direct.
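To make the template idea concrete, here is a rough sketch. It is not the actual autoannotate.py code, just an illustration of the same approach. It assumes the reference map is a two-column, tab-separated file (ID, then a Greengenes-style string with `g__` marking the genus rank), and the function names, the `Unassigned` fallback, and the example records are all hypothetical. The "Feature ID" / "Taxon" header is what I believe the headered taxonomy TSV expects, but check the importing docs for your QIIME 2 version.

```python
import csv
import io


def build_genus_map(reference_lines, rank_prefix="g__"):
    """Map each genus to the first full taxonomy string seen for it.

    Assumes tab-separated lines: reference ID, then a semicolon-delimited
    taxonomy string. The string is truncated at the genus rank.
    """
    genus_map = {}
    for row in csv.reader(reference_lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        ranks = [r.strip() for r in row[1].split(";")]
        for i, rank in enumerate(ranks):
            # Ignore empty genus annotations such as a bare "g__".
            if rank.startswith(rank_prefix) and len(rank) > len(rank_prefix):
                genus = rank[len(rank_prefix):]
                genus_map.setdefault(genus, "; ".join(ranks[: i + 1]))
                break
    return genus_map


def annotate(feature_to_genus, genus_map, unassigned="Unassigned"):
    """Pair each feature ID with the full taxonomy string for its genus."""
    return [
        (feature_id, genus_map.get(genus, unassigned))
        for feature_id, genus in feature_to_genus.items()
    ]


def write_taxonomy_tsv(rows, handle):
    """Write a headered two-column TSV (feature ID, taxonomy string)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Feature ID", "Taxon"])
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up reference record and feature IDs, for illustration only.
    reference = [
        "ref1\tk__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
        "f__Lactobacillaceae; g__Lactobacillus; s__brevis",
    ]
    features = {"seq1": "Lactobacillus", "seq2": "NotInReference"}
    out = io.StringIO()
    write_taxonomy_tsv(annotate(features, build_genus_map(reference)), out)
    print(out.getvalue(), end="")
```

If the output is saved as, say, taxonomy.tsv, something like `qiime tools import --type 'FeatureData[Taxonomy]' --input-path taxonomy.tsv --output-path taxonomy.qza` should import it. Genera missing from the template fall back to `Unassigned` here, so it is worth checking how many of those you get before relying on the result.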

Let me know if that helps!

danwiththeplan · June 19, 2018, 12:48am

Much appreciated. And it's Python, yay.