How to use GTDB

Hello,

I assume you want to train a classifier that uses the GTDB taxonomy with a 16S rRNA reference database? If you have the GTDB representative sequences and their associated taxonomy file, it is just a matter of making sure both are in the correct format and importing them into QIIME 2 as FeatureData[Sequence] and FeatureData[Taxonomy] types, respectively.

For the taxonomy file, you need a headerless TSV file with two columns separated by a tab (the ID, then the taxonomy string), in the following format:

ACCESSION/ID	d__bacteria;p__taxa;c__taxa;o__taxa;f__;g__;s__

Unless you're using an older version, in which case it's D_0__;D_1__ etc. (you can find more in this post: Importing FeatureData[Taxonomy])
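If you want to sanity-check the taxonomy file before importing it, something along these lines might help. It is only a rough sketch, not part of QIIME 2: the function names are made up, it assumes exactly two tab-separated columns per line, and the prefix check assumes the seven-rank d__ to s__ style shown above (it will reject the older D_0__ style).

```python
import io

# Rank prefixes in the order used by the d__ ... s__ style shown above.
GTDB_PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def read_headerless_taxonomy(handle):
    """Parse a headerless two-column TSV into {feature_id: taxonomy_string}.

    Hypothetical helper: raises ValueError on lines that do not have
    exactly two tab-separated columns, and on duplicate IDs.
    """
    taxonomy = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated columns, "
                f"found {len(fields)}"
            )
        feature_id, lineage = fields[0].strip(), fields[1].strip()
        if feature_id in taxonomy:
            raise ValueError(f"line {line_no}: duplicate ID {feature_id!r}")
        taxonomy[feature_id] = lineage
    return taxonomy


def has_gtdb_prefixes(lineage):
    """Return True if the lineage has seven ranks prefixed d__ through s__."""
    ranks = [rank.strip() for rank in lineage.split(";")]
    return len(ranks) == len(GTDB_PREFIXES) and all(
        rank.startswith(prefix) for rank, prefix in zip(ranks, GTDB_PREFIXES)
    )


if __name__ == "__main__":
    # Placeholder content; in practice open your own taxonomy file instead.
    example = io.StringIO("seq_id\td__bacteria;p__taxa;c__taxa;o__taxa;f__;g__;s__\n")
    for feature_id, lineage in read_headerless_taxonomy(example).items():
        print(feature_id, has_gtdb_prefixes(lineage))
```

The prefix check is just a convenience for catching typos such as a single underscore in a rank; as far as I know the importer itself does not require any particular prefix style.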

Your reference database would just be a standard FASTA file in the format:

>seq_id
sequence
>seq_id2
sequence

After you have imported these files it should just be a simple case of training and using the classifier (hopefully).

EDIT: I should clarify that the "seq_id" and "ACCESSION/ID" need to be identical for each sequence and its associated taxonomy, or else it won't work.
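A quick way to check that the IDs line up before importing could look like the sketch below. It is not a QIIME 2 tool and the function names are made up; it assumes the sequence ID is the FASTA header text up to the first whitespace, and that the taxonomy ID is everything before the first tab.

```python
import io


def fasta_ids(handle):
    """Collect sequence IDs from FASTA header lines (text after '>' up to whitespace)."""
    ids = []
    for line_no, line in enumerate(handle, start=1):
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError(f"line {line_no}: FASTA header has no ID")
            ids.append(fields[0])
    return ids


def taxonomy_ids(handle):
    """Collect IDs from the first column of a headerless taxonomy TSV."""
    return [line.split("\t", 1)[0].strip() for line in handle if line.strip()]


def compare_ids(fasta_handle, taxonomy_handle):
    """Return (IDs with no taxonomy entry, IDs with no sequence), both sorted."""
    seq_ids = set(fasta_ids(fasta_handle))
    tax_ids = set(taxonomy_ids(taxonomy_handle))
    return sorted(seq_ids - tax_ids), sorted(tax_ids - seq_ids)


if __name__ == "__main__":
    # Placeholder content; in practice pass open file handles for your own files.
    fasta = io.StringIO(">seq_id\nACGT\n>seq_id2\nGGCC\n")
    taxonomy = io.StringIO("seq_id\td__bacteria;p__taxa;c__taxa;o__taxa;f__;g__;s__\n")
    missing_taxonomy, missing_sequence = compare_ids(fasta, taxonomy)
    print("No taxonomy entry for:", missing_taxonomy)
    print("No sequence for:", missing_sequence)
```

If both lists come back empty, every sequence has a matching taxonomy entry and vice versa.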
