Greengenes and NCBI database translatability


Since QIIME uses the Greengenes database, if I come up with some detected species and would like to look them up in NCBI, how do I map, for example, Klebsiella 297311 (as reported by Greengenes) to a searchable full genome sequence on NCBI?


Welcome to the forum @microuser2 !

QIIME 2 uses ANY database that you feed it :grin:

It is by no means restricted to Greengenes. We provide pre-trained classifiers for Greengenes and SILVA on our website, but other commonly used databases that are compatible more or less “off the shelf” include GTDB, UNITE, and any other database that uses compatible file formats (FASTA and tab-delimited taxonomy files).

We even have a tutorial for using RESCRIPt (an external plugin) to compile a QIIME 2-formatted database from NCBI GenBank (this example uses the 16S RefSeqs).

But it sounds like you have already processed data that was classified (or clustered) using Greengenes, and now want to map it to NCBI sequences. You could use q2-feature-classifier to perform this mapping, e.g., by running BLAST to find the top hit. Alternatively, just copy and paste the sequence that you want into NCBI BLAST to find the closest match.

Good luck!


Thanks Nicholas, yes, I have already used Greengenes, and we made a heat map with the species of interest accompanied by numbers. Many readers of our published resource will want to identify those species in NCBI, but they won't be able to look back at the sequences. Is there any way they could translate a Greengenes species entry into an NCBI entry without having the sequence?

Yes, based on the example you gave above, it sounds like you used closed-reference OTU clustering against the Greengenes database, so you could grab the Greengenes representative sequence that maps to that Greengenes ID, then BLAST it to find the closest match in NCBI RefSeqs.
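
If it helps, here is a minimal Python sketch of the "grab the representative sequence" step. It assumes you have the Greengenes representative-set FASTA on disk (the exact file name depends on the release and OTU identity level you clustered against, so that part is left to you) and that each header starts with the Greengenes ID. The function names `read_fasta` and `find_sequence` are made up for illustration, and the sequences below are toy placeholders, not real Greengenes records.

```python
import io


def read_fasta(handle):
    """Yield (seq_id, sequence) pairs from an open FASTA file handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Emit the previous record before starting a new one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited token.
            parts = line[1:].split()
            seq_id, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def find_sequence(handle, target_id):
    """Return the sequence whose header ID equals target_id, or None."""
    for seq_id, seq in read_fasta(handle):
        if seq_id == target_id:
            return seq
    return None


# Toy stand-in for a representative-set FASTA (placeholder sequences only).
toy_fasta = io.StringIO(
    ">229854\nACGTACGT\nTTGA\n"
    ">297311\nGATTACA\nGATTACA\n"
)

hit = find_sequence(toy_fasta, "297311")
print(hit)
```

With a real file you would replace `toy_fasta` with `open(path_to_rep_set_fasta)` and paste the returned sequence into NCBI BLAST. Readers of your paper could do the same lookup from the ID alone, as long as you state which Greengenes release and OTU identity level you used.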

