Greengenes and NCBI database translatability

microuser2 · May 31, 2021, 10:52am

Hi,

Since QIIME uses the Greengenes database, if I detect some species and want to look them up on NCBI, how do I translate, for example, Klebsiella 297311 (as assigned by Greengenes) into a searchable full genome sequence on NCBI?

Thanks
Regards
Natalia

Nicholas_Bokulich · May 31, 2021, 11:02am

Welcome to the forum @microuser2 !

QIIME 2 uses ANY database that you feed it.

It is by no means restricted to Greengenes. We provide pre-trained classifiers for Greengenes and SILVA on our website, but other commonly used databases that are compatible more or less "off the shelf" include GTDB, UNITE, and any other database that uses compatible file formats (FASTA and tab-delimited taxonomy files).
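Because a compatible reference boils down to sequences in FASTA plus a tab-delimited taxonomy file keyed by the same IDs, it can help to check that the two files actually line up before importing them into QIIME 2. The sketch below is a hypothetical helper (not part of QIIME 2 or RESCRIPt) and assumes the simple `ID<TAB>taxonomy` layout used by the Greengenes taxonomy files; it only reports sequence IDs that have no taxonomy entry.

```python
import csv
import io


def read_fasta_ids(handle):
    """Return sequence IDs (the first word after '>') from a FASTA stream."""
    return [line[1:].split()[0] for line in handle if line.startswith(">")]


def read_taxonomy(handle):
    """Read tab-delimited ID<TAB>taxonomy lines into a dict."""
    taxonomy = {}
    # QUOTE_NONE: treat quote characters in taxon names as ordinary text.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip blank lines, comments, a 'Feature ID' header row,
        # and rows that have no taxonomy column.
        if len(row) < 2 or row[0].startswith("#") or row[0] == "Feature ID":
            continue
        taxonomy[row[0]] = row[1].strip()
    return taxonomy


def check_reference(fasta_handle, taxonomy_handle):
    """Return IDs present in the FASTA file but missing from the taxonomy file."""
    ids = read_fasta_ids(fasta_handle)
    taxonomy = read_taxonomy(taxonomy_handle)
    return [seq_id for seq_id in ids if seq_id not in taxonomy]


if __name__ == "__main__":
    # Toy in-memory files for illustration only.
    fasta = io.StringIO(">1001\nACGT\n>1002 extra words\nGGCC\n")
    tax = io.StringIO("1001\tk__Bacteria; p__Proteobacteria\n")
    print(check_reference(fasta, tax))  # prints ['1002']
```

With real files you would pass `open(...)` handles instead of the `io.StringIO` stand-ins.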

We even have a tutorial for using RESCRIPt (an external plugin) to compile a QIIME 2-formatted database from NCBI GenBank (this example uses the 16S RefSeqs):
https://forum.qiime2.org/t/using-rescript-to-compile-sequence-databases-and-taxonomy-classifiers-from-ncbi-genbank/15947/10

But it sounds like you have already processed data that was classified (or clustered) using Greengenes and now want to map it to NCBI sequences. You could use q2-feature-classifier to perform this mapping, e.g., using BLAST to find the top hit. Alternatively, just copy and paste the sequence that you want into NCBI BLAST to find the closest match.
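If you run BLAST yourself, for example a local BLAST+ `blastn` search against a downloaded NCBI 16S database with `-outfmt 6`, picking the top hit per query is a small parsing step. This is a hypothetical helper, not what q2-feature-classifier does internally; it assumes the 12 default tabular columns and keeps the highest-bitscore row for each query.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6). Newer BLAST+
# versions label the first two columns qaccver/saccver; positions are the same.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(handle):
    """Return {query_id: row} keeping the highest-bitscore hit for each query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        record = dict(zip(FIELDS, row))
        record["pident"] = float(record["pident"])
        record["bitscore"] = float(record["bitscore"])
        current = best.get(record["qseqid"])
        # Strict '>' keeps the first-listed row when bitscores tie.
        if current is None or record["bitscore"] > current["bitscore"]:
            best[record["qseqid"]] = record
    return best


if __name__ == "__main__":
    # Toy BLAST output for illustration only (not real results).
    demo = io.StringIO(
        "q1\tNR_1\t99.2\t250\t2\t0\t1\t250\t10\t259\t1e-120\t450\n"
        "q1\tNR_2\t97.0\t250\t7\t0\t1\t250\t5\t254\t1e-110\t420\n"
    )
    for query, hit in top_hits(demo).items():
        print(query, hit["sseqid"], hit["pident"])
```

Keeping only the best hit mirrors "find the top hit"; a close second hit with nearly the same bitscore is a hint that the sequence cannot distinguish between those references.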

Good luck!

microuser2 · May 31, 2021, 1:40pm

Thanks Nicholas. Yes, I have already used Greengenes, and we made a heat map of the species of interest labeled with their IDs. Readers of our published resource who want to identify those species in NCBI won't be able to go back to the sequences. Is there any way they could translate a Greengenes species entry into NCBI without having the sequence?
Regards
Natalia

Nicholas_Bokulich · May 31, 2021, 1:43pm

Yes. Based on the example you gave above, it sounds like you used closed-reference OTU clustering against the Greengenes database, so you could grab the Greengenes representative sequence that corresponds to that Greengenes ID and then BLAST it to find the closest match in the NCBI RefSeqs.
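With closed-reference clustering the feature ID is the Greengenes OTU ID, so recovering its representative sequence is just a lookup in the reference FASTA. A minimal sketch, assuming you have the same Greengenes reference FASTA that was used for clustering (for example a `rep_set/97_otus.fasta` from the gg_13_8 release, if that is the file and similarity level you used); `get_rep_seq` and `to_fasta` are hypothetical helper names.

```python
import io


def get_rep_seq(handle, otu_id):
    """Return the (possibly multi-line) sequence for otu_id, or None if absent."""
    found = False
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if found:
                break  # the next record starts, so our sequence is complete
            fields = line[1:].split()
            found = bool(fields) and fields[0] == otu_id
        elif found:
            chunks.append(line)
    return "".join(chunks) if found else None


def to_fasta(otu_id, seq, width=60):
    """Format one record as FASTA text, wrapping the sequence at `width` characters."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{otu_id}\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy reference for illustration; with real data you would open the
    # Greengenes FASTA file instead of this in-memory stand-in.
    ref = io.StringIO(">1001\nACGTACGT\nGGCC\n>1002\nTTTTAA\n")
    seq = get_rep_seq(ref, "1001")
    print(to_fasta("1001", seq), end="")
```

For the ID in your example you would call `get_rep_seq(open("97_otus.fasta"), "297311")` and paste the `to_fasta` output into the NCBI BLAST (blastn) web form, selecting a 16S rRNA database. Publishing these representative sequences (or the Greengenes IDs plus the reference release) alongside the heat map would let readers repeat the lookup themselves.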