reference databases: SILVA vs. Greengenes vs. GTDB

Ellenphant · May 4, 2020, 5:19pm

Sorry, just another question that has stemmed from this!

I know that in the past SILVA was viewed as a better option than others (such as Greengenes), but what about ones such as GTDB? I imagine the database choice will largely influence the classified taxonomy output, so I want to make sure that I make educated choices!

Nicholas_Bokulich · May 4, 2020, 7:16pm

Hi @Ellenphant,
GTDB works well with QIIME 2; the files are already in an appropriate format (though I've noticed the reads are in mixed orientations, so best results are obtained by using extract-reads prior to training your classifier, since this fixes read orientation). I have been using that database a lot recently and would recommend it, but you should be aware that many of the taxon names used in it are not recognized names (see the GTDB website and paper for more details on this).
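To make the orientation point concrete, here is a minimal Python sketch of what "fixing read orientation" means. It is only an illustration and not how extract-reads is implemented in QIIME 2: it assumes exact primer matching with no degenerate bases, and the primer, the toy sequences, and the function names (`reverse_complement`, `orient_by_primer`) are all hypothetical.

```python
from typing import Optional

# Illustrative only: NOT the QIIME 2 extract-reads implementation.
# Assumes exact matching of a non-degenerate forward primer.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_by_primer(seq: str, forward_primer: str) -> Optional[str]:
    """Return seq in the forward orientation, or None if the primer
    is not found on either strand."""
    if forward_primer in seq:
        return seq  # already in the forward orientation
    flipped = reverse_complement(seq)
    if forward_primer in flipped:
        return flipped  # was stored on the reverse strand
    return None  # primer site absent: nothing to orient by


# Hypothetical toy data: one forward read, one reverse read, one without the primer site.
primer = "GTGCCAGC"
refs = {
    "ref_fwd": "AAGTGCCAGCTTACGGA",
    "ref_rev": reverse_complement("AAGTGCCAGCTTACGGA"),
    "ref_none": "TTTTTTTTTTTT",
}

oriented = {name: orient_by_primer(seq, primer) for name, seq in refs.items()}
for name, seq in oriented.items():
    print(name, seq)
```

In this toy case both `ref_fwd` and `ref_rev` end up as the same forward-strand sequence, which is the property you want in the reference reads before training a classifier.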

SILVA has been viewed as the better option mostly because it is more recently and regularly updated than Greengenes, but many people do still use the latter because it has its own advantages.

As for which database is best, they all have their own strengths and weaknesses. You should just be consistent in what you use if you plan to compare across studies, and you can see the references for each of these databases to evaluate what best fits your biological question.

jwdebelius · May 4, 2020, 8:26pm

I just want to mention that the first GTDB release was rolled into SILVA 132 and 138. Of course, that doesn't cover the newest update (as of like last week), but it does cover the 2018 release.

Best,
Justine