What is the difference between Greengenes and SILVA?

Hi @Aqleem12!

Greengenes is a 16S rRNA gene database covering Bacteria and Archaea, while SILVA covers both 16S and 18S rRNA genes (Bacteria, Archaea, and Eukarya). SILVA also has 23S/28S databases, but I'm not sure whether those are available in a QIIME-compatible format. You can find a list of QIIME-compatible reference databases on the data resources page.

SILVA 123 and 128 are different releases of the SILVA database (128 is the newer of the two). In general, you'll want to use the latest version of your reference database of choice.

Both Greengenes and SILVA databases contain reference taxonomies that include species-level annotations. However, you may or may not get species-level classifications of your data depending on the feature-classification algorithm (and parameter configurations) you choose. This preprint benchmarks different feature classification algorithms and parameter configurations, and provides some recommendations for classifier/parameter choices.

There isn't a single right answer for which reference database to choose. Each database has its strengths and limitations, and every reference database has inherent flaws (there is no "correct" database).

Each person you talk to will have their own opinions and preferred reference database. I encourage you to reach out to the colleagues you mentioned to find out why they prefer one over the other.

Some factors that come to mind:

  • You may have to choose one database over another depending on the marker gene you're targeting. For example, if you have 18S data, Greengenes wouldn't be an option because it's a 16S database.

  • Size of the reference database. SILVA is larger than Greengenes, so it can require more CPU time and memory to use, but you get more reference data in return.

  • Database updates. SILVA provides fairly regular updates (i.e. new versions) of its database, while the last Greengenes release was August 2013.

  • Techniques used to construct the databases. This is where you'll need to do some digging -- how are the reference databases constructed, and do you prefer/trust one method over another?

I also recommend checking out the official Greengenes and SILVA websites, along with their associated publications, as you make this decision. Finally, if this is an option for you, try out both databases and compare your results!
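
If you do try both, here is a rough sketch of one way to compare the two sets of classifications once you have them as plain feature-ID-to-taxonomy mappings. Everything in it is an assumption for illustration: the function names are made up, the example taxonomy strings are invented rather than real results, and the prefix handling only covers the `k__`/`p__` style and the `D_0__`/`D_1__` style of rank labels. Keep in mind that the two databases sometimes use different names for the same group, so a string mismatch is not always a real disagreement.

```python
import re

# Rank prefixes assumed here: "k__", "p__", ... and "D_0__", "D_1__", ...
_RANK_PREFIX = re.compile(r"^(?:[a-z]|D_\d+)__")


def normalize_taxon(taxon):
    """Split a taxonomy string into lowercase names without rank prefixes.

    Stops at the first unannotated rank (e.g. a bare "s__").
    """
    names = []
    for part in taxon.split(";"):
        name = _RANK_PREFIX.sub("", part.strip())
        if not name:
            break
        names.append(name.lower())
    return names


def agreement_at_depth(tax_a, tax_b, depth):
    """Fraction of shared feature IDs whose first `depth` ranks match exactly.

    A feature classified to fewer than `depth` ranks in either table
    counts as a non-match.
    """
    shared = set(tax_a) & set(tax_b)
    if not shared:
        return 0.0
    matches = 0
    for feature_id in shared:
        a = normalize_taxon(tax_a[feature_id])[:depth]
        b = normalize_taxon(tax_b[feature_id])[:depth]
        if len(a) == depth and a == b:
            matches += 1
    return matches / len(shared)


# Invented example classifications (not real output).
greengenes_calls = {
    "feature1": "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
                "f__Lactobacillaceae; g__Lactobacillus; s__",
    "feature2": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
}
silva_calls = {
    "feature1": "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Lactobacillales;"
                "D_4__Lactobacillaceae;D_5__Lactobacillus",
    "feature2": "D_0__Bacteria;D_1__Bacteroidetes;D_2__Bacteroidia",
}

for depth in (1, 2, 6):
    print(depth, agreement_at_depth(greengenes_calls, silva_calls, depth))
```

Looking at agreement rank by rank (rather than comparing whole strings) shows where the two databases start to diverge for your data, which is usually more informative than a single overall number.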
