Greengenes or Silva for 16S

Hi,

Could you please suggest which database is better for 16S-based amplicon sequencing analysis?

Thanks
Yogesh

2 Likes

Hi @Yogesh_Gupta,

It's a controversial question, and the answer depends on what you want. Like so many questions, there is no right answer, although you should have a reason for selecting a database. Every database has pros and cons.

Greengenes is smaller and is relatively good for human data. However, the database has not been updated in several years, meaning that it may be missing taxonomic annotations. It can be used as the basis for other predictions (q2-clawback might be of interest here) and for use with platforms like Qiita.

On the other hand, Silva is much larger (making it more memory intensive). However, Silva is newer and better at characterising diverse environments (soil, ocean, etc). It may be harder to integrate into other platforms, and depending on your field, taxonomic assignments may differ from those in publications. (You should always check which database a publication used!)

Finally, you may want to look for curated or environment-specific databases, like HOMD for oral microbiomes.

Best,
Justine

7 Likes

Thanks @jwdebelius,

I am mostly handling soil samples, so maybe the Silva database is the better choice, as you have suggested.

Thanks
Yogesh

4 Likes

Hi @jwdebelius,

Thanks for your help. I need to download the Silva database, but I am confused about which files to download. There are multiple files on the Silva site. I need the taxonomy and FASTA files for QIIME 2.

Thanks
Yogesh

1 Like

Hi @Yogesh_Gupta,

Did you check the link on the data resources page?

Best,
Justine

1 Like

Yes. It took me to this Silva database link


Thanks
Yogesh

1 Like

Hello @Yogesh_Gupta,

I’d like to offer assistance along with @jwdebelius if welcome.

The files in question for a particular Silva release can indeed be downloaded from the page you’ve linked (https://www.arb-silva.de/download/archive/qiime).

I study the 16S microbiome of environmental water samples. For the sequence files, I’ve used the rep_set (NOT rep_set_aligned) fasta files at the percent identity of interest (e.g. 99% identity). For Silva, this can be found in the “Qiime” friendly release for a particular version (e.g. Silva 128) > rep_set > rep_set_16S_only > 99 > 99_otus_16S.fasta

The taxonomy files can be found in the “Qiime” friendly release for a particular version (e.g. Silva 128) > taxonomy > 16S_only > 99 > consensus_taxonomy_7_levels.txt. I unfortunately have no insight into why I chose consensus over majority. My decision was likely motivated by seeing the consensus taxonomy used in Qiime2 tutorials.
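In case it helps, here is a rough Python sketch of the folder layout described above. It only builds the two paths and reports whether they exist locally. The function name `silva_qiime_paths` and the folder name `SILVA_128_QIIME_release` are hypothetical, and the sub-folder and file names are taken from the Silva 128 layout quoted in this post; other releases may name their files differently.

```python
from pathlib import Path


def silva_qiime_paths(release_dir, identity=99, taxonomy="consensus"):
    """Build the sequence and taxonomy paths for an unpacked Silva QIIME release.

    release_dir is a hypothetical local folder. The sub-folder and file names
    follow the Silva 128 layout described in this post (an assumption for any
    other release).
    """
    root = Path(release_dir)
    level = str(identity)
    # Unaligned representative sequences (rep_set, NOT rep_set_aligned).
    sequences = root / "rep_set" / "rep_set_16S_only" / level / f"{level}_otus_16S.fasta"
    # Taxonomy file at the same identity level.
    taxa = root / "taxonomy" / "16S_only" / level / f"{taxonomy}_taxonomy_7_levels.txt"
    return sequences, taxa


if __name__ == "__main__":
    seqs, taxa = silva_qiime_paths("SILVA_128_QIIME_release")
    for path in (seqs, taxa):
        status = "found" if path.exists() else "missing"
        print(f"{status}: {path}")
```

If either path is reported as missing, it is worth browsing the unpacked release by hand before assuming the download failed.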

Mark

5 Likes

Hi,

Thanks for all the help. Can anyone suggest which Silva release I should use for soil samples, 132 or 128? Also, what is the difference between the majority and consensus taxonomy files?

Thanks
Yogesh

1 Like

Hi Yogesh,

I think version 132 would be the best option for soil (or for any sample in which you expect a complex prokaryotic community and where diversity is one of your questions). My understanding is that SILVA132 is the newest release (another should come soon according to their website). SILVA132 should contain greater diversity AND updated taxonomy relative to SILVA128.

For example, the taxonomy of reads assigned to “Bathyarchaeota” OTUs is updated in SILVA132 relative to SILVA128. I believe that in SILVA128, Bathyarchaeota was considered its own phylum. In SILVA132, these same OTUs have been reclassified as a class (Bathyarchaeia) within the phylum Crenarchaeota… likely reflecting active changes in name assignments made outside of the team who works on SILVA database updates.
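To make that kind of change easier to spot, here is a small Python sketch that reports the rank at which a name appears in a lineage string. The two lineages are illustrative strings written from the description above, not lines copied from the SILVA files, and the `D_0__` style rank prefixes and the seven rank names are assumptions about the file format.

```python
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def parse_lineage(lineage):
    """Split a semicolon-delimited lineage into names, dropping any 'D_n__' style prefix."""
    names = []
    for field in lineage.split(";"):
        field = field.strip()
        if not field:
            continue
        # Keep only the text after the last "__" when a rank prefix is present.
        names.append(field.split("__")[-1])
    return names


def rank_of(lineage, prefix):
    """Return the rank at which a name starting with `prefix` appears, or None."""
    for rank, name in zip(RANKS, parse_lineage(lineage)):
        if name.startswith(prefix):
            return rank
    return None


# Illustrative lineages only (assumed format, not copied from SILVA).
older = "D_0__Archaea;D_1__Bathyarchaeota"
newer = "D_0__Archaea;D_1__Crenarchaeota;D_2__Bathyarchaeia"

print(rank_of(older, "Bathyarchae"))  # phylum
print(rank_of(newer, "Bathyarchae"))  # class
```

Running the same check over the real taxonomy files from both releases would show whether a group you care about has moved between ranks.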

4 Likes

Thanks @mcreyno2,

Thanks for your help. I still do not understand the difference between the majority and consensus taxonomy files. I would be thankful if anyone could help me understand this as well.

Kind Regards
Yogesh

1 Like

Have you checked the notes file that is found in the SILVA release files? It explains the motivation and method behind this in detail.
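As a rough illustration of the general idea, and not a copy of the SILVA method: each OTU cluster contains several sequences, each with its own taxonomy string, and the cluster needs a single string. The sketch below assumes that "consensus" means every member must agree at a rank and that "majority" means at least 90% must agree; both thresholds, the `Ambiguous_taxa` placeholder, and the genus names are assumptions for the example, so please confirm the actual rules in the notes file.

```python
from collections import Counter

AMBIGUOUS = "Ambiguous_taxa"  # placeholder label (assumed for this sketch)


def cluster_lineage(lineages, min_fraction):
    """Collapse the lineages of one OTU cluster into a single lineage.

    At each rank, the most common lineage prefix is kept only if at least
    `min_fraction` of the members share it. From the first rank that fails
    onward, AMBIGUOUS is written instead. `lineages` must be non-empty.
    """
    split = [lineage.split(";") for lineage in lineages]
    depth = min(len(parts) for parts in split)
    result = []
    agreed = True
    for level in range(depth):
        # Compare whole prefixes so equal names under different parents do not count as agreement.
        prefixes = Counter(tuple(parts[: level + 1]) for parts in split)
        top, count = prefixes.most_common(1)[0]
        if agreed and count / len(split) >= min_fraction:
            result.append(top[-1])
        else:
            agreed = False
            result.append(AMBIGUOUS)
    return ";".join(result)


# Hypothetical cluster: nine members agree on the genus, one does not.
members = ["Bacteria;Firmicutes;GenusA"] * 9 + ["Bacteria;Firmicutes;GenusB"]

print(cluster_lineage(members, 1.0))  # Bacteria;Firmicutes;Ambiguous_taxa
print(cluster_lineage(members, 0.9))  # Bacteria;Firmicutes;GenusA
```

Under these assumptions the consensus file is the more conservative of the two, since a single disagreeing sequence is enough to make a rank ambiguous.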

2 Likes