SILVA database question

Hi

I am wondering why the SILVA database is so long. There are over 2000 bacteria, so why?

I am also wondering whether 16S rRNA has no start and stop codons.

Thank you.

Hello @kimshinseung,

The newest SILVA release as I write this is v138.1. The SSU Ref contains about 2 million sequences. After clustering these at 99% identity to make SSU Ref NR 99, there are still 510,508 sequences.

That’s just a lot of bacteria! That’s probably why the database is so big… And the LSU Ref adds more sequences on top of that.

Start and stop codons mark where a messenger RNA is translated into protein. Ribosomal RNA is part of the ribosome and is not protein-coding, so rRNA is never read as codons and has no start or stop codons at all!


Thank you for your answer.

The first question was about the sequence length, not the number of bacteria in the SILVA database.

The 16S rRNA gene is about 1500 bp, but most of the SILVA sequences are over 2000 bp.

Why is it so long?

Thank you.

I think I’m a little confused…

Figure 1 from the 2007 SILVA paper shows that most sequences in SSU Ref are under 1500 bp in length.

Sequences in the LSU Ref database are longer, but I’m not sure what database you are using…

What is the median length you are seeing in SILVA? I’m not an expert on the SILVA database, so I could be missing something!
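
If it helps, here is a minimal Python sketch for checking the length distribution yourself. It assumes the sequences have already been written out as a plain FASTA file. The function names `fasta_lengths` and `summarize` are my own, for illustration only, and are not part of QIIME 2 or RESCRIPt.

```python
import statistics


def fasta_lengths(path):
    """Return the length of every record in a FASTA file, in file order."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                # Sequence lines may be wrapped, so add up every line.
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths):
    """Return count, min, median and max for a non-empty list of lengths."""
    if not lengths:
        raise ValueError("no sequences found")
    return {
        "n": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }
```

Calling `summarize(fasta_lengths("sequences.fasta"))` (the file name is a placeholder) gives the median along with the shortest and longest record, which should show whether most sequences really are over 2000 bp.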

Let me know what you see.

Thank you for your answer.

I have extracted the 2 qza files downloaded here.

qiime rescript get-silva-data \
  --p-version '138' \
  --p-target 'SSURef_NR99' \
  --p-include-species-labels \
  --o-silva-sequences silva-138-ssu-nr99-seqs.qza \
  --o-silva-taxonomy silva-138-ssu-nr99-tax.qza

Would there be a problem if I use this data as it is?

Thank you.

Remember that the SILVA SSU contains not only Bacterial, Archaeal, and Eukaryal 16S rRNA gene sequences, but also the Eukaryal 18S rRNA gene sequences (which are longer). That is, the 18S rRNA gene is a cytosolic homologue of the 16S rRNA gene.
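
One way to see this in your own files is to compare lengths per domain. Below is a rough Python sketch, not a RESCRIPt feature. It assumes the sequences are available as a FASTA file and the taxonomy as a two-column, tab-separated file (ID, taxon string) with a header row, and that each taxon string starts with a domain label such as `d__Bacteria`. All function names are hypothetical.

```python
import csv
import statistics
from collections import defaultdict


def read_fasta_lengths(path):
    """Map each record ID (first word of the header) to its sequence length."""
    lengths = {}
    seq_id = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                parts = line[1:].split()
                seq_id = parts[0] if parts else ""
                lengths[seq_id] = 0
            elif seq_id is not None:
                lengths[seq_id] += len(line)
    return lengths


def read_domains(path):
    """Map each ID to the first rank of its taxon string (assumed to be the domain)."""
    domains = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row (assumed to be present)
        for row in reader:
            if len(row) < 2:
                continue
            domains[row[0]] = row[1].split(";")[0].strip()
    return domains


def median_length_by_domain(fasta_path, taxonomy_path):
    """Return {domain label: median sequence length}."""
    lengths = read_fasta_lengths(fasta_path)
    domains = read_domains(taxonomy_path)
    grouped = defaultdict(list)
    for seq_id, length in lengths.items():
        # IDs missing from the taxonomy file are grouped separately.
        grouped[domains.get(seq_id, "unassigned")].append(length)
    return {domain: statistics.median(vals) for domain, vals in grouped.items()}
```

If the Eukaryota group comes out clearly longer than Bacteria and Archaea, the 18S sequences are what is pulling the overall lengths up.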

-Mike
