Greengenes Versus Silva

SoilRotifer · January 29, 2020, 4:43pm

Hi @Tiago_Bruno_Rezende, welcome to :qiime2:!

Great question! The SILVA alignment uses uracils instead of thymines because the curated sequence alignment is informed by secondary structure in order to reduce alignment ambiguity. So, we honor the reality of the rRNA molecule when using this secondary structure information to inform our alignment. Also, it is easy enough to simply replace these when needed.
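If you want to do the U-to-T swap yourself, here is a minimal Python sketch. It assumes a plain-text FASTA file (aligned or not) and only touches sequence lines, so a "U" in a header is left alone. The function name `rna_to_dna_fasta` and the example records are made up for illustration; they are not part of QIIME 2 or SILVA.

```python
def rna_to_dna_fasta(lines):
    """Yield FASTA lines with U/u replaced by T/t in sequence lines only."""
    # Translation table mapping uracil to thymine, preserving case.
    table = str.maketrans("Uu", "Tt")
    for line in lines:
        if line.startswith(">"):
            # Header line: leave untouched so IDs and taxonomy text survive.
            yield line
        else:
            # Sequence line: gaps ("-", ".") and other characters pass through.
            yield line.translate(table)


# Hypothetical toy records, not real SILVA entries.
example = [">seq1 Bacteria;Uncultured", "ACGU-..U", "ugca"]
converted = list(rna_to_dna_fasta(example))
```

To run it on a real file you could pass an open file handle as `lines` and write each yielded line to a new file.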

Nope, though I am sure you can find files generated by third parties. The 16S and 18S rRNA genes are in fact homologues, which is why it is valid to keep them in the same alignment. I personally prefer to have the 18S rRNA gene sequence data present in my reference taxonomy and sequence files. This helps with the identification and removal of off-target (unwanted) 16S and 18S sequences, as these will be classified as such. That is, it is quite common for primers to amplify off-targets from host organisms, hence the occasional need for blocking primers or peptide nucleic acid (PNA) clamps.
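To make the off-target idea concrete, here is a rough Python sketch of splitting features by their assigned taxonomy once the classifier has had a chance to label them. Everything here is an assumption for illustration: the function name `split_off_targets`, the default labels, and the toy taxonomy strings are hypothetical, and which labels count as off-target depends entirely on your study.

```python
def split_off_targets(taxonomy, off_target_labels=("Eukaryota", "Mitochondria", "Chloroplast")):
    """Split {feature_id: taxonomy_string} into (kept, off_target) dicts.

    A feature is treated as off-target if any label appears in its
    taxonomy string (case-insensitive substring match).
    """
    labels = [label.lower() for label in off_target_labels]
    kept, off_target = {}, {}
    for feature_id, tax in taxonomy.items():
        if any(label in tax.lower() for label in labels):
            off_target[feature_id] = tax
        else:
            kept[feature_id] = tax
    return kept, off_target


# Hypothetical classifications, not real output.
toy_taxonomy = {
    "f1": "Bacteria; Proteobacteria",
    "f2": "Eukaryota; Metazoa",
    "f3": "Bacteria; Cyanobacteria; Chloroplast",
}
kept, off_target = split_off_targets(toy_taxonomy)
```

The point is only that off-target reads can be labeled and removed when the reference contains them; without 18S sequences in the reference, a feature like `f2` would have nothing sensible to match against.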

I refer you again to the topic of reducing alignment ambiguity, and this SINA tutorial, which is still a work in progress. Historically, tools like PyNAST, Infernal, and SINA use a curated, secondary-structure-informed alignment to guide the alignment of unaligned sequence data. The idea is that this will create a more robust alignment for generating an improved de novo phylogeny. In many cases, though, tools like MAFFT appear to perform well enough without secondary structure information. Your mileage may vary.

-I hope this helps!
-Mike