I am a Python/bioinformatics beginner running fragment-insertion on my oral microbiome sequences. I am using the SILVA SeppReferenceDatabase; out of curiosity, I would also like to try the Human Oral Microbiome Database (HOMD) and compare the outputs from the two databases. Is it possible/advisable to create a SeppReferenceDatabase from HOMD downloads, and if so, how would I go about it? Thanks in advance!
Welcome to the forums!
Yes! But... this means building a brand new SEPP database from HOMD, as discussed on the sepp buildrep GitHub page, and you can only do that after you have built a massive multiple sequence alignment (MSA) of the Human Oral Microbiome Database sequences.
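Since everything downstream depends on that MSA, a quick sanity check before feeding it to any reference-building workflow can save a failed run. Below is a minimal sketch using only the Python standard library. It assumes the alignment is a plain aligned FASTA file, and the helper names `read_fasta` and `check_alignment` are hypothetical (not part of SEPP or QIIME 2). It only checks formatting (unique IDs, equal aligned lengths, nucleotide/IUPAC/gap characters), not whether the alignment itself is any good.

```python
from collections import Counter

# Assumed alphabet: IUPAC nucleotide codes plus '-' and '.' as gap characters.
ALLOWED = set("ACGTUNRYSWKMBDHV-.")


def read_fasta(lines):
    """Parse FASTA lines (a list or an open file) into (id, sequence) pairs."""
    records = []
    seq_id, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            parts = line[1:].split()
            if not parts:
                raise ValueError("empty FASTA header")
            seq_id, chunks = parts[0], []  # keep only the first word as the ID
        else:
            if seq_id is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def check_alignment(records):
    """Return a list of formatting problems; an empty list means none were found."""
    if not records:
        return ["alignment is empty"]
    problems = []
    counts = Counter(seq_id for seq_id, _ in records)
    duplicates = [seq_id for seq_id, n in counts.items() if n > 1]
    if duplicates:
        problems.append(f"duplicate IDs: {sorted(duplicates)}")
    # In an MSA every aligned sequence must have the same length.
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        problems.append(f"unequal aligned lengths: {sorted(lengths)}")
    for seq_id, seq in records:
        unexpected = set(seq.upper()) - ALLOWED
        if unexpected:
            problems.append(f"{seq_id}: unexpected characters {sorted(unexpected)}")
    return problems


if __name__ == "__main__":
    demo = [">seqA", "ACGT-ACGT", ">seqB", "ACGTTACG-", ">seqC", "ACGT"]
    for problem in check_alignment(read_fasta(demo)):
        print(problem)  # prints: unequal aligned lengths: [4, 9]
```

To check a real file, something like `with open("homd_aligned.fasta") as handle: print(check_alignment(read_fasta(handle)))` should work (the file name is just a placeholder).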
Advisable? Maybe not...
The primary problem with building a new database, or a new SEPP reference tree, is that you have to justify it to the most critical reviewer. This means benchmarking your new DB against existing ones and arguing that any differences are improvements, not errors or regressions. With an existing database, you can argue that it's a standard in the field and avoid these questions entirely. If SILVA is imperfect, at least it's not your fault.
I would love to see new pre-built trees included in sepp-ref for UNITE, HOMD, GTDB, etc., but that doesn't mean you have to be the one to build them.
Check out this paper, in which they use HOMD to assign taxonomy and SEPP to build a tree against a different (presumably pre-built) database. That could be an easy way to get the best of both worlds!
That's a helpful answer, thank you!