Phylogenetic Tree with Reference Sequences

I am interested in creating a phylogenetic tree of a portion of my rep-seqs (16S V4) that includes reference sequences. Previously, I have been doing this in what is probably an inefficient and nonsensical manner: exporting my rep-seqs as a FASTA file, loading it into MEGA X, adding NCBI sequences of interest as references, adding an appropriate set of outgroup sequences, aligning everything, generating the ML tree (with bootstraps), and rooting it with the outgroup.

However, I have a feeling that this method probably does not make sense (maybe it does?). I conducted my taxonomic classification using a classifier trained on the SILVA 138 database, so in theory, is there anything wrong with using SILVA database sequences as my reference sequences? I could be completely oblivious, and maybe using MAFFT would already do exactly what I need? Any guidance or input to help alleviate my confusion is greatly appreciated!

Further digging makes me think that I should potentially try the qiime fragment-insertion sepp approach outlined in the Parkinson’s mouse tutorial, but some insight would still be great! Is SILVA still not well supported with this method?

Hi @Ellenphant

Quite the contrary. You can make use of SEPP for SILVA too. See the available SEPP SILVA reference files on the Data Resources page.
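For anyone following along, here is a minimal sketch of what the SEPP call could look like, written as a small Python helper that only builds and prints the command line. The file names (rep-seqs.qza, sepp-refs-silva-128.qza, and the output names) are assumptions for illustration; use whatever reference file you actually download from the Data Resources page, and check `qiime fragment-insertion sepp --help` for the options in your QIIME 2 version.

```python
import shlex


def sepp_command(rep_seqs, reference_db, tree_out, placements_out, threads=1):
    """Build the argument list for `qiime fragment-insertion sepp`.

    All file names passed in are placeholders chosen by the caller.
    """
    return [
        "qiime", "fragment-insertion", "sepp",
        "--i-representative-sequences", rep_seqs,
        "--i-reference-database", reference_db,
        "--o-tree", tree_out,
        "--o-placements", placements_out,
        "--p-threads", str(threads),
    ]


# Hypothetical file names; the reference database name is an assumption.
cmd = sepp_command(
    "rep-seqs.qza",
    "sepp-refs-silva-128.qza",
    "insertion-tree.qza",
    "insertion-placements.qza",
    threads=4,
)

# Print a copy-pasteable shell command instead of running it.
print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; if QIIME 2 is installed, the same list could be handed to `subprocess.run(cmd, check=True)`.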

If you do not have the resources to perform SEPP, you can also try a de novo approach as outlined here, and use either the iqtree or the iqtree-ultrafast-bootstrap approach. I’d also recommend applying the suggestions outlined in the section “IQ-TREE search settings” for constructing trees with short reads. Also be sure to read about single branch testing, which is an alternative to bootstrapping.

I often run both approaches just to sanity-check my data, but SEPP is often considered more robust for short-read data.
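A rough sketch of the de novo route, again as a Python helper that only assembles the commands: align with MAFFT, mask the alignment, build the tree with iqtree-ultrafast-bootstrap, then root it. The search-setting values and the `--p-alrt` single branch test are what I recall from the phylogeny tutorial, so treat them as assumptions and confirm them with `--help`. All file names are hypothetical, and midpoint rooting is just one option (you could root on an outgroup instead).

```python
import shlex

# Short-read search settings as recalled from the "IQ-TREE search settings"
# tutorial section. These values are assumptions; verify before use.
SHORT_READ_SETTINGS = [
    "--p-perturb-nni-strength", "0.2",
    "--p-stop-iter", "200",
    "--p-n-cores", "1",
]


def de_novo_commands(rep_seqs="rep-seqs.qza", alrt_replicates=1000):
    """Return the four QIIME 2 commands for a de novo IQ-TREE workflow."""
    # Hypothetical intermediate and output artifact names.
    aligned = "aligned-rep-seqs.qza"
    masked = "masked-aligned-rep-seqs.qza"
    unrooted = "iqtree-ufboot-tree.qza"
    rooted = "iqtree-ufboot-rooted-tree.qza"
    return [
        ["qiime", "alignment", "mafft",
         "--i-sequences", rep_seqs, "--o-alignment", aligned],
        ["qiime", "alignment", "mask",
         "--i-alignment", aligned, "--o-masked-alignment", masked],
        ["qiime", "phylogeny", "iqtree-ultrafast-bootstrap",
         "--i-alignment", masked, "--o-tree", unrooted,
         *SHORT_READ_SETTINGS,
         # SH-aLRT single branch test alongside the ultrafast bootstrap.
         "--p-alrt", str(alrt_replicates)],
        ["qiime", "phylogeny", "midpoint-root",
         "--i-tree", unrooted, "--o-rooted-tree", rooted],
    ]


# Print each step as a shell command, in order.
for step in de_novo_commands():
    print(shlex.join(step))
```

Each step consumes the output of the previous one, so the commands need to run in the order printed.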

Finally, check out the awesome q2-empress, it’s a great way to interact with your phylogenies.

-Hope this helps!

Woohoo - SEPP for SILVA. I ran the fragment insertion and it ran successfully with no errors (Woohoo x 2). I will definitely check out the other methods though just for the experience.

One thing I noticed when looking through the methods of research articles is that many people don’t seem to be constructing phylogenetic trees with reference databases and are mainly just using their trees for phylogenetic diversity. With current techniques for taxonomic classification, are reference phylogenetic trees no longer required as they were in the past, or does it just depend on the research question (e.g. maybe the papers I looked at were more interested in diversity than in giving correct taxonomic assignments)?

That’s correct, current methods for taxonomy classification do not require a tree. They compare against a set of reference sequences but not a tree.

So that’s consistent with your observation of the literature, that trees are being used for diversity analyses but not taxonomy.