Unassigned taxa in fragment-insertion SEPP tree vs. MAFFT tree

I'm comparing the trees generated by align-to-tree-mafft-fasttree and by fragment-insertion sepp (QIIME 2 Library, but with the SILVA 128 SEPP reference database). I found that, for the sequences with no assigned kingdom, the align-to-tree-mafft-fasttree tree put all of them on a separate branch, while fragment-insertion sepp put some of them together with bacteria. I plotted the trees labeled with kingdom names for reference (tree - Google Drive). I'm not sure what's going on. Can anyone help? Thanks!

Part of this difference likely stems from the underlying algorithms of SEPP and MAFFT. 🤷

The whole goal of SEPP is 'placing short sequences more accurately when the set of input sequences has a large evolutionary diameter', so I'm not surprised SEPP put these sequences somewhere inside your tree instead of on a separate, deep branch.
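To make the contrast concrete, here is a deliberately oversimplified toy sketch, not SEPP's or FastTree's actual algorithm (SEPP itself uses HMM-based alignment into a reference and likelihood-based placement). The key idea it illustrates: in fragment insertion, each query is attached independently to a fixed reference tree, so it has to land beside some existing reference sequence. In a de novo build, the queries are part of the tree, so divergent queries that resemble each other can group into their own branch. The sequences, names, and the nearest-neighbor rule below are hypothetical and chosen only for illustration.

```python
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def place_on_reference(query: str, reference: Dict[str, str]) -> str:
    """Toy 'placement': attach the query beside its closest reference leaf.

    The reference is fixed and queries never see each other, so every query
    lands next to *some* reference sequence, however distant it is.
    """
    return min(reference, key=lambda name: p_distance(query, reference[name]))


def nearest_neighbor_de_novo(name: str, pool: Dict[str, str]) -> str:
    """Toy 'de novo' grouping: closest other sequence among refs AND queries."""
    seq = pool[name]
    return min((n for n in pool if n != name), key=lambda n: p_distance(seq, pool[n]))


# Hypothetical aligned sequences, invented for illustration only.
reference = {
    "Bacteria_A": "AAAAAAAAAA",
    "Bacteria_B": "AAAAACCCCC",
    "Archaea_C": "GGGGGGGGGG",
}
queries = {
    "Unassigned_1": "TTTTTTTAAA",
    "Unassigned_2": "TTTTTTTAAC",
}
pool = {**reference, **queries}

for qname, qseq in queries.items():
    print(qname, "placed next to:", place_on_reference(qseq, reference))
    print(qname, "de novo neighbor:", nearest_neighbor_de_novo(qname, pool))
```

With these made-up sequences, both unassigned queries get placed next to `Bacteria_A`, yet in the de novo pool each one's nearest neighbor is the other unassigned query, mirroring the separate branch you saw in the align-to-tree-mafft-fasttree tree.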

How these algorithms are implemented within the plugins could make a difference too. The _align_to_tree_mafft_fasttree.py module builds a MAFFT alignment from scratch, without relying on an existing alignment, unlike the mafft_add function, which implements the mafft --addfragments method described here.


Let's zoom out a bit. 🌍 🔭

If the classifier can't even place these within a kingdom, these sequences are likely very divergent from everything in your reference database. Where should these hyper-divergent sequences land within your tree?

What biological question are you hoping to answer with this tree? Are these kingdomless ASVs part of your core question?
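If you want to look at just these kingdomless ASVs, for example to compare where each tree puts them or to decide whether to exclude them, a minimal sketch for listing their IDs is below. It assumes a taxonomy exported from QIIME 2 as a tab-separated `taxonomy.tsv` with `Feature ID` and `Taxon` columns, where unassigned features carry the label `Unassigned`; the function name and example rows are hypothetical. Within QIIME 2 itself, `qiime taxa filter-seqs` is the usual tool for filtering sequences by taxonomy.

```python
import csv
import io
from typing import List, TextIO


def unassigned_feature_ids(handle: TextIO) -> List[str]:
    """Return IDs of features with no kingdom-level assignment.

    Assumes the layout of an exported QIIME 2 taxonomy.tsv: a tab-separated
    header row containing 'Feature ID' and 'Taxon' columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    ids = []
    for row in reader:
        # The first semicolon-separated rank is the kingdom/domain level.
        kingdom = row["Taxon"].split(";")[0].strip()
        if kingdom in ("", "Unassigned"):
            ids.append(row["Feature ID"])
    return ids


# Hypothetical rows in SILVA 128-style (D_0__, D_1__) formatting.
EXAMPLE_TSV = (
    "Feature ID\tTaxon\tConfidence\n"
    "asv1\tD_0__Bacteria;D_1__Firmicutes\t0.99\n"
    "asv2\tUnassigned\t0.71\n"
    "asv3\tD_0__Archaea\t0.95\n"
    "asv4\tUnassigned\t0.64\n"
)

print(unassigned_feature_ids(io.StringIO(EXAMPLE_TSV)))  # ['asv2', 'asv4']
```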

Thank you so much for your explanation!
