unassigned taxonomy in fragment-insertion sepp tree vs mafft tree

qwcheng · December 15, 2021, 7:19pm

I'm comparing the trees generated by align-to-tree-mafft-fasttree and by fragment-insertion sepp (https://library.qiime2.org/plugins/q2-fragment-insertion/16/, but with the Silva 128 SEPP database). For the sequences with unassigned kingdoms, the MAFFT tree puts all of them on a separate branch, while the fragment-insertion SEPP tree puts some of them together with bacteria. I plotted both trees labeled with the kingdom names for your reference (https://drive.google.com/drive/folders/1XNhH3xssXh_M3qFSHJaf5z5b5UuhYY1t?usp=sharing). I'm not sure what's going on. Can someone help with this? Thanks!

colinbrislawn · December 16, 2021, 12:56am

Part of this difference must stem from the underlying algorithms of SEPP and MAFFT.

The whole goal of SEPP is 'placing short sequences more accurately when the set of input sequences has a large evolutionary diameter', so I'm not surprised SEPP placed these sequences somewhere inside your tree instead of on a separate, deep branch.
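If you want to check this programmatically rather than by eye, here is a minimal sketch in plain Python that asks whether a set of tips forms a single clade in a rooted Newick tree. Everything in it is an assumption for illustration: the tip names (b1, a1, u1, ...) and the two toy trees are made up and are not your data, and the tiny parser only handles simple Newick without quoted labels. For real trees a proper tree library would be a better choice.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested lists; tips are strings."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        # Drop any ':branch_length' suffix and keep only the name.
        return text[start:pos].split(":")[0].strip()

    def read_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume '('
            children = [read_node()]
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(read_node())
            pos += 1  # consume ')'
            read_label()  # skip internal label / branch length
            return children
        return read_label()

    return read_node()


def collect_clades(node, out):
    """Append the tip set under every node to `out`; return this node's tips."""
    if isinstance(node, str):
        tips = frozenset([node])
    else:
        tips = frozenset().union(*(collect_clades(c, out) for c in node))
    out.append(tips)
    return tips


def forms_single_clade(newick, tips_of_interest):
    """True if the given tips are exactly the tips under some node."""
    clades = []
    collect_clades(parse_newick(newick), clades)
    return frozenset(tips_of_interest) in clades


# Hypothetical toy trees: 'u*' stand in for the kingdom-unassigned ASVs.
mafft_like = "(((b1:0.1,b2:0.1):0.2,(a1:0.1,a2:0.1):0.2):0.3,(u1:0.4,u2:0.5):0.9);"
sepp_like = "(((b1:0.1,u1:0.3):0.1,b2:0.1):0.2,((a1:0.1,a2:0.1):0.2,u2:0.6):0.3);"
unassigned = {"u1", "u2"}

print(forms_single_clade(mafft_like, unassigned))  # True: one separate branch
print(forms_single_clade(sepp_like, unassigned))   # False: scattered in the tree
```

Note that this treats the tree as rooted, so the answer depends on where the root sits. The two pipelines root their trees differently, which is worth keeping in mind when comparing them.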

How these algorithms are implemented within the plugin could make a difference too. The code in _align_to_tree_mafft_fasttree.py builds a MAFFT alignment from scratch, without relying on an existing alignment, unlike the mafft_add function, which implements the mafft --addfragments method described here.

Let's zoom out a bit.

If the classifier can't even place these within a Kingdom, these sequences must be very divergent from everything else in your database. Where should these hyper-divergent sequences land within your tree?

What biological question are you hoping to answer with this tree? Are these kingdomless ASVs part of your core question?

qwcheng · December 19, 2021, 4:43pm

Thank you so much for your explanation!
