Remove reference tree from fragment insertion tree

a_staubus · June 23, 2022, 4:47pm

Hello!

I just used fragment-insertion to generate a tree following this tutorial. I used Silva 128 as a reference tree. After viewing the output in a phylogenetic tree viewer, I realized that the tree that it generates includes the entire Silva reference tree, along with all my representative sequences. Is there any way to use fragment-insertion to generate a tree with the reference tree removed, so that it only includes my representative sequences, or am I fundamentally misunderstanding how fragment insertion trees work?

Here are the commands I ran:

qiime fragment-insertion sepp \
  --i-representative-sequences Exp11_rep_seqs.qza \
  --i-reference-database sepp-refs-silva-128.qza \
  --p-threads 10 \
  --o-tree Exp11_insertion_tree.qza \
  --o-placements Exp11_insertion_placements.qza

I'm using qiime2-2021.11.

Thanks in advance for any help!
-August

P.S. I would love to use SINA to perform a reference-based alignment, but apparently it's not included in my version of qiime2 and I'm not sure how to get it.

colinbrislawn · June 23, 2022, 5:41pm

My understanding is that the goal of fragment-insertion is to 'build upon' an existing tree to more accurately and comprehensively describe your new ASVs, and I think removing the reference may go against that.

Perhaps you have a different use-case! What are you hoping to measure with this tree?

If you don't want to use a reference, check out the other options listed here.

a_staubus · June 24, 2022, 4:39pm

I'm just trying to create a visual relationship between the taxa that I have identified in my sample. I was able to use align-to-tree-mafft-iqtree to achieve this, but I found that it produces some nonsensical results. For example, in the screenshot below, Aeromonadaceae, Enterobacteriaceae, and Pseudomonadaceae are all grouped together in the same clade, while other Aeromonadaceae are in a separate clade. I was hoping I could use another technique, like fragment insertion or reference-based alignment, to obtain more accurate results.

kindergarten · June 24, 2022, 7:20pm

I am also trying to perform a similar analysis.
Can we do the following?

1. Insert ASVs into Silva 128 to generate an insertion_tree.qza with "accurate" placements of the ASVs.
2. Filter insertion_tree.qza using qiime phylogeny filter-tree (filter-tree: Remove features from tree based on metadata - QIIME 2 2022.2.0 documentation) to get a filtered_tree.qza.

Will the new filtered_tree.qza keep the "accurate" placements based on the reference tree?

jwdebelius · June 25, 2022, 9:05pm

Hi @a_staubus and @kindergarten,

I think there are two issues here, and both @colinbrislawn and @SoilRotifer should feel free to chime in.

Issue 1: The question of tree filtering. @kindergarten's approach is what I would recommend to get the filtered placements.

As far as I understand tree filtering, it will "prune" the branches that shouldn't be there; it shouldn't change distances, just get rid of the forks/tips you don't want. (Unlike my real flowers, which I definitely over-pruned last week. Alas...)
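A minimal sketch of that filtering step, in case it helps. It assumes you have a feature table whose feature IDs are the ASVs you want to keep; Exp11_table.qza and Exp11_filtered_tree.qza are hypothetical file names, and the options are worth checking against qiime phylogeny filter-tree --help in your release. If QIIME 2 or the input files are not available, the script only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: prune the SEPP insertion tree down to the tips found in a feature table.
set -eu

tree="Exp11_insertion_tree.qza"      # output of the fragment-insertion sepp command
table="Exp11_table.qza"              # hypothetical: feature table holding your ASVs
filtered="Exp11_filtered_tree.qza"   # hypothetical output name

# Store the command in the positional parameters so it can be run or printed.
set -- qiime phylogeny filter-tree \
    --i-tree "$tree" \
    --i-table "$table" \
    --o-filtered-tree "$filtered"

if command -v qiime >/dev/null 2>&1 && [ -f "$tree" ] && [ -f "$table" ]; then
    # QIIME 2 and both inputs are present: run the filter.
    "$@"
else
    # Otherwise just show the command that would have been run.
    printf '%s ' "$@"
    printf '\n'
fi
```

The filtered tree should then contain only tips whose IDs appear in the table, with the reference tips dropped.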

The second issue, though, is taxonomy versus phylogeny:

The problem you're observing is one of polyphyletic clades, which is not going to be changed by your tree-building algorithm.

So, taxonomy and phylogeny are kind of uneasy bedfellows. We want them to agree. It would make life so much easier if they just... agreed. Especially as we do work where we do taxonomy inference based on a phylogenetic sequence. But, they're based on two different things.

Taxonomy was, AFAIK, originally based on morphology and characteristics. So, gram positive or gram negative? Does it digest lactose, galactose, or something else? etc. It's a way to try and organize a somewhat chaotic world by imposing a set of names that tell us about things.

Phylogeny tells us about the evolutionary history, in our case based off of a specific gene. It's a question of how many generations of genetic telephone were required to distort the original message this way.

The reason they're uneasy bedfellows has to do with the fact that the way we characterize things doesn't always line up with the molecules. Convergent evolution is a weird thing. Apparently everything eventually turns crab-shaped ... but just because everything looks like it would be tasty with Old Bay doesn't mean that it's actually related if we look at it from an 18S molecule. But, if the character behind our taxonomy is being crab-shaped, and we put everything crab-shaped into one bin and everything not!crab in another bin, it may not matter that one of the not!crabs is actually more closely related: it's going to have a different name when you throw it on a tree.

One of the most obnoxious places this shows up is actually the Firmicutes phylum. Which... it umm, turns out... is not actually monophyletic. There are six (6!) different Firmicutes phyla. This comes out of this amazing 2018 paper:

https://www.nature.com/articles/nbt.4229

Which, essentially, says that the taxonomic names you're using from Silva don't actually line up with the phylogeny you're using with the sequences.

I'm not sure how to solve the display problem (I just sort of smile, nod, and wave my hands at it until some reviewer actually digs deep into the taxonomic display tree I've got.)

Best,
Justine

kindergarten · June 27, 2022, 1:38pm

Justine, thanks for elucidating the distinction between taxonomy and phylogeny. Very informative.
