I just used fragment-insertion to generate a tree following this tutorial. I used Silva 128 as a reference tree. After viewing the output in a phylogenetic tree viewer, I realized that the tree that it generates includes the entire Silva reference tree, along with all my representative sequences. Is there any way to use fragment-insertion to generate a tree with the reference tree removed, so that it only includes my representative sequences, or am I fundamentally misunderstanding how fragment insertion trees work?
My understanding is that the goal of fragment-insertion is to 'build upon' an existing tree to more accurately and comprehensively describe your new ASVs, and I think removing the reference may go against that.
Perhaps you have a different use-case! What are you hoping to measure with this tree?
If you don't want to use a reference, check out the other options listed here.
I'm just trying to create a visual relationship between the taxa that I have identified in my sample. I was able to use align-to-tree-mafft-iqtree to achieve this, but I found that it produces some nonsensical results. For example, in the screenshot below, Aeromonadaceae, Enterobacteriaceae, and Pseudomonadaceae are all grouped together in the same clade, while other Aeromonadaceae are in a separate clade. I was hoping I could use another technique, like fragment insertion or reference-based alignment, to obtain more accurate results.
Issue 1: The question of tree filtering. @kindergarten's approach is what I would recommend to get the filtered placements.
As far as I understand tree filtering, it will "prune" the branches that shouldn't be there; it shouldn't change distances, just get rid of the forks/tips you don't want. (Unlike my real flowers, which I definitely over-pruned last week. Alas...)
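To make the "pruning shouldn't change distances" point concrete, here is a minimal, self-contained sketch in plain Python. It is not how QIIME 2 implements filtering; the `Node` class, the helper names, and the toy tree with tips `ASV1`..`ASV3` and `ref1`/`ref2` are all made up for illustration. The idea is that when a reference tip is dropped and its parent is left with a single child, the two branch lengths are merged, so the tip-to-tip (patristic) distances between the tips you keep stay the same.

```python
from itertools import combinations


class Node:
    """Toy tree node (hypothetical, for illustration only)."""

    def __init__(self, name=None, length=0.0, children=None):
        self.name = name
        self.length = length  # branch length to the parent
        self.children = children or []


def prune(node, keep):
    """Return a pruned copy holding only tips named in `keep` (or None)."""
    if not node.children:
        return Node(node.name, node.length) if node.name in keep else None
    kids = [k for k in (prune(c, keep) for c in node.children) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        # A fork with one surviving child is collapsed: merge branch lengths.
        kids[0].length += node.length
        return kids[0]
    return Node(node.name, node.length, kids)


def tip_paths(node, path=(), depth=0.0, out=None):
    """Map each tip name to its root-to-tip path of (node id, depth) pairs."""
    if out is None:
        out = {}
    depth += node.length
    path = path + ((id(node), depth),)
    if not node.children:
        out[node.name] = path
    for child in node.children:
        tip_paths(child, path, depth, out)
    return out


def distance(paths, a, b):
    """Patristic distance: depth(a) + depth(b) - 2 * depth(shared ancestor)."""
    shared = 0.0
    for (node_a, depth_a), (node_b, _) in zip(paths[a], paths[b]):
        if node_a != node_b:
            break
        shared = depth_a
    return paths[a][-1][1] + paths[b][-1][1] - 2 * shared


# Toy "reference + inserted ASVs" tree:
# ((ASV1:0.1,ref1:0.2):0.3,(ASV2:0.4,(ref2:0.1,ASV3:0.2):0.5):0.6);
root = Node(children=[
    Node(length=0.3, children=[Node("ASV1", 0.1), Node("ref1", 0.2)]),
    Node(length=0.6, children=[
        Node("ASV2", 0.4),
        Node(length=0.5, children=[Node("ref2", 0.1), Node("ASV3", 0.2)]),
    ]),
])

keep = {"ASV1", "ASV2", "ASV3"}
pruned = prune(root, keep)

before = tip_paths(root)
after = tip_paths(pruned)
for a, b in combinations(sorted(keep), 2):
    print(a, b, round(distance(before, a, b), 3), round(distance(after, a, b), 3))
```

The two printed distances on each line should match, which is the sense in which dropping the reference tips only removes forks/tips without moving your ASVs relative to each other.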
The second issue, though, is taxonomy and phylogeny:
The problem you're observing is one of polyphyletic clades, which is not going to be changed by your tree-building algorithm.
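If it helps to see what "polyphyletic" means operationally, here is a small sketch in plain Python. Everything in it is an assumption for illustration: the tree is a hand-written nested tuple (topology only, treated as rooted), and the tip names such as `Aero_1` or `Entero_1` are invented to mirror the situation in the screenshot, not taken from real data. A group of tips is called monophyletic here if some clade in the tree contains exactly those tips and nothing else.

```python
def tips(tree):
    """Return the set of tip names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree holds exactly the tips in `group`."""
    group = set(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    # Only descend into children that still contain the whole group.
    return any(
        is_monophyletic(child, group)
        for child in tree
        if group <= tips(child)
    )


# Hypothetical topology: one Aeromonadaceae tip sits with the
# Enterobacteriaceae and Pseudomonadaceae tips, the others sit elsewhere.
tree = ((("Aero_1", "Entero_1"), "Pseudo_1"), ("Aero_2", "Aero_3"))

aero = {name for name in tips(tree) if name.startswith("Aero_")}

print(is_monophyletic(tree, aero))                  # all three Aero tips
print(is_monophyletic(tree, {"Aero_2", "Aero_3"}))  # just the second clade
```

With this toy topology, the full family fails the check while the `Aero_2`/`Aero_3` pair passes. A different tree-building method would only change that if it changed the topology itself, and the check depends on where the tree is rooted.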
So, taxonomy and phylogeny are kind of uneasy bedfellows. We want them to agree. It would make life so much easier if they just... agreed. Especially as we do work where we do taxonomy inference based on a phylogenetic sequence. But, they're based on two different things.
Taxonomy was, AFAIK, originally based on morphology and characteristics. So, gram positive or gram negative? Does it digest lactose, galactose, or something else? etc. It's a way to try and organize a somewhat chaotic world by imposing a set of names that tell us about things.
Phylogeny tells us about the evolutionary history, in our case based off of a specific gene. It's a question of how many generations of genetic telephone were required to distort the original message this way.
The reason they're uneasy bedfellows has to do with the fact that the way we characterize things doesn't always line up with the molecules. Convergent evolution is a weird thing. Apparently everything eventually turns crab-shaped ... but just because everything looks like it would be tasty with Old Bay doesn't mean that it's actually related if we look at it from an 18S molecule. But, if the character behind our taxonomy is being crab-shaped, and we put everything crab-shaped into one bin and everything not!crab in another bin, it may not matter that one of the not!crabs is actually more closely related; it's going to have a different name when you throw it on a tree.
One of the most obnoxious places this shows up is actually the Firmicutes phylum. Which... it umm, turns out... is not actually monophyletic. There are six (6!) different Firmicutes phyla. This comes out of this amazing 2018 paper: