phylogenetic tree output of fragment-insertion

anna-schrecengost · November 7, 2022, 8:28pm

Hello! I have a pretty broad question about qiime2 fragment-insertion, and apologies if this has been discussed previously.

I am interested in inserting short reads analyzed with DADA2 and Deblur into small curated reference phylogenies of a few hundred sequences. For one project I am working with really low diversity datasets, and really only have one ASV per sample which I am adding to the phylogeny. For another project I am working with datasets with normal diversity but am focused on sequences which are more rare, generally 0-10% of the total reads. I like fragment-insertion because I can analyze reads from different primer sets together, and because it can generate new branches on the tree, not just add to existing branches.

I have tried this and gotten it to work successfully with my own reference dataset(s). But I have gotten some funky results that disagree with multiple other methods I have used (de novo phylogeny, taxonomic assignment, and full-length Sanger sequencing of the same population for the low diversity data, etc.).

My questions are: Is the actual phylogenetic tree that you get as output reasonably reliable? Or should it really only be used for downstream analyses and not visualized on its own? Are the phylogenetic relationships between the inserted sequences reliable?

colinbrislawn · November 9, 2022, 11:18pm

Hello Anna,

This is a great question!

You could try it for yourself. Plug your reads into a couple of MSA programs and see if the results are similar. Then build trees from these and see if the trees match.
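One rough way to put a number on "see if the trees match" is the Robinson-Foulds distance, which counts the splits (bipartitions of tips) found in one tree but not the other. Here is a minimal pure-Python sketch, assuming both trees are plain Newick strings over the same set of tips with no quoted labels. The function names and toy trees are made up for illustration and are not part of QIIME 2 or any MSA program.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (name, length, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def splits(newick):
    """Return (all tips, set of non-trivial splits) for one tree."""
    clades = []

    def walk(n):
        name, _, children = n
        if not children:
            return frozenset([name])
        below = frozenset().union(*(walk(c) for c in children))
        clades.append(below)
        return below

    all_tips = walk(parse_newick(newick))
    anchor = min(all_tips)  # fixed tip, so each split is stored one way only
    found = set()
    for clade in clades:
        side = all_tips - clade if anchor in clade else clade
        if 1 < len(side) < len(all_tips) - 1:  # skip trivial splits
            found.add(side)
    return all_tips, found


def rf_distance(newick_a, newick_b):
    """Count splits present in exactly one of the two trees."""
    tips_a, splits_a = splits(newick_a)
    tips_b, splits_b = splits(newick_b)
    if tips_a != tips_b:
        raise ValueError("trees must have the same tips")
    return len(splits_a ^ splits_b)


# Toy trees standing in for the output of two different MSA + tree pipelines.
tree_a = "((A,B),(C,D),E);"
tree_b = "((A,C),(B,D),E);"
print(rf_distance(tree_a, tree_b))
```

A distance of 0 means the two topologies agree; larger values mean more disagreement. Branch lengths are parsed but ignored here, so this only compares branching order.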

Spoilers:

This process is not reliable at all, for two reasons.

First, de novo MSA is NP-complete, so all the modern algorithms use a variety of heuristics to get pretty-good results quickly.

Second, there is only so much phylogenetic history encoded in your amplicons, and reliably inferring 3+ billion years of evolutionary history from a set of 250-base-pair reads is a tall order.

So, to your question about whether the output tree is reasonably reliable: no, lol. Neither are the internal relationships among the original or the added tips.

Now let's flip the script.

You could use no tree at all, and treat each new amplicon as equally unrelated. Without any phylogeny, there's no easy way to group similar ASVs together or to compare them to existing taxa.

In this context, having a phylogenetic tree, no matter how dubious, becomes so much better than nothing.

And you can get a lot done with a good-enough tree.

This is a great use case for fragment-insertion! Even if the reference tree is bad, the branch placement of new ASVs tells you which existing features are similar.

One common use case for trees is calculating UniFrac distances between samples. Here, the tree becomes an intermediate step in calculating the shared phylogeny between samples, and this phylogenetic relatedness is super useful, even if the tree is not perfect. This is another example of the downstream analysis you mention.
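To make the "shared phylogeny" idea concrete, here is a toy sketch of unweighted UniFrac: the branch length seen in only one of two samples, divided by the branch length seen in either. It assumes the tree has already been flattened into a list of (branch length, tips below that branch) pairs. The tree, sample contents, and function name are invented for illustration; this is not how QIIME 2 computes it internally.

```python
# Toy rooted tree ((A:1,B:1):2,(C:1,D:1):2); written as
# (branch_length, set of tips below that branch).
branches = [
    (1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}),
    (1.0, {"C"}), (1.0, {"D"}), (2.0, {"C", "D"}),
]


def unweighted_unifrac(branches, sample_1, sample_2):
    """Fraction of observed branch length that is unique to one sample."""
    unique = 0.0
    shared = 0.0
    for length, tips in branches:
        in_1 = bool(tips & sample_1)  # branch leads to a tip seen in sample 1
        in_2 = bool(tips & sample_2)  # branch leads to a tip seen in sample 2
        if in_1 and in_2:
            shared += length
        elif in_1 or in_2:
            unique += length
    total = unique + shared
    return unique / total if total else 0.0


# Each sample is the set of ASVs (tips) observed in it.
sample_1 = {"A", "B"}
sample_2 = {"B", "C"}
print(unweighted_unifrac(branches, sample_1, sample_2))
```

Two samples with identical ASVs score 0, and two samples on opposite sides of the root score 1. Moving a tip to a slightly wrong spot within its clade changes the score only a little, which is why a good-enough tree still works for this.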

You can totally graph these perfectly-imperfect trees! Check out the Interactive Tree of Life (iTOL) and the ggtree R package.