mafft or fragment-insertion

mohsen_ej · November 16, 2020, 8:18pm

Hi all
I read the Phylogeny tutorial, but I didn't understand when it's better to use MAFFT and when to use fragment-insertion. It says there that fragment-insertion is good "if your reference phylogeny encompass neighboring relatives of which your sequences can be reliably inserted. Any sequences that do not match well enough to the reference are not inserted."
So how can I know that?
I want to use diversity metrics, which need a phylogenetic tree, but I'm not sure which one is better for me, since the results are different.
Thank you

SoilRotifer · November 16, 2020, 8:54pm

Hi @mohsen_ej, it depends.

You'll simply need to read up on these reference files, or inspect their contents, to see what they contain. There are two commonly used reference sets for fragment-insertion (SEPP): Greengenes and SILVA. SILVA is a much larger reference set with many more reference sequences, which makes it particularly well suited to environmental data sets. Either reference, however, appears to do well for human-associated data sets.
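QIIME 2 `.qza` files are ordinary ZIP archives, so besides the official `qiime tools peek` and `qiime tools export` commands, one quick way to see what a reference artifact contains is to list its `data/` payload with Python's standard library. This is a minimal sketch assuming the usual `<uuid>/metadata.yaml` and `<uuid>/data/...` layout; `list_artifact_payload` and `read_metadata` are hypothetical helpers, not part of QIIME 2.

```python
import sys
import zipfile
from pathlib import PurePosixPath


def list_artifact_payload(qza_path):
    """Return the files stored under <uuid>/data/ in a .qza archive, sorted."""
    payload = []
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            parts = PurePosixPath(name).parts
            # parts[0] is the artifact UUID; keep only file entries inside data/
            if len(parts) > 2 and parts[1] == "data" and not name.endswith("/"):
                payload.append("/".join(parts[2:]))
    return sorted(payload)


def read_metadata(qza_path):
    """Return the text of <uuid>/metadata.yaml, which records the artifact type."""
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            parts = PurePosixPath(name).parts
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                return archive.read(name).decode("utf-8")
    raise FileNotFoundError("metadata.yaml not found in " + str(qza_path))


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # path to a local .qza file (assumption: you pass one)
    print(read_metadata(path))
    for member in list_artifact_payload(path):
        print(member)
```

For example, `python peek_qza.py sepp-refs-silva-128.qza` (script and file names shown only as examples) would print the artifact's type followed by the reference files it bundles.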

Generally speaking, many consider fragment-insertion ideal, because you are inserting your short sequence fragments into a curated reference phylogeny built from full-length rRNA sequences. Constructing a de novo phylogeny from short reads, by contrast, can create problems, depending on the gene and the phylogeny tool you use. For more on why, read the fragment-insertion paper linked in that phylogeny tutorial.

I typically run both de novo and fragment-insertion approaches on my data. Sometimes the results are quite different; other times they are not. I try to stick with fragment-insertion when possible, although it can require more resources to run. Otherwise, I often use IQ-TREE for de novo phylogenetic reconstruction. But your mileage may vary.
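One way to put a number on "quite different" is a Mantel test, which correlates the pairwise distances of two distance matrices computed on the same samples. QIIME 2 provides a `qiime diversity mantel` visualizer for this; the sketch below is a minimal pure-Python version (Pearson correlation of the upper triangles, one-sided permutation p-value), and its function names are hypothetical. It makes most sense for phylogeny-aware matrices such as weighted or unweighted UniFrac, since Bray-Curtis does not use the tree at all.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dm1, dm2, permutations=999, seed=0):
    """Return (r, p) for two distance matrices with samples in the same order.

    p is the fraction of sample-label permutations (plus the observed one)
    whose correlation is at least the observed r, i.e. a one-sided test.
    """
    rng = random.Random(seed)
    n = len(dm1)
    x = upper_triangle(dm1)
    r_obs = pearson(x, upper_triangle(dm2))
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of dm2 together so it stays a valid distance matrix.
        permuted = [[dm2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

A correlation close to 1 between, say, the UniFrac matrix from the fragment-insertion tree and the one from the de novo tree suggests the tree choice barely changes the between-sample picture.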

But if you are using full-length rRNA sequence data, a de novo approach would be fine.

-Mike

mohsen_ej · November 16, 2020, 9:09pm

Thanks for your prompt reply.
I have run both before, but I don't know which of the two is more appropriate for my data.
I mean, how do I know which one worked better?
Excuse me if this is a beginner question.

SoilRotifer · November 16, 2020, 9:16pm

There is no easy answer to this, but for short reads, I suggested:

Again, read the fragment insertion paper and see if the logic outlined there makes sense to you. If it does, then go with that approach.

-Mike

mohsen_ej · November 16, 2020, 9:17pm

For example, here are two Bray-Curtis plots, made with tree.qza (fragment-insertion) and rooted-tree.qza (MAFFT):

SoilRotifer · November 16, 2020, 9:22pm

Yeah, like I said:

In this case, these plots look identical, just flipped on Axis 2. ProTip: you can invert any of the axes in the visualizer; here I'd invert Axis 2 to make it look more like the other PCoA plot. This also brings up another point: if your biological signal is very robust, it will not matter whether you use the de novo or the fragment-insertion output. So, pick your favorite. That said, the variance along the axes seems to be better explained with fragment insertion.
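Inverting an axis is harmless because the sign of each PCoA axis is arbitrary: negating every sample's coordinate on one axis leaves all between-sample distances, and that axis's eigenvalue (hence its "% explained"), unchanged. The sketch below demonstrates this with made-up coordinates; `flip_axis`, `pairwise_distances` and `proportion_explained` are hypothetical helpers, and the proportion calculation is a simplification that treats negative eigenvalues as zero.

```python
import math


def flip_axis(coords, axis):
    """Return a copy of the coordinates with one axis (0-based index) negated."""
    return [[-v if k == axis else v for k, v in enumerate(point)] for point in coords]


def pairwise_distances(coords):
    """Euclidean distance between every pair of samples in ordination space."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def proportion_explained(eigenvalues):
    """Fraction of total positive variation per axis (negatives clipped to zero)."""
    total = sum(e for e in eigenvalues if e > 0)
    return [max(e, 0) / total for e in eigenvalues]


# Made-up 2-D coordinates for three samples; "Axis 2" is index 1.
samples = [[0.3, 0.1], [-0.2, 0.4], [0.0, -0.5]]
flipped = flip_axis(samples, 1)
before = pairwise_distances(samples)
after = pairwise_distances(flipped)
same = all(abs(a - b) < 1e-12 for ra, rb in zip(before, after) for a, b in zip(ra, rb))
print("distances unchanged:", same)
```

Since distances are all that a downstream comparison between samples relies on, a flipped plot and an unflipped one tell the same story.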

mohsen_ej · November 16, 2020, 9:25pm

Many thanks for your helpful information.

system · December 18, 2020, 3:25am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.