I am looking for suggestions on when in the workflow to build a phylogenetic tree. I used Deblur to process my dataset. Should the tree be made with the rep-seqs from the Deblur output? After Deblur, I removed mitochondria and chloroplast sequences from my dataset. I also filtered by taxon frequency, reducing my total number of taxa from 2,016 (Deblur output) to 898. Should the tree then be made only with the 898 taxa? Does having more sequences in the alignment improve the quality of the tree? Will the weighted UniFrac distances be compromised if taxa present in the tree are not also present in the BIOM table?
I appreciate any recommendations.
It kind of depends on how you construct the tree. For example, if you use SEPP (via q2-fragment-insertion), then it doesn't matter too much, as sequences are always inserted into a reference tree. If you are doing a de novo construction, then you are definitely better off filtering out things like mitochondria/chloroplasts first, as they will probably distort the distances among the other sequences.
I'm not an expert on phylogenetics, but more sequences will help up to a point: until you've reached the limit of what you can really infer from a single gene (which isn't super great), or until the sequences are so variable (high entropy) that they no longer align effectively. This is what makes SEPP pretty neat: it can start from a multi-locus reference phylogeny and then use what you do know about your amplicon to produce a modified tree.
Nope, UniFrac doesn't care about unused branches in the tree, so tips that are missing from the BIOM table won't compromise the distances. It is also relatively robust to bad phylogenetic trees. The way I think of it, any information about evolutionary distance (even if it's relatively imprecise) is still much more information than nothing.
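To make the "unused branches" point concrete, here is a minimal sketch in plain Python. It is not the QIIME 2 or scikit-bio implementation: the toy tree, the tip names, the counts, and the helper `weighted_unifrac` are all made up for illustration, and it computes the unnormalized weighted UniFrac (the sum over branches of branch length times the absolute difference in the proportion of each sample descending from that branch). A branch that no observed feature descends from has proportion 0 in both samples, so it adds nothing to the sum.

```python
import math
from collections import defaultdict


def weighted_unifrac(parent, counts_a, counts_b):
    """Unnormalized weighted UniFrac on a rooted tree (illustrative only).

    parent maps each non-root node to (parent_node, branch_length).
    counts_a and counts_b map tip names to observed counts.
    """
    def branch_proportions(counts):
        total = sum(counts.values())
        props = defaultdict(float)
        for tip, count in counts.items():
            node = tip
            # Walk from the tip to the root, crediting every branch on the way.
            while node in parent:
                props[node] += count / total
                node = parent[node][0]
        return props

    pa = branch_proportions(counts_a)
    pb = branch_proportions(counts_b)
    return sum(length * abs(pa[node] - pb[node])
               for node, (_, length) in parent.items())


# Hypothetical tree: t4 stands in for a tip (e.g. a filtered chloroplast
# sequence) that is in the tree but absent from the feature table.
full_tree = {
    "n1": ("root", 0.5), "n2": ("root", 0.5),
    "t1": ("n1", 1.0), "t2": ("n1", 1.0),
    "t3": ("n2", 2.0), "t4": ("n2", 2.0),
}
# The same tree with the unused tip removed.
pruned_tree = {node: edge for node, edge in full_tree.items() if node != "t4"}

sample_a = {"t1": 6, "t2": 4}
sample_b = {"t1": 2, "t3": 8}

d_full = weighted_unifrac(full_tree, sample_a, sample_b)
d_pruned = weighted_unifrac(pruned_tree, sample_a, sample_b)
print(d_full, d_pruned)
assert math.isclose(d_full, d_pruned)
```

The two distances agree because the `t4` branch carries zero proportion in both samples. This only illustrates the arithmetic; for real data you would still use the UniFrac implementation that ships with your pipeline.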