Not sure if this belongs here or in general discussion, but I've got a problem squaring taxonomy and phylogeny. I say this with the caveat that I'd expect some polyphyletic clades… except mine appears at the phylum level.
I have a 2 x 300 V3-V4 dataset where I merged the paired ends in vsearch, loaded them into QIIME, and then ran deblur. I built my tree using SEPP fragment insertion, and did taxonomic classification with a classifier I trained on my model. For taxonomic analysis, I filtered down to a set of ~300 ASVs that I considered “high abundance/high prevalence”.
I wanted to make a heatmap where I ordered the two axes by the phylogenetic tree and by hierarchical clustering of my distance matrix. So, I imported my QIIME artefacts into Python and turned them into a pandas dataframe and two scikit-bio linkage matrices. As a sanity check/reference point, I used row labels colored by phylum and class.
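A minimal sketch of the kind of alignment this step depends on, using made-up ASV IDs and distances (not the real data, and not the actual code from this analysis). It assumes the tree has already been reduced to a square tip-to-tip distance matrix with a known ID order, and it uses SciPy average linkage as a stand-in for however the linkage matrix was really built. Note that average linkage on tip-to-tip distances is only guaranteed to reproduce the tree topology when those distances are ultrametric.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

# Hypothetical tip-to-tip distances for four ASVs, in the order given by `ids`.
ids = ["asv_a", "asv_b", "asv_c", "asv_d"]
dist = np.array([
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])

# Hypothetical abundance table whose rows start out in a different order.
table = pd.DataFrame(
    {"sample_1": [5, 50, 7, 40], "sample_2": [3, 60, 9, 45]},
    index=["asv_c", "asv_a", "asv_d", "asv_b"],
)

# Row i of the linkage input must be the same ASV as row i of the table,
# so reorder the table to match the distance matrix before plotting.
table = table.loc[ids]

# Build the linkage from the condensed form of the distance matrix.
row_linkage = linkage(squareform(dist), method="average")

# Translate the dendrogram leaf positions back into ASV IDs for inspection.
leaf_order = [ids[i] for i in leaves_list(row_linkage)]
print(leaf_order)
```

If the table is not reindexed to the same order as the distance matrix, the dendrogram is still drawn correctly but the heatmap rows and color labels attach to the wrong leaves.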
All the clustering looked good and logical to me… except for the Tenericutes (orange) sitting in the middle of my Firmicutes (green). I think Tenericutes used to be considered part of Firmicutes, and I’d definitely expect to see some polyphyletic clades among Firmicutes at lower levels, but I was surprised to see it here.
So, I guess after a long explanation, I'm concerned there’s something wrong in my pipeline. I think the point of failure could be (a) classification, (b) tree building, or (c) the linkage matrix. I was really careful when building the linkage matrix to order the tree with my dataframe. Did I just do something really stupid and the distances are large, but the frame ended up funky?
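One way to separate (c) from (a) and (b) is to skip the linkage matrix entirely and ask the tree distances directly: for each ASV, which ASV is closest, and is it in the same phylum? The sketch below does that with invented IDs, phylum labels, and distances, so every name and number in it is hypothetical. It assumes the tip-to-tip distances have already been exported to a square, labeled table.

```python
import numpy as np
import pandas as pd

# Hypothetical ASV IDs and phylum assignments.
ids = ["asv_1", "asv_2", "asv_3", "asv_4", "asv_5"]
phylum = pd.Series(
    ["Firmicutes", "Firmicutes", "Tenericutes", "Firmicutes", "Bacteroidetes"],
    index=ids,
)

# Hypothetical tip-to-tip distances, labeled on both axes.
dist = pd.DataFrame(
    [
        [0.0, 0.2, 0.3, 0.5, 1.4],
        [0.2, 0.0, 0.25, 0.45, 1.5],
        [0.3, 0.25, 0.0, 0.4, 1.6],
        [0.5, 0.45, 0.4, 0.0, 1.3],
        [1.4, 1.5, 1.6, 1.3, 0.0],
    ],
    index=ids,
    columns=ids,
)

# Mask the diagonal so an ASV is never reported as its own nearest neighbor.
masked = dist.mask(np.eye(len(ids), dtype=bool))
nearest = masked.idxmin(axis=1)

# Summarize each ASV's nearest neighbor and whether the phyla disagree.
report = pd.DataFrame({
    "phylum": phylum,
    "nearest": nearest,
    "nearest_phylum": nearest.map(phylum),
    "distance": masked.min(axis=1),
})
report["mismatch"] = report["phylum"] != report["nearest_phylum"]
print(report)
```

If the Tenericutes ASVs already have Firmicutes as their nearest neighbors here, the placement comes from the tree or the classification rather than from the linkage matrix or the row ordering. If their nearest neighbors are other Tenericutes or a different phylum, the ordering step is the more likely suspect.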
I'm reassured by the fact that the rest of the phyla and clades cluster pretty well. But advice or suggestions for debugging would be welcome!