Hi there,
Not to add to the glut of questions regarding PCoA outputs in QIIME vs R, but I couldn't find one that addresses the issue I'm having. For some background, I have multiple distance matrices that I've produced using phyloseq, rbiom, etc. based on data that I've imported into R from QIIME, and I'd like to analyze/visualize my data with PCoA. When I use binary Jaccard, the results I produce in R are identical to those produced in QIIME. So far, so good. However, when I tried running PCoA on the Bray-Curtis or UniFrac distance matrices, my results started to differ.

After some digging, I saw that there's some confusion about how exactly QIIME treats the data prior to computing a Bray-Curtis distance matrix, and that there are some issues with phyloseq computing UniFrac distance matrices. After troubleshooting, I still couldn't replicate the results from QIIME 2 in R, so I decided to try importing and analyzing the distance matrices that QIIME 2 produced and was (theoretically) using for PCoA.

This is where things got weird for me. Even when I use the exact same data that QIIME 2 is using (i.e. the distance matrix I imported directly from QIIME 2 into R), my results still differ. Below is an example showing the results from PCoA using a Bray-Curtis matrix:
And again, showing the results from weighted UniFrac:
The differences are slight, so in the grand scheme of things they probably don't matter, but I'm still not sure how they are possible when the input distance matrix is the same.
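In case it helps frame the question, the computation being compared here is classical PCoA on a precomputed distance matrix. Below is a minimal sketch of the textbook version (Gower centering followed by an eigendecomposition) in Python with NumPy. It is only an illustration under stated assumptions: it is not the actual code path of QIIME 2 or of any R package, and the function name `classical_pcoa` and the toy matrix are hypothetical.

```python
import numpy as np

def classical_pcoa(dist, n_axes=2):
    """Textbook PCoA on a square, symmetric distance matrix (illustrative only)."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower-center the squared distances: B = -1/2 * J * D^2 * J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]  # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale the leading eigenvectors by sqrt(eigenvalue); clip tiny negatives to zero
    top = np.clip(eigvals[:n_axes], 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(top)
    return coords, eigvals

# Toy 3-sample distance matrix (made-up values, not my data)
toy = np.array([[0.0, 3.0, 4.0],
                [3.0, 0.0, 5.0],
                [4.0, 5.0, 0.0]])
coords, eigvals = classical_pcoa(toy, n_axes=2)
```

Comparing `eigvals` and `coords` from a sketch like this against each tool's output for the same imported matrix is one way to narrow down where the numbers start to diverge.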
Also, for what it's worth, while I couldn't recreate in R the Bray-Curtis matrix that QIIME produced, the weighted UniFrac matrix I produced with rbiom was identical to the one from QIIME 2 that I ultimately ended up using for the comparison above.
If anyone has any thoughts or ideas on what's going on here please let me know!
Cheers,
Noam