Thanks for the screenshots, those results definitely indicate an issue.
They are probably caused by a problem upstream in your analysis.
So that I can confirm that, could you post a .qzv (or a small .qza file)? I’ll be able to look at the provenance to double-check.
Could you also briefly describe what you are analyzing, and where your sequences came from (how many runs, is it a meta-analysis, etc)?
Since we are seeing the strange results with Bray-Curtis, we should think about its definition and what that means. It is 1 minus the ratio of the abundance that two samples share to their total abundance, so in order to see a value of 1, the two samples must have no features in common.
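To make that concrete, here is a minimal sketch of the calculation in plain Python. It is not how QIIME 2 computes it internally, and the sample names, feature names, and counts are made up for illustration.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples given as {feature_id: count} dicts."""
    features = set(a) | set(b)
    # Abundance the two samples have in common: the smaller count for each feature.
    shared = sum(min(a.get(f, 0), b.get(f, 0)) for f in features)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1 - 2 * shared / total


# Hypothetical samples: the first two have no features in common.
run1_sample = {"asv_a": 10, "asv_b": 5}
run2_sample = {"asv_c": 8, "asv_d": 7}
same_run_sample = {"asv_a": 5, "asv_b": 10}

print(bray_curtis(run1_sample, run2_sample))      # no shared features: 1.0
print(bray_curtis(run1_sample, same_run_sample))  # shared features: below 1
```

As soon as two samples share even one feature, the value drops below 1, which is why a table full of exact 1s points at features that never overlap.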
This gets into your question here a bit:
Depending on your analysis, there may not be any database used. If you were following the Moving Pictures Tutorial for example, a database doesn’t actually come into play until we do taxonomic analysis (well after core-metrics).
Instead of having OTUs that map to something like Greengenes, we are generally using what are called Amplicon Sequence Variants (or ASVs). These are effectively 100% OTUs with some denoising to correct for sequencing error. What this means is that the ASVs are only comparable if they are from the same amplicon target, and are the same length.
If you had multiple runs of different amplicon targets and then merged them into the same table, there would be no shared features between runs, and you would end up with many samples that have a Bray-Curtis distance of 1 between each other.
Similarly, if you had multiple runs but trimmed them at different lengths (trunc-len with paired-end is a special case), you would have representative sequences which, while coming from the same amplicon target, do not match the representative sequences of other runs. Once again, this results in features that never match and samples that always have a Bray-Curtis distance of 1.
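Here is a small sketch of why different truncation lengths break the matching. It assumes feature IDs are MD5 hashes of the representative sequence (my understanding of the default behaviour of the denoising plugins), and the sequence and truncation lengths are made up.

```python
import hashlib


def feature_id(seq):
    """Hash a sequence into a feature ID (assumption: MD5 of the sequence string)."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


# Hypothetical read from a single organism, sequenced in two separate runs.
amplicon = "TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGG"

run1_asv = amplicon[:30]  # run 1 truncated at 30 nt
run2_asv = amplicon[:40]  # run 2 truncated at 40 nt

# Same organism and same amplicon target, but the features do not match.
print(feature_id(run1_asv) == feature_id(run2_asv))

# Truncating both runs at the same length makes the features comparable again.
print(feature_id(amplicon[:30]) == feature_id(run1_asv))
```

Even without hashing, the two truncated sequences are simply different strings, so they land in the merged table as two separate features.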
This also explains a bit why you do see “normal” separation for unweighted UniFrac: it has a phylogenetic component. In QIIME 2 we don’t use a reference phylogeny; instead we construct a quick-and-dirty one on the fly with MAFFT and FastTree. So even though your features aren’t directly comparable with each other, they will still align to some degree, and so a phylogeny can still be constructed. The spread you see in the unweighted UniFrac PCoA is really just there because a phylogeny exists between your representative sequences (ASVs).
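As a toy illustration of that difference, the sketch below computes a simplified unweighted UniFrac (unique branch length divided by observed branch length) on a tiny made-up rooted tree. It is not the implementation QIIME 2 uses, and the node names and branch lengths are assumptions for the example.

```python
# Toy rooted tree: child -> (parent, branch length). All names and lengths are made up.
TREE = {
    "clade_A": ("root", 1.0),
    "clade_B": ("root", 1.0),
    "asv1": ("clade_A", 0.5),
    "asv2": ("clade_A", 0.5),
    "asv3": ("clade_B", 0.5),
}


def branches(features):
    """Collect every branch (named by its child node) between the features and the root."""
    seen = set()
    for node in features:
        while node in TREE:
            seen.add(node)
            node = TREE[node][0]
    return seen


def unweighted_unifrac(x, y):
    """Branch length found in only one sample, divided by branch length found in either."""
    bx, by = branches(x), branches(y)
    observed = sum(TREE[n][1] for n in bx | by)
    if observed == 0:
        raise ValueError("neither sample contains a feature from the tree")
    unique = sum(TREE[n][1] for n in bx ^ by)
    return unique / observed


# Two samples with no shared features, so Bray-Curtis would be 1.
sample_x = {"asv1"}
sample_y = {"asv2"}

print(unweighted_unifrac(sample_x, sample_y))  # 0.5
```

The two samples share no ASVs, but they do share the branch leading to clade_A, so the phylogenetic distance comes out below 1 and the samples still spread out in a PCoA.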
If you are doing a meta-analysis with different targets or don’t have the raw sequence data available, there are some OTU-based methods which you can use to resolve this, but I would need to know more about your dataset to really recommend something.
Let me know if that makes sense!