How to derive a distance matrix (similarity matrix) of ASVs

steffi · June 25, 2020, 3:21pm

Dear all,
I am following the QIIME 2 tutorial. I was wondering whether there is any way I can retrieve a distance matrix (similarity matrix) of the ASVs identified.

ChrisKeefe · June 25, 2020, 4:19pm

Hi @steffi, can you please clarify your question? What exactly are you trying to produce or "retrieve"? What have you done so far?

Best,
Chris

steffi · June 25, 2020, 6:02pm

Hi @ChrisKeefe

I executed till diversity step. I also have the (dis)similarity matrix between the samples from beta diversity.
I need to generate a similarity matrix between the ASVs. If I convert the phylogenetic tree (tree.nwk) into a distance matrix, will that count as a similarity matrix?

ChrisKeefe · June 25, 2020, 8:19pm

Sorry, @steffi, but I'm still not clear on what you're looking for. There are many QIIME 2 tutorials, and many different workflows you can use beyond those tutorials, so "executed till diversity step" doesn't provide much useful detail.

A similarity matrix is pretty broadly defined in stats, and I suspect you're looking for a specific outcome here. Can you describe in detail what you're trying to create, or provide a link to a resource that describes it well?

Thanks,
Chris

steffi · July 7, 2020, 12:23pm

I am really sorry for the late reply. I followed the “Moving Pictures” tutorial (QIIME 2 2020.6.0 documentation) and generated a tree for phylogenetic diversity analyses. Now I need to convert the rooted tree into matrix format. I tried the cophenetic.phylo function from the R package ape and got a distance matrix. If I am correct, I would expect the scores to be between 0 and 1, but I got a few values greater than 1. Am I doing anything wrong?

Attachments: tree_V7_V9.csv (8.6 KB), tree.nwk (1.5 KB)

SoilRotifer · July 7, 2020, 11:39pm

Hi @steffi,

Those values are the sums of the branch lengths along the path connecting each pair of tips, i.e. tip-to-tip distances. Branch lengths vary depending on the substitution model used, so these sums typically would not be bounded between 0 and 1.
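The tip-to-tip idea can be sketched in plain Python. This is a minimal illustration on a hypothetical three-tip tree stored as a child-to-parent dictionary (the names `TREE`, `tip_to_tip`, and all branch lengths are made up); it is not a Newick parser and not what `cophenetic.phylo` does internally. It just shows that summing a few branch lengths along a path easily exceeds 1. For a real `tree.nwk`, scikit-bio's `TreeNode.tip_tip_distances()` should give the same kind of matrix.

```python
# Hypothetical rooted tree stored as child -> (parent, branch_length).
# Tip names and branch lengths are invented for illustration.
TREE = {
    "ASV1": ("n1", 0.4),
    "ASV2": ("n1", 0.3),
    "n1": ("root", 0.2),
    "ASV3": ("root", 0.9),
}


def distances_to_ancestors(node, tree):
    """Map node and each of its ancestors to the branch-length distance from node."""
    dists = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        dists[parent] = total
        node = parent
    return dists


def tip_to_tip(a, b, tree):
    """Sum of branch lengths on the path between tips a and b."""
    up_from_a = distances_to_ancestors(a, tree)
    total = 0.0
    node = b
    # Climb from b until we reach an ancestor that a also passes through.
    while node not in up_from_a:
        parent, length = tree[node]
        total += length
        node = parent
    return up_from_a[node] + total


tips = ["ASV1", "ASV2", "ASV3"]
matrix = [[tip_to_tip(a, b, TREE) for b in tips] for a in tips]
for name, row in zip(tips, matrix):
    print(name, [round(d, 2) for d in row])
# ASV1 [0.0, 0.7, 1.5]
# ASV2 [0.7, 0.0, 1.4]
# ASV3 [1.5, 1.4, 0.0]
```

Only three short branches are needed to get values of 1.4 and 1.5, which is why a tree-derived distance matrix is not confined to [0, 1].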

You could try something like ape's dist.dna on the aligned sequences, though I doubt that this will bound the output between 0 and 1 either, and you'll have to pick an appropriate substitution model for your data. Not all distances need to be bounded between 0 and 1, though. You could, I suppose, simply scale or normalize these values.
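As a rough illustration of both points, here is a minimal Python sketch, assuming pre-aligned sequences of equal length. It implements the Jukes-Cantor (JC69) correction, one of the models `dist.dna` offers, but it is not `dist.dna` itself. It then rescales by the largest entry, which is only one possible normalization. The sequences and function names are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping positions where either sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(seq1, seq2):
    """Jukes-Cantor corrected distance; infinite once p reaches 0.75."""
    p = p_distance(seq1, seq2)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def scale_by_max(matrix):
    """Divide every entry by the largest one (assumes finite values, not all zero)."""
    largest = max(max(row) for row in matrix)
    return [[v / largest for v in row] for row in matrix]


# Hypothetical aligned sequences, not real ASVs.
seqs = {
    "ASV1": "ACGTACGTAC",
    "ASV2": "ACGTACGTTC",
    "ASV3": "TGGAACCTTG",
}
names = list(seqs)
raw = [[jc69_distance(seqs[a], seqs[b]) for b in names] for a in names]
scaled = scale_by_max(raw)
for name, r, s in zip(names, raw, scaled):
    print(name, [round(v, 3) for v in r], [round(v, 3) for v in s])
```

With 6 of 10 sites differing, the corrected ASV1/ASV3 distance is already above 1; dividing by the maximum forces everything into [0, 1], but the scaled values then depend on which pair happens to be most distant in your data set.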

It is still not clear what you are trying to accomplish. What is your end goal?

Also note that similarity matrices are not the same as distance matrices. A distance matrix (in the strict, metric sense) must satisfy the triangle inequality. A similarity matrix need not (e.g. Bray-Curtis (dis)similarity).
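To make the triangle-inequality point concrete, here is a small Python check on three made-up samples with two features each. The helper names are hypothetical. Bray-Curtis gives d(x, z) = 1 while going through y costs only 1/3 + 1/3, so it breaks the inequality; Euclidean distance (`math.dist`, Python 3.8+) does not.

```python
import math
from itertools import permutations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def triangle_violations(points, dist, tol=1e-12):
    """Return index triples (i, j, k) where d(i, k) > d(i, j) + d(j, k)."""
    bad = []
    for i, j, k in permutations(range(len(points)), 3):
        direct = dist(points[i], points[k])
        detour = dist(points[i], points[j]) + dist(points[j], points[k])
        if direct > detour + tol:
            bad.append((i, j, k))
    return bad


# Three toy samples (made-up counts for two features).
samples = [(1, 0), (1, 1), (0, 1)]
print(triangle_violations(samples, bray_curtis))  # [(0, 1, 2), (2, 1, 0)]
print(triangle_violations(samples, math.dist))    # []
```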

-Mike

steffi · July 20, 2020, 1:46pm

This is part of a different project where we are trying to map OTUs across different studies. For that I need a similarity or dissimilarity matrix of OTUs. I searched thoroughly for a way to get a (dis)similarity matrix between OTUs (not between samples). Is it possible to build a UniFrac distance matrix (weighted or unweighted) or a Jaccard distance matrix between the OTUs?
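For the Jaccard part, one way to get an OTU-by-OTU matrix is to flip the usual sample-by-sample view: treat each OTU as the set of samples it was observed in and compare those sets. The sketch below assumes a small hypothetical feature table (`TABLE`, OTU rows by sample columns, invented counts) and hypothetical helper names; UniFrac is not sketched here. On a real table, `scipy.spatial.distance.pdist` with `metric="jaccard"` on a boolean OTU-by-sample array should give the same presence/absence Jaccard distances.

```python
from itertools import combinations

# Hypothetical feature table: OTU -> counts in samples S1..S4 (made-up numbers).
TABLE = {
    "OTU1": [5, 0, 3, 0],
    "OTU2": [2, 0, 7, 1],
    "OTU3": [0, 4, 0, 6],
}


def presence_set(counts):
    """Indices of the samples in which the OTU was observed (count > 0)."""
    return {i for i, c in enumerate(counts) if c > 0}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; taken as 0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def otu_jaccard(table):
    """Pairwise Jaccard distances between OTUs, keyed by (otu_a, otu_b)."""
    sets = {otu: presence_set(counts) for otu, counts in table.items()}
    return {
        (x, y): jaccard_distance(sets[x], sets[y])
        for x, y in combinations(sorted(sets), 2)
    }


for pair, d in otu_jaccard(TABLE).items():
    print(pair, round(d, 3))
```

This only compares OTUs by co-occurrence within one table; it says nothing about sequence or phylogenetic similarity between them.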