We have an abundance matrix named "mat", and two other matrices derived from it: mat/100 and log(mat). Since unweighted UniFrac does not take abundance information into account, we expected the distances computed from mat, mat/100, and log(mat) to be the same. However, the three unweighted UniFrac distance matrices are different. We are confused and looking forward to help. Thank you!
Hey @guojun_wu,
That is a little surprising for the mat/100 case, but perhaps less so for the log(mat) case as there will likely be many zeros and so it could be dropping those samples/features implicitly or generating NaN.
Would it be possible to provide the code/program you are using to normalize your feature table? Or failing that, at least the 3 tables that were generated?
Hi Evan, thank you for your reply.
It seems this is about the value 1, not the value 0. In my mat/100 matrix, several values are < 1, and I think "qiime diversity beta-phylogenetic" treats all values < 1 as 0 when calculating unweighted UniFrac distance. Is that right? That would explain why the unweighted UniFrac distances from my mat and mat/100 are different. We had supposed the script would treat non-zero values < 1 as 1 in the unweighted UniFrac calculation, but the cutoff seems to be 1. I have a matrix whose minimum non-zero value is > 1, and I converted it into a 1/0 matrix. The unweighted UniFrac distances from these two are the same. So a relative abundance matrix is not suitable for this calculation and we should use the downsized one, right? BTW, why is 1 set as the cutoff rather than 0 when calculating unweighted UniFrac distance?
Hi @guojun_wu,
Ah, that actually makes a lot of sense. You are correct: you should only calculate unweighted UniFrac on the original table.
The reason this is happening is that scipy/numpy discards the fractional part when converting a floating point number to an integer (effectively a "floor" for non-negative values; this is a very common way of handling float coercion as it is very computationally cheap to do). This means values like 0.7 will become 0 and values like 1.7 will become 1. Since we're dealing with a qualitative metric, the new zeros everywhere will change the results considerably.
So there doesn't appear to be anything wrong going on, but it certainly is surprising at first glance.
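To make the coercion described above concrete, here is a minimal sketch in NumPy. It assumes the internal conversion behaves like `ndarray.astype(int)`, which has not been checked against the QIIME 2 source. The table values and the helper name `presence_after_int_cast` are invented for illustration.

```python
import numpy as np

# Hypothetical 2-sample x 4-feature count table (values invented for illustration).
mat = np.array([[150, 70,   0, 320],
                [  0, 45, 200,  10]])

def presence_after_int_cast(table):
    """Cast to integers (dropping the fractional part), then flag features with a count > 0.

    Assumption: this mimics the float-to-int coercion discussed above;
    it is not taken from the QIIME 2 code itself.
    """
    return table.astype(int) > 0

original = presence_after_int_cast(mat)       # presence/absence from raw counts
scaled = presence_after_int_cast(mat / 100)   # presence/absence after dividing by 100

print(original.astype(int))
print(scaled.astype(int))

# Number of table entries whose presence/absence call changed.
print((original != scaled).sum())
```

In this toy table, the entries 70, 45, and 10 become 0.7, 0.45, and 0.1 after division and are cast to 0, so three features switch from present to absent. A qualitative metric such as unweighted UniFrac only sees that presence/absence pattern, which is why the two tables can give different distances.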
Thank you Evan