Hello, folks!
I'm trying to integrate OTU table data from different batches.
Hence, I'm considering normalizing the data before any further analysis.
I have looked through normalization methods and found that the centered log-ratio (CLR) transformation is one of the most frequently mentioned.
(I know it is built into many QIIME 2 plugins, but I need to do other analyses in addition to what QIIME 2 offers.)
However, I came across a sentence saying "add a pseudocount before CLR". Is there a pre-processing step I need to do before the CLR transformation? Or is the pseudocount added automatically as part of the CLR transformation? I will use either the R package ALDEx2 or the Python package scikit-bio to transform my table to CLR form.
I would appreciate any suggestions and an explanation.
Many thanks for your help.
To get CLR data like in ALDEx2, I used this code:
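import pandas as pd
from skbio.stats.composition import clr

# Function from absolute abundances to CLR.
# Expects features as index and samples as columns.
def to_clr(data):  # the original function name was not shown; "to_clr" is a placeholder
    data = data + 1  # add pseudocount (on a copy, so the caller's table is not modified)
    data = data.div(data.sum(axis=0), axis=1)  # relative abundance per sample
    # clr works row-wise, so transpose to samples x features and back
    return pd.DataFrame(clr(data.T), columns=data.index, index=data.columns).T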
As far as I remember, I got results identical to those from ALDEx2.
So, I added pseudocounts, converted to relative abundances, and applied CLR from skbio. It was also necessary to transpose the data before CLR and then transpose it back.
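To illustrate why the pseudocount comes first, here is a minimal sketch for a single sample using only the Python standard library. The clr function below is a hand-written illustration of the usual definition (log of each value minus the mean of the logs), not the skbio implementation, and the counts are made up.

```python
import math

def clr(composition):
    """Centered log-ratio of one sample: log(x_i) minus the mean of log(x)."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

counts = [0, 10, 90]  # hypothetical counts for one sample, including a zero

# CLR takes the log of every value, and log(0) is undefined,
# so the raw counts cannot be transformed directly.
try:
    clr(counts)
    zero_ok = True
except ValueError:
    zero_ok = False

# Step 1: add a pseudocount of 1 so that no value is zero.
shifted = [c + 1 for c in counts]

# Step 2: convert to relative abundances.
total = sum(shifted)
relative = [c / total for c in shifted]

# Step 3: apply CLR, once to the shifted counts and once to the proportions.
clr_from_counts = clr(shifted)
clr_from_relative = clr(relative)

# Rescaling a sample by a constant does not change its CLR values.
same = all(
    math.isclose(a, b, abs_tol=1e-12)
    for a, b in zip(clr_from_counts, clr_from_relative)
)
print("CLR worked on raw counts with a zero:", zero_ok)
print("Same CLR before and after converting to proportions:", same)
```

In this sketch the size of the pseudocount is the part that affects the result; converting to relative abundances afterwards only rescales each sample.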
Thank you tomanix!! This will definitely be a big help for me ^^ I'll try this and compare the result with the one from R.
Dear timanix, I have one more question. When I run the command "from skbio.stats.composition import clr", the error "No module named 'skbio'" appears, even after I installed the scikit-bio package with the command "conda install scikit-bio".
How did you set up the library?
Thank you for your kindness.
I just ran the code inside the QIIME 2 environment, since I am a little bit lazy and most of the required libraries are already installed there.
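If the import still fails after installing, it may help to check which Python interpreter is actually running and whether it can see the module. This is a small sketch using only the standard library; the helper name is_importable is made up for this example.

```python
import importlib.util
import sys

def is_importable(module_name):
    """Return True if the running interpreter can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None

# The interpreter that is actually running; it should live inside the
# environment where scikit-bio was installed.
print("Python executable:", sys.executable)

# The package is installed as "scikit-bio" but imported as "skbio".
print("skbio importable:", is_importable("skbio"))
```

If the executable path points outside the environment you installed into, activating that environment before starting Python is the first thing to try.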
Thank you. I had tried to install the scikit-bio package in a Windows environment, and that was the problem.
Now I have successfully run the analysis using the package in a Linux environment. (In my case, scikit-bio could be installed only under Linux.)
Thank you!!
Does your function assume taxa as columns and samples as index, or vice versa?
It takes a table laid out like a feature table, so features as index and samples as columns.
Thanks, I wasn't so sure because of the transposed representation pandas gives on feature table import, i.e. when saving a df based on a feature table (df.feature_table.view(pd.DataFrame))
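A quick way to check the orientation is that, when the CLR is taken within each sample, the values of every sample sum to zero. Below is a rough sketch with pandas and NumPy only; the table, the OTU and sample names, and the helper clr_by_sample are invented for the example, and the CLR is written out by hand rather than taken from skbio.

```python
import numpy as np
import pandas as pd

# Hypothetical feature table: features as index, samples as columns.
table = pd.DataFrame(
    {"sample1": [0, 10, 90], "sample2": [5, 5, 40]},
    index=["otu1", "otu2", "otu3"],
)

def clr_by_sample(feature_table):
    """CLR within each sample (column) of a features x samples table."""
    shifted = feature_table + 1      # pseudocount; the input is left untouched
    logs = np.log(shifted)           # element-wise log, still a DataFrame
    return logs - logs.mean(axis=0)  # subtract each sample's mean log

result = clr_by_sample(table)

# With the right orientation, every sample (column) sums to zero.
print(result.sum(axis=0))

# If the table is passed the other way round (samples as index), the
# centering happens per feature instead, which is a different result.
wrong = clr_by_sample(table.T).T
print(np.allclose(wrong.values, result.values))
```

If your DataFrame has samples as index and features as columns, transpose it with .T before passing it to a function that expects the feature table layout.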