I'm trying to integrate OTU table data from different batches.
Hence, I'm considering normalizing the data before any further analysis.
I have looked through normalization methods and found that the centered log-ratio (CLR) transformation is one of the most frequently mentioned.
(I know it is built into many QIIME 2 plugins, but I have to do other analyses in addition to the functions available in QIIME 2.)
However, I found a sentence saying "Make pseudocount before CLR". Is there a pre-processing step required before the CLR transformation, or is the pseudocount added as part of the CLR transformation itself? I will use either the ALDEx2 package in R or the scikit-bio package in Python to transform my table to CLR form.
Please advise and explain how this works.
Hello!
To get CLR data like ALDEx2 produces, I used this code:
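import pandas as pd
from skbio.stats.composition import clr

# Convert absolute abundances (features as rows, samples as columns) to CLR
def to_clr(data):
    data = data + 1  # add pseudocount (on a copy, so the caller's table is not modified)
    data = data.div(data.sum(axis=0), axis=1)  # relative abundance within each sample
    # clr works row-wise, so transpose to samples x features, then transpose back
    return pd.DataFrame(clr(data.T), columns=data.index, index=data.columns).T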
As far as I remember, I got results identical to ALDEx2's.
So, I added pseudocounts, converted to relative abundances, and applied clr from skbio. It was also necessary to transpose the data before clr and then transpose it back, because clr treats each row as one composition (sample).
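To make the role of the pseudocount concrete, here is a minimal standard-library sketch of the same steps for a single sample. It is only an illustration, not the ALDEx2 or scikit-bio implementation: the function name clr_sample, the pseudocount of 1, and the example counts are all assumptions made for this example. It shows that the pseudocount is a separate step applied before the logarithm, and that converting to relative abundances afterwards does not change the CLR values.

```python
import math

def clr_sample(values, pseudocount=1.0):
    """CLR of one sample: log of each part minus the mean of the logs.

    Hypothetical helper for illustration; pseudocount=1.0 is an assumed default.
    """
    shifted = [v + pseudocount for v in values]
    # Without a pseudocount, any zero would make math.log raise ValueError
    # (NumPy would return -inf instead), so zeros must be handled first.
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Hypothetical counts for one sample, including a zero
sample = [10, 0, 5, 85]

# CLR directly on pseudocounted counts
clr_counts = clr_sample(sample)

# CLR on relative abundances computed after adding the pseudocount
shifted = [v + 1.0 for v in sample]
total = sum(shifted)
relative = [v / total for v in shifted]
clr_rel = clr_sample(relative, pseudocount=0.0)

print(clr_counts)
print(clr_rel)
```

Both printed lists should agree up to floating-point error, because dividing every part by the same total shifts all logs by the same constant, which the centering removes. The CLR values of one sample also sum to approximately zero.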
Dear timanix, I have one more question about this. When I run the command "from skbio.stats.composition import clr", the error "No module named 'skbio'" appears, even after I installed the scikit-bio package using the command "conda install scikit-bio".
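A common cause of this kind of error is that the package was installed into a different environment than the one the Python interpreter is running from; that is an assumption about this case, not a confirmed diagnosis. A small check like the following, run from the same session where the import fails, shows which interpreter is in use and whether it can find the module. The helper name module_available is hypothetical.

```python
import importlib.util
import sys

def module_available(name):
    """Return True if the current interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None

# The interpreter actually running this code; compare it with the conda
# environment where scikit-bio was installed.
print("Interpreter:", sys.executable)
print("skbio importable:", module_available("skbio"))
```

If the interpreter path does not point inside the conda environment where scikit-bio was installed, activating that environment (or installing the package into the active one) is the usual remedy.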
Thanks, I wasn't so sure because of the transposed pandas representation (feature table importing) when saving a DataFrame based on a feature table (df.feature_table.view(pd.DataFrame)).