Is there any step I need to apply to my OTU table before CLR transformation?

Hello, folks!

I'm trying to integrate OTU table data from different batches.

Hence, I'm considering normalizing the data before any further analysis.

I have looked through normalization methods and found that the centered log-ratio (CLR) was one of the most frequently mentioned.
(I know it is built into many QIIME 2 plugins, but I have to do other analyses in addition to the functions available in QIIME 2.)

However, I came across a statement saying "add a pseudocount before CLR". Is there a preprocessing step I need to do myself before the CLR transformation, or is the pseudocount added as part of the CLR transformation itself? I will use either the R package ALDEx2 or the Python package scikit-bio to transform my table to CLR form.

I would appreciate any suggestions and an explanation.

Many thanks for your help.

Hello!
To get CLR data like the output of ALDEx2, I used this code:

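import pandas as pd
from skbio.stats.composition import clr


# Convert absolute abundances (features as rows, samples as columns) to CLR
def to_clr(data):
    data = data + 1                            # add pseudocount on a copy, so the caller's table is not modified
    data = data.div(data.sum(axis=0), axis=1)  # relative abundance within each sample (column)
    # clr() works row-wise, so transpose to samples x features, then transpose back
    return pd.DataFrame(clr(data.T), columns=data.index, index=data.columns).T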

As far as I remember, I got results identical to those from ALDEx2.
So, I added pseudocounts, converted to relative abundances, and applied CLR from skbio. It was also necessary to transpose the data before clr and then transpose it back.
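
To make the role of the pseudocount explicit, here is a minimal dependency-free sketch of what CLR computes for a single sample. It is not the skbio or ALDEx2 implementation; the function name clr_with_pseudocount and the counts are made up for illustration. The pseudocount is needed because CLR takes logarithms and log(0) is undefined. The sketch also shows that converting to relative abundances first does not change the CLR values, since a common scaling factor cancels out.

```python
import math


def clr_with_pseudocount(counts, pseudocount=1.0):
    """CLR-transform one sample: clr(x)_i = ln(x_i) - mean_j(ln(x_j))."""
    shifted = [c + pseudocount for c in counts]  # removes zeros so the log is defined
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)             # log of the geometric mean
    return [v - mean_log for v in logs]


# Hypothetical counts for one sample across four features (note the zero).
sample = [0, 9, 99, 999]
clr_values = clr_with_pseudocount(sample)

# Same sample as relative abundances (after the pseudocount), then CLR.
shifted = [c + 1.0 for c in sample]
total = sum(shifted)
proportions = [v / total for v in shifted]
clr_from_props = clr_with_pseudocount(proportions, pseudocount=0.0)

print(clr_values)
print(clr_from_props)
```

The CLR values of one sample always sum to zero (up to floating-point error), which is also a handy check that the transformation was applied along the intended axis of a table.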

Thank you, timanix! This will definitely be a big help for me ^^ I'll try this and compare the result with the one from R.

Dear timanix, I have one more question. When I run the 'from skbio.stats.composition import clr' command, the error "No module named 'skbio'" appears, even after I installed the 'scikit-bio' package with the 'conda install scikit-bio' command.

How did you set up the library?

Thank you for your kindness.

Hi!
I just ran the code inside the qiime2 env, since I am a little bit lazy and most of the required libraries are already installed there.

Thank you. I had tried to install the 'scikit-bio' package in a Windows environment, and that was the problem.

Now I have successfully run the analysis with the package in a Linux environment. (In my case, the scikit-bio package could only be installed under Linux.)
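
For anyone hitting the same "No module named 'skbio'" error, a quick check along these lines may help to see which Python interpreter is running and whether it can find the module. This is only a sketch using the standard library; the helper name module_available is made up for illustration, and it does not diagnose why an installation failed.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


# The interpreter path shows which environment is actually active.
print("Interpreter:", sys.executable)
print("skbio importable:", module_available("skbio"))
```

If the interpreter path does not point into the environment where the package was installed (for example the qiime2 env), activating that environment before starting Python is the first thing to try.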

Thank you !!

Does your function assume taxa as columns and samples as index, or vice versa?

It takes a table laid out like a feature table, so features as index and samples as columns.

Thanks, I wasn't so sure because of the transposed pandas representation you get when importing a feature table, i.e. when saving a df based on a feature table (df.feature_table.view(pd.DataFrame)).
