Sample-classifier input table

timanix · November 9, 2022, 11:18am

Dear all,
I am currently experimenting with sample-classifier to predict categorical metadata values.

This pipeline takes a feature table (frequency) of absolute counts as input.
My question is whether, and if so how, the counts are normalized before the classifier is trained. Does it make sense to use relative frequencies or another normalized table instead?

Nicholas_Bokulich · November 10, 2022, 3:26pm

Hi @timanix,

No normalization is done by this action. The idea is to eventually implement separate normalization actions that could then pass normalized tables to this function or elsewhere. However, it is not so simple.

As for whether it makes sense to use relative frequencies or another normalization instead: not necessarily. It depends on the properties of the classifier. Many normalization methods for compositional data were designed for differential abundance tests, and their appropriate application to supervised classification problems is still an open question (see https://doi.org/10.1093/gigascience/giz107).

Knights et al. (https://academic.oup.com/femsre/article/35/2/343/661201) recommend rarefying prior to classification to avoid introducing library size biases, so this is one option (as a rarefied table is still FeatureTable[Frequency]).
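
To illustrate what rarefying does to a count table, here is a minimal sketch in plain Python. It is not the QIIME 2 implementation: the function names, the toy table, and the sampling depth of 100 are all made up for illustration. Each sample is subsampled without replacement to the same depth, and samples with fewer reads than that depth are dropped, so the output still contains integer counts.

```python
import random
from collections import Counter

def rarefy_sample(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's count vector."""
    # Expand the counts into one entry per read, labelled by feature index.
    reads = [j for j, c in enumerate(counts) for _ in range(c)]
    drawn = Counter(rng.sample(reads, depth))
    return [drawn.get(j, 0) for j in range(len(counts))]

def rarefy_table(table, depth, seed=0):
    """Rarefy every sample to `depth`; samples with fewer reads are dropped."""
    rng = random.Random(seed)
    return {
        sample_id: rarefy_sample(counts, depth, rng)
        for sample_id, counts in table.items()
        if sum(counts) >= depth
    }

# Toy table: sample ID -> counts per feature (hypothetical numbers).
table = {"s1": [120, 30, 50], "s2": [10, 5, 5], "s3": [400, 100, 0]}
rarefied = rarefy_table(table, depth=100)
```

After this step every retained sample has the same total count, so library size can no longer act as a predictor on its own. The cost is that low-depth samples (here `s2`) are lost.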

timanix · November 10, 2022, 3:46pm

Thank you for the answer!
I will rarefy the tables, since the only variable I can currently predict with high confidence is the sequencing run.
Meanwhile, the factor of interest gives me an accuracy of only 0.4 with 3 levels.

I will also try to run it with DESeq2-like normalisation (rounded) to see if it will affect the model.
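
A rough sketch of what such a normalisation could look like, assuming "DESeq2-like" means median-of-ratios size factors followed by rounding. This is plain Python rather than a call into DESeq2 itself, and the function names and the toy table are hypothetical. Only features that are non-zero in every sample are used to estimate the size factors, which can leave very few usable features in a sparse microbiome table.

```python
import math
from statistics import median

def size_factors(table):
    """Median-of-ratios size factors; `table` is a list of per-sample count lists."""
    n_features = len(table[0])
    # Use only features observed in every sample, so the log geometric mean is finite.
    usable = [j for j in range(n_features) if all(row[j] > 0 for row in table)]
    if not usable:
        raise ValueError("no feature is non-zero in every sample")
    log_geo_mean = {
        j: sum(math.log(row[j]) for row in table) / len(table) for j in usable
    }
    # Size factor of a sample = median ratio of its counts to the geometric means.
    return [
        math.exp(median(math.log(row[j]) - log_geo_mean[j] for j in usable))
        for row in table
    ]

def normalize(table):
    """Divide each sample by its size factor and round back to integers."""
    factors = size_factors(table)
    return [[round(c / f) for c in row] for row, f in zip(table, factors)]

# Toy table: the second sample was sequenced twice as deeply as the first.
table = [[10, 20, 30, 0], [20, 40, 60, 5]]
normalized = normalize(table)
```

Rounding keeps the values as integers, so the result can still be treated as a frequency table, at the price of some loss of precision for low counts.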
