Normalization of feature-table

Hi,

My question is about normalization of the feature table (OTU table).

I need to normalize my feature table using a compositional approach such as the centered log-ratio (CLR) transformation. While reading about this on the web, I came across something called “Total Sum Scaling normalisation”. I have several questions about this:

  1. Is “Total Sum Scaling normalisation” similar to what happens when performing qiime “feature-table relative-frequency”?
    “Total Sum Scaling normalisation” is described as dividing each feature’s read count by the total read count of its sample.
  2. If “Total Sum Scaling normalisation” = “feature-table relative-frequency”, should the centered log transformation be done after performing it?
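For concreteness, the arithmetic described in point 1 can be sketched as follows (a minimal Python illustration; the counts are made up for the example):

```python
import numpy as np

# Hypothetical OTU/feature table: rows = features, columns = samples.
counts = np.array([
    [10,  0,  5],
    [30, 20,  5],
    [60, 80, 90],
], dtype=float)

# Total Sum Scaling: divide each count by the total read count of its sample.
tss = counts / counts.sum(axis=0, keepdims=True)

print(tss)  # each column now sums to 1 (a per-sample relative frequency)
```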

My intention is to use this normalized table to produce a network by using igraph in R.

Thanks in advance.
Uth

Hi @uth,
TSS is different from relative frequency: the latter in QIIME 2 simply creates a relative-abundance table, while the former is a type of normalization deployed by some differential abundance tools (and perhaps useful elsewhere, too) to reduce bias resulting from uneven sampling depth. QIIME 2 currently doesn’t have a standalone CLR-transform plugin, but several plugins in QIIME 2 do use the CLR transformation internally, e.g. q2-ancom, q2-gneiss (or its successor q2-songbird), and other plugins such as q2-aldex2.
If you are going into R anyway, I would just apply a CLR transform there; there are plenty available in various packages.
You shouldn’t use both the TSS transform and CLR; just use CLR if you’re going to be using compositionality-focused tools.
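To make the “just use CLR” point concrete: the CLR of a sample is the log of each part minus the mean of the logs, and since TSS only rescales a sample by a constant (its total), applying TSS first changes nothing. A minimal sketch in Python (the toy counts are hypothetical; in R, e.g. `compositions::clr` implements the same transform):

```python
import numpy as np

def clr(x):
    """Centered log-ratio of one sample: log of each part minus the mean log."""
    logx = np.log(x)
    return logx - logx.mean()

# Hypothetical counts for one sample; a pseudocount would be needed if zeros exist.
sample = np.array([10.0, 30.0, 60.0])

# TSS just divides the sample by a constant (its total)...
tss_sample = sample / sample.sum()

# ...and CLR is invariant to that rescaling, so TSS before CLR is redundant.
print(np.allclose(clr(sample), clr(tss_sample)))  # → True
```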

1 Like

Thank you so much for the prompt reply. It’s much clearer now.

I still have a small question. A paper I read states: “The abundance table was normalized by using the summed read count per sample”. Do you have any idea what this normalization technique means? Any insight is much appreciated.

Moreover, I’d like to do all the pre-processing in QIIME 2, so if I need to normalize my feature table using CLR via QIIME plugins, could you let me know what exactly should be done? Sorry to bother you! I’m a little lost with these normalization techniques.

Thank you so much again!!!

Uth

1 Like

Hi @uth,
That description is not very, well… descriptive; maybe a reference in the paper can shed some light on it? Sorry! More importantly, though, as I mentioned, you don’t need this normalization if you are planning on using CLR-based tools.

I made a crucial typo in my original answer (edited above). There is NOT a stand-alone tool to just do CLR transforms in Qiime2. They are embedded within other plugins I listed above. So you are unfortunately stuck with doing this in R.
That being said, the plugin q2-scnic may be of interest to you: it builds correlation networks designed specifically for compositional data. I’m not sure if it’s exactly what you want, but it’s worth checking out.

2 Likes

Thank you so much for the suggestions. I think q2-scnic does what I want, because my intention was to use the feature table to construct a microbial network with SparCC. That said, I wonder again whether normalization of the feature table is needed before using the q2-scnic plugin?

Moreover, my feature table has both bacteria and eukaryotes, and my goal is to see the correlation between bacteria and eukaryotes (not between bacteria and bacteria) using a co-occurrence network. Is this possible in q2-scnic?

The reference paper that makes the statement “The abundance table was normalized by using the summed read count per sample” can be found at https://science.sciencemag.org/content/348/6237/1262073.long (Title: Ocean plankton. Determinants of community structure in the global plankton interactome).

Thanks again!

Uth

Hi @uth,
Glad you found that useful.

SCNIC will deal with the compositional nature of your data, so you don’t need to do any normalization beforehand. Converting the data to relative abundances is a type of normalization on its own, and the CLR transform is then applied to the relative-abundance data. Again, you simply need to provide your feature table (following the SCNIC tutorial I linked).

I’m not sure, to be honest; it depends on your data. Are these two separate feature tables you merged together, or are they linked, as in shotgun data where everything is sequenced together in one run? It would be better to start a new thread specific to your SCNIC question, as this is starting to drift away from the original inquiry of this post. We would be happy to point you to the right places and people there.

I had a quick look through the two references in that paper you linked, and unfortunately neither proved any more useful than the original, as these things often go. It seems they used simple relative-abundance data without any further normalization, but I could be reading it wrong. Either way, as mentioned before, you don’t need to worry about that here. The benefit of using these newer tools is that they don’t require normalization, at least not in the sense you are looking for.
Hope this helps!

1 Like

A post was split to a new topic: q2-SCNIC installation error