Normalization use

enricca · June 25, 2018, 11:38am

Why is it necessary to normalize by number of reads if we use relative abundances? When we convert from absolute to relative abundances, aren't we already performing some sort of normalization? (In fact, it is like performing metagenomeSeq normalization with L=100%.)

Nicholas_Bokulich · June 26, 2018, 12:06am

It is not necessary to do both. If you've read otherwise in our documentation, please tell us where so we can get that fixed.

Absolutely, converting to relative abundance is a form of normalization.

However, that normalization is not adequate for some analyses (e.g., alpha diversity comparisons require normalization for uneven sequencing depth between samples; currently we recommend rarefying to correct for that).

Any analysis in QIIME 2 (at least for "core" plugins) that requires normalization has that normalization built in; e.g., rarefying is performed automatically for alpha and beta diversity analyses. Rarefied tables should not be used for other downstream analyses, e.g., those that involve converting to relative abundance.

Did you have a particular step in mind?

enricca · June 27, 2018, 11:50am

Thank you for your kind response. I haven't read that in your documentation, but in papers (not from your research group).

What do you mean by sequencing depth? Do you mean coverage depth? Coverage = (total number of bases generated) / (size of genome sequenced)

Sorry about the terminology issue and thank you again

Nicholas_Bokulich · June 27, 2018, 12:31pm

sequencing depth = number of sequences per sample

(this is coming from an amplicon sequencing perspective where coverage depth is not an issue)
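To make that definition concrete, here is a minimal Python sketch. The feature table, sample names, and helper functions are made up for illustration and are not part of the QIIME 2 API. It shows that two samples can have identical relative abundances while differing tenfold in sequencing depth, which is why converting to relative abundance does not remove the effect of depth.

```python
# Hypothetical feature table: read counts per feature for two samples.
table = {
    "sample_A": {"f1": 500, "f2": 300, "f3": 200},
    "sample_B": {"f1": 50, "f2": 30, "f3": 20},
}


def sequencing_depth(counts):
    """Sequencing depth = total number of sequences in one sample."""
    return sum(counts.values())


def relative_abundance(counts):
    """Divide each feature count by the sample's sequencing depth."""
    depth = sequencing_depth(counts)
    return {feature: count / depth for feature, count in counts.items()}


depths = {sample: sequencing_depth(c) for sample, c in table.items()}
rel = {sample: relative_abundance(c) for sample, c in table.items()}

for sample in table:
    print(sample, "depth:", depths[sample], "relative abundance:", rel[sample])
```

Both samples end up with the same proportions, but the depth information is gone after the conversion.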

You can see some more discussion of sequencing depth and rarefying for alpha/beta diversity analyses here and in other places in that tutorial.

I hope that clarifies!

enricca · June 28, 2018, 2:26pm

That clarifies a lot, thank you.

In this case, I understand that if alpha diversity is computed taking into account only the species of bacteria found (not the abundance of each species), we would have an important bias if we did not perform rarefaction.

However, if we compute alpha diversity taking into account abundances and using relative abundances, we should have much less bias and we could use much more data. I guess that it depends on what pays off depending on the data that we have.

Nicholas_Bokulich · June 28, 2018, 7:13pm

The number of species/unique sequences observed increases as sequencing depth increases, so we rarefy to standardize the sequencing depth, such that differences in sequencing depth do not bias diversity estimates.
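As a rough illustration of that point, the sketch below rarefies a single sample by subsampling reads without replacement and counts the observed features before and after. This is a simplified stand-in written with the Python standard library, not QIIME 2's own implementation; the `rarefy` and `observed_features` helpers and the example community are hypothetical.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample."""
    # Expand the counts into one entry per read, labeled by feature.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("sample has fewer reads than the requested depth")
    rng = random.Random(seed)
    rarefied = {}
    for feature in rng.sample(reads, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


def observed_features(counts):
    """Richness: number of features with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical deeply sequenced sample: 5 abundant and 50 rare features.
deep_sample = {f"abundant_{i}": 1000 for i in range(5)}
deep_sample.update({f"rare_{i}": 2 for i in range(50)})

shallow = rarefy(deep_sample, depth=100)

print("full depth:", sum(deep_sample.values()),
      "observed features:", observed_features(deep_sample))
print("rarefied depth:", sum(shallow.values()),
      "observed features:", observed_features(shallow))
```

The same community looks less rich at the lower depth because most rare features are not drawn, so comparing richness between samples of unequal depth would mostly reflect the depth difference.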