Meta-analysis of alpha diversity

Hello everyone, I want to perform an alpha diversity meta-analysis across 5 datasets. I use the QIIME 2 pipeline, and each dataset was denoised separately up to the DADA2 step.

I have some questions about normalization steps.

  1. For the normalization step: can I normalize the data to relative abundance instead of rarefying?

  2. Should the normalization be applied to a merged table, or should I normalize each dataset separately?

If anyone knows, it would be very useful to me!

Thank you all for your support.

Hi @iordanis

I would look at the pros and cons of both normalisation approaches and decide which one suits your data best.

While relative abundance (RA) makes samples more directly comparable, you might lose out on richness-focused analyses, which require absolute counts rather than the proportions you get with RA.

Rarefaction, on the other hand, subsamples every sample to a uniform depth, which allows direct comparisons during analysis.

I would merge all individual datasets into a single table so that all 5 of your datasets are treated uniformly.

You can do this in QIIME 2 using the following command (for your 5 tables):

Merge Tables
qiime feature-table merge \
  --i-tables table1.qza \
  --i-tables table2.qza \
  --i-tables table3.qza \
  --i-tables table4.qza \
  --i-tables table5.qza \
  --o-merged-table merged_table.qza

I hope that helps!

Hi @Mike_Stevenson

Thank you for your response.

After the QIIME 2 pipeline I want to move the data into R and calculate the alpha diversity meta-analysis with metafor.

I was wondering whether it is better to rarefy or to compute relative abundances on each dataset separately, then extract the summary statistics for alpha diversity, and then use random- or fixed-effects models.

That's why I am asking which is better.

Or do you still believe it is better to merge the tables, normalize, then split the files and bring them into R to calculate Shannon alpha diversity and extract the summary statistics?

Best
Jordan

Hi @iordanis

I've looked more closely at the pros and cons of these two normalisation methods, and it is very dependent on your samples. For example, if you went with rarefaction, a con would be that you could potentially lose information in highly diverse samples, whereas a con of relative abundance is that sequencing depth differences can influence the relative abundances and therefore introduce bias into diversity estimates.

If your samples have different sequencing depths, I would go with rarefaction to allow for a direct comparison. If they have high (and similar) sequencing depths, relative abundance may be preferable, as you retain more of the data/information in your samples.

I am not sure if there is a definitive answer for this query.

You could always try both normalisation methods: for each, use the merged table after denoising, calculate the Shannon index, and then move into the metafor package in R for your meta-analyses.
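To make that concrete, here is a minimal sketch of the comparison in R with the vegan package, assuming you have exported the merged feature table from QIIME 2 into a samples-by-features count matrix; counts is a placeholder name for that matrix:

Try both normalisations in R
library(vegan)  # rrarefy(), decostand(), diversity()

# counts: samples-by-features integer matrix exported from merged_table.qza (placeholder name)
depth <- min(rowSums(counts))                    # uniform depth = smallest library size
rarefied <- rrarefy(counts, sample = depth)      # rarefaction: subsample each sample to 'depth'
rel_ab   <- decostand(counts, method = "total")  # TSS: scale each sample to proportions

shannon_rar <- diversity(rarefied, index = "shannon")  # Shannon per sample, rarefied route
shannon_rel <- diversity(rel_ab, index = "shannon")    # Shannon per sample, RA route

Either vector can then be joined to your sample metadata and carried forward into metafor.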

I hope this helps in some way!

Hi @iordanis,

I think there may be some confusion, so I want to clarify some points.

I agree with @Mike_Stevenson: my best recommendation (and, I think, the current recommendation in the field) is that if you're going to use standard metrics, you should perform rarefaction or some kind of depth-based normalization. This is especially true for richness. If you follow the citations in Weiss et al., they do a nice job of addressing the issues; the short summary being that you need a consistent depth to be comparable.

Any specific reason to go to R for the diversity calculation, or is it just convenience? Also, because I enjoy being challenging: if you're processing consistently, why not run a mixed effects model and get more precision for your estimate? Statistical meta-analyses are great when you only have summary statistics available, but if you have primary data (like you do), an LME (lmer, sadly not in QIIME yet) or a GEE will give you more precision without sacrificing accuracy.
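To make the comparison concrete, here is a minimal sketch with lme4, assuming a per-sample data frame; df, shannon, group, and study are placeholder names:

Mixed effects sketch in R
library(lme4)  # lmer()

# df: one row per sample, with its Shannon value, group/exposure, and study label (placeholders)
fit <- lmer(shannon ~ group + (1 | study), data = df)  # random intercept per study
summary(fit)

# study-specific slope, if the group effect may vary across datasets
fit_slope <- lmer(shannon ~ group + (1 + group | study), data = df)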

Best,
Justine

Hi @jwdebelius, @Mike_Stevenson

Thank you both for your valuable help.

Well, I have read some papers that conduct meta-analyses of alpha diversity.

For alpha diversity, one paper mentions this:

DOI 10.3389/fcimb.2023.1119875

The vegan package was utilized to normalize the feature table to scale based on each sample’s library size that transformed the feature table into a relative feature table, aiming to remove technical bias caused by variations in sample collection, library preparation, or sequencing manifesting as uneven sampling depth and sparsity, which could not reflect the true difference in the underlying biology (Weiss et al., 2017).

From this I believed that they used TSS normalization, which is relative abundance.
But I am not familiar with the statistics field, so it is hard for me to understand the paper perfectly.

They then calculate summary statistics and use a random-effects model. But they do not mention whether they use merged tables or separate tables.

And again, from here:
https://doi.org/10.1038/s41467-017-01973-8

They use relative abundance data for alpha diversity on each table separately.

Furthermore, some meta-analysis papers use the reported summary statistics for alpha diversity indices and go directly to meta-analysis. Although they use summary statistics from different sampling depths in each dataset, they still go ahead.

That's why I wonder whether it would make a big difference to do the same with my raw data.

@jwdebelius do you know of any paper with more information on LME (lmer), or can I find it in the metafor package documentation?

Thank you both very much for trying to help me!

Best
Jordan

Hi @iordanis,

So, your first link is mis-citing the Weiss 2017 paper, which pretty explicitly says, "rarefy your alpha diversity, because that's what you do with alpha diversity". I'd argue both are failures in peer review: diversity is tied to sequencing depth. But rather than quibble about it, I'd recommend you construct a rarefaction curve (alpha-rarefaction) and interrogate your data yourself.
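If you'd rather look at the curves in R than in QIIME 2's alpha-rarefaction visualizer, vegan's rarecurve() is one equivalent; counts is again a placeholder for a samples-by-features count matrix:

Rarefaction curves in R
library(vegan)  # rarecurve()

# one curve per sample; where the curves flatten suggests a reasonable rarefaction depth
rarecurve(counts, step = 100, xlab = "Sequencing depth", ylab = "Observed features")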

So, this draws more on individual participant data meta-analysis and some as-yet-unpublished work we've been doing in my group. Essentially, a meta-analysis based on summary statistics sacrifices information and tends to have higher standard errors for the pooled estimate than the alternatives. I don't have enough of a math background to explain it, but it's conservative. A model that treats the study as a random effect (LME, GEE) gives you that study-specific effect (it even lets you have a study-specific slope if needed) and provides more accuracy. I'd recommend the literature on individual patient data meta-analysis as a reference.
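For contrast, the summary-statistics route in metafor looks roughly like this; studies and its columns are placeholder names for a table with one row per dataset:

Summary-statistics meta-analysis in R
library(metafor)  # escalc(), rma()

# studies: one row per dataset with per-group mean Shannon, SD, and n (placeholder columns)
dat <- escalc(measure = "MD", m1i = m1, sd1i = sd1, n1i = n1,
              m2i = m2, sd2i = sd2, n2i = n2, data = studies)
res <- rma(yi, vi, data = dat, method = "REML")  # random-effects pooled mean difference
summary(res)

The pooled estimate here only ever sees a handful of numbers per study, which is where the information loss comes from.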

Best,
Justine

Hi @jwdebelius,

Perfect, it is clear to me now. I will go ahead with the rarefaction curve!

One last thing: I will create the alpha rarefaction curve on each dataset separately, because I don't want to lose too much information just to force exactly one common library size across all datasets.
Plus: 1) I have read that the Shannon index is less sensitive to differences in library size. 2) I will use batch effect correction.

All in all, I believe this approach is good enough for a starting point!

Again, thank you for the explanations and for your support!

Best,
Jordan
