Alpha/beta diversity: filtering datasets, additional metadata post analysis and missing metadata values

Nicholas_Bokulich · August 30, 2018, 6:08pm

I prefer the additional metadata column: it will be more informative about overall differences within and between lakes and size fractions. beta-group-significance will perform pairwise PERMANOVA tests, so it will still tell you whether individual fractions in individual lakes differ from each other, and so on.
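If lake and size fraction currently live in separate metadata columns, one way to build a single combined column is sketched below. This is a minimal sketch under assumptions: the column names `lake` and `size_fraction`, the new column `lake_fraction`, and the `add_combined_column` helper are all hypothetical, and the metadata TSV is inlined so the example is self-contained.

```python
import csv
import io

# Hypothetical sample metadata (tab-separated, sample ID column first).
METADATA_TSV = """sample-id\tlake\tsize_fraction
S1\tLakeA\tlarge
S2\tLakeA\tsmall
S3\tLakeB\tlarge
S4\tLakeB\tsmall
"""


def add_combined_column(tsv_text, left, right, new_column, sep="_"):
    """Return TSV text with a new column joining the values of `left` and `right`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames) + [new_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # e.g. "LakeA" + "_" + "large" -> "LakeA_large"
        row[new_column] = f"{row[left]}{sep}{row[right]}"
        writer.writerow(row)
    return out.getvalue()


combined = add_combined_column(METADATA_TSV, "lake", "size_fraction", "lake_fraction")
print(combined)
```

The resulting column could then be given to beta-group-significance via `--m-metadata-column lake_fraction`, so that each lake/fraction combination is treated as its own group.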

Filtering and re-analyzing will be more useful if, for example, you run the full analysis and see differences between some groups, but your PCoA plots are a nasty tangled ball. Then you could filter and re-run with subsets for ease of visualization.
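Conceptually, that filtering step keeps only the samples whose metadata match some condition before the analysis is re-run (in QIIME 2 this is the job of `qiime feature-table filter-samples`). The sketch below shows the idea on in-memory data; the dictionaries and the `filter_samples` helper are hypothetical stand-ins, not QIIME 2 objects.

```python
# Hypothetical feature table: sample ID -> {feature ID: count}
TABLE = {
    "S1": {"f1": 10, "f2": 5},
    "S2": {"f1": 3, "f3": 8},
    "S3": {"f2": 7, "f3": 1},
    "S4": {"f1": 2, "f2": 2},
}

# Hypothetical metadata: sample ID -> {column: value}
METADATA = {
    "S1": {"lake": "LakeA", "size_fraction": "large"},
    "S2": {"lake": "LakeA", "size_fraction": "small"},
    "S3": {"lake": "LakeB", "size_fraction": "large"},
    "S4": {"lake": "LakeB", "size_fraction": "small"},
}


def filter_samples(table, metadata, column, value):
    """Keep only samples whose metadata `column` equals `value`."""
    return {
        sample_id: counts
        for sample_id, counts in table.items()
        if metadata.get(sample_id, {}).get(column) == value
    }


lake_a = filter_samples(TABLE, METADATA, "lake", "LakeA")
print(sorted(lake_a))  # ['S1', 'S2']
```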

Beta diversity results will change any time the input samples change; alpha diversity results will not. So, at the very least, run alpha diversity on everything.
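A toy example of why: below, Shannon entropy (natural log) stands in for alpha diversity, and a principal coordinates analysis (PCoA) of Bray-Curtis distances stands in for the beta diversity output. The counts are made up, and the PCoA is a plain classical-scaling sketch, not QIIME 2's implementation. Each sample's Shannon value depends only on that sample, whereas the first-axis coordinates of samples A and B change once sample C is dropped.

```python
import numpy as np

# Made-up counts for three samples over three features.
COUNTS = {
    "A": np.array([6, 4, 0]),
    "B": np.array([4, 6, 0]),
    "C": np.array([0, 0, 10]),
}


def shannon(counts):
    """Shannon entropy (natural log) of one sample; uses no other samples."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    return float(np.abs(u - v).sum() / (u + v).sum())


def pcoa_axis1(sample_ids):
    """First principal coordinate from classical scaling of Bray-Curtis distances."""
    n = len(sample_ids)
    dm = np.array([[bray_curtis(COUNTS[a], COUNTS[b]) for b in sample_ids] for a in sample_ids])
    centering = np.eye(n) - np.ones((n, n)) / n
    b = centering @ (-0.5 * dm**2) @ centering  # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    return vecs[:, -1] * np.sqrt(vals[-1])  # the sign of an axis is arbitrary


alpha_full = {s: shannon(c) for s, c in COUNTS.items()}
alpha_sub = {s: shannon(COUNTS[s]) for s in ["A", "B"]}
full_axis1 = pcoa_axis1(["A", "B", "C"])
sub_axis1 = pcoa_axis1(["A", "B"])

print(alpha_full["A"] == alpha_sub["A"])  # True
print(full_axis1[:2], sub_axis1)
```

With C present, A and B sit on the same side of axis 1 (C dominates that axis); without C, axis 1 separates A from B. The A-to-B distance itself is identical in both runs; what changes is everything computed from the whole set of samples, such as the ordination.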

You could, but I would discourage it. It gets rather messy for reporting purposes, and it would be misleading in a publication if you reported different diversity results (especially alpha diversity) computed at different rarefaction levels.
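To see why mixing rarefaction depths muddies alpha diversity comparisons, the sketch below rarefies one made-up sample (20 features, 50 reads each) by random subsampling without replacement and counts observed features at two depths. The `rarefy` and `observed_features` helpers are hypothetical illustrations, not QIIME 2 functions.

```python
import random


def rarefy(counts, depth, seed=0):
    """Randomly subsample `depth` reads without replacement from a count vector."""
    # Expand counts into one entry per read, labelled by feature index.
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    picked = random.Random(seed).sample(reads, depth)
    rarefied = [0] * len(counts)
    for i in picked:
        rarefied[i] += 1
    return rarefied


def observed_features(counts):
    """Number of features with a non-zero count."""
    return sum(1 for c in counts if c > 0)


sample = [50] * 20  # 20 features, 1000 reads in total

shallow = observed_features(rarefy(sample, depth=5))
deep = observed_features(rarefy(sample, depth=1000))
print(shallow, deep)
```

The sample is the same in both runs and only the depth changed, yet its observed richness differs (at most 5 features at depth 5, all 20 at depth 1000), so values from samples rarefied to different depths are not directly comparable.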

No, the metadata file can be a superset of the samples in your feature table.

Yes. As far as I know, the metadata file is only used for labeling samples in the Emperor PCoA plots that core-metrics produces. It is not used in any way during rarefaction or diversity estimation, so your diversity results will not change when you change the metadata.

When you update your metadata file, you can recreate those Emperor plots by passing the PCoA results output by core-metrics (e.g., bray_curtis_pcoa_results.qza) to emperor plot together with the new metadata file.
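The "metadata only supplies labels" idea can be sketched as a simple join: ordination coordinates are keyed by sample ID, and metadata rows are looked up for those IDs. Metadata rows for samples absent from the results are never looked up (so metadata can be a superset), and swapping in updated metadata changes the labels but not the coordinates. The data and the `label_points` helper are hypothetical; treating a sample with no metadata row as an error is an assumption of this sketch, and none of it is Emperor's actual implementation.

```python
# Hypothetical PCoA coordinates (first two axes) keyed by sample ID.
COORDS = {"S1": (0.12, -0.03), "S2": (-0.08, 0.05)}

# Original and updated metadata; both contain an extra sample (S9) not in COORDS.
OLD_METADATA = {"S1": {"lake": "LakeA"}, "S2": {"lake": "LakeB"}, "S9": {"lake": "LakeC"}}
NEW_METADATA = {
    "S1": {"lake": "LakeA", "depth_m": "5"},
    "S2": {"lake": "LakeB", "depth_m": "12"},
    "S9": {"lake": "LakeC", "depth_m": "3"},
}


def label_points(coords, metadata):
    """Attach metadata to each ordinated sample; metadata-only samples are ignored."""
    missing = set(coords) - set(metadata)
    if missing:
        # Assumption of this sketch: every ordinated sample needs a metadata row.
        raise KeyError(f"samples missing from metadata: {sorted(missing)}")
    return {sid: (xy, metadata[sid]) for sid, xy in coords.items()}


old_plot = label_points(COORDS, OLD_METADATA)
new_plot = label_points(COORDS, NEW_METADATA)

# Same coordinates, richer labels.
print(all(old_plot[s][0] == new_plot[s][0] for s in COORDS))  # True
print(new_plot["S1"][1])  # {'lake': 'LakeA', 'depth_m': '5'}
```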

That will matter at the statistical testing stage, not at the diversity estimation stage, so it does not matter for running core-metrics.

For the most part, missing values are ignored, but it really depends on which plugin/method you are using. alpha-group-significance, the most relevant one for you, will ignore missing values.
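A rough sketch of what "ignoring missing values" means when alpha diversity is grouped by a metadata column: samples with an empty value in the grouping column are simply left out before the groups are formed, and therefore before any test (alpha-group-significance runs Kruskal-Wallis tests) is applied. The values and the `group_alpha` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical Shannon values per sample.
ALPHA = {"S1": 3.2, "S2": 2.9, "S3": 4.1, "S4": 3.7, "S5": 3.0}

# Hypothetical metadata column; S3 (empty string) and S5 (None) are missing values.
LAKE = {"S1": "LakeA", "S2": "LakeA", "S3": "", "S4": "LakeB", "S5": None}


def group_alpha(alpha, column):
    """Group alpha values by metadata value, skipping samples with missing values."""
    groups = defaultdict(list)
    for sample_id, value in alpha.items():
        category = column.get(sample_id)
        if category in (None, ""):
            continue  # missing metadata: sample is left out of the comparison
        groups[category].append(value)
    return dict(groups)


print(group_alpha(ALPHA, LAKE))  # {'LakeA': [3.2, 2.9], 'LakeB': [3.7]}
```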

I hope that helps!