Difficulty to interpret output Alpha diversity

Liviacmg · July 5, 2023, 6:45am

Hey guys,

I am new to qiime2 and I need help to understand my alpha diversity output, and also I am having errors while trying to calculate beta diversity...

I am using qiime 2020.8 through conda.

These are my multiqc results from fastqc:

Forward:

Reverse:

I used dada2 and these parameters to cut:
--p-trim-left-f 12
--p-trim-left-r 13
--p-trunc-len-f 230
--p-trunc-len-r 220

The p-sampling-depth used was 0, given the table.qzv:

Beginning of the table

End of the table

The evenness group had these messages:

And only the column loaddate was kept...

The same happened to faith-pd:

Even when I tried to adjust the metadata and validate it the same messages appeared. I don't know how to fix it so the other columns would also appear...

Why are there multiple samples with just a dash instead of a boxplot? Is it because there aren't enough samples to be analyzed?

The weighted unifrac:
Red marks the sick patients and blue the controls:

The unweighted unifrac:

The bray-curtis:

jwdebelius · July 5, 2023, 1:31pm

Hi @Liviacmg,

I think the core issue around your error messages is your group sizes. Essentially, the alpha group significance function is warning you that you are asking it to do something statistically impossible.

There are 2 absolute limitations you're grappling with here:

The Kruskal-Wallis test used in this visualization cannot be performed when there is only one group in the data. There are some variants that allow it, but this is not one of them.
The Kruskal-Wallis test cannot be performed when there is one sample or fewer per group. That's why you can't test continuous variables.

There's a related limitation, which is that the KW test doesn't really work with fewer than 5 samples per group, because that's just the way the test functions.
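
One way to see which metadata columns will run into these limits is to count samples per group before running the visualization. The following is a minimal sketch, not part of QIIME 2: the function name, column values, and sample counts are hypothetical, and the threshold of 5 is only the rule of thumb mentioned above.

```python
from collections import Counter

MIN_PER_GROUP = 5  # rule of thumb from the discussion, not a QIIME 2 setting


def check_column(values, min_per_group=MIN_PER_GROUP):
    """Return a short verdict on whether a categorical column can be tested.

    `values` holds one string per sample; empty strings count as missing.
    """
    present = [v for v in values if v not in ("", None)]
    counts = Counter(present)
    if not counts:
        return "entirely missing"
    if len(counts) == 1:
        return "only one group"
    if all(n == 1 for n in counts.values()):
        return "every value unique"
    small = sorted(g for g, n in counts.items() if n < min_per_group)
    if small:
        return "groups below %d samples: %s" % (min_per_group, ", ".join(small))
    return "ok"


# Hypothetical metadata for 12 samples (made-up values for illustration).
metadata = {
    "loaddate": ["2016-01-10"] + ["2016-02-03"] * 3 + ["2016-03-15"] * 8,
    "LibrarySource": ["METAGENOMIC"] * 12,
    "status": ["patient"] * 6 + ["control"] * 6,
}

for column, values in metadata.items():
    print(column, "->", check_column(values))
```

A column that comes back as anything other than "ok" is a candidate for the warnings described above.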

So, to answer your questions

Per the first two warnings, the other columns shouldn't appear. The first list of columns are named in your metadata as continuous variables. The KW test does not support continuous variables. You might consider alpha diversity correlation in q2-diversity or anova in q2-longitudinal, if you think your data meets the normality assumptions. (Evenness is bounded between 0 and 1 so it probably does not.)

The second warning says that the categories in those columns are all unique, or all one value or entirely missing. A lot of those column headers (LibrarySelection, LibrarySource, etc) look like they should be common across all samples. You can't do a within column comparison if you have no groups. There's no real solution to this, but it's also probably not relevant to your data.

Exactly, this indicates that you have 1 sample in the group, which is a hint that maybe this is not the best category for statistical testing.
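
For the continuous columns discussed above, a rank correlation is the kind of statistic a correlation-based approach reports. The following is a minimal pure-Python sketch of Spearman's rho, assuming a hypothetical continuous metadata column and made-up evenness values; it is an illustration of the idea, not the q2-diversity implementation, and it reports no p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks; undefined if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical values: one evenness score and one continuous measurement per sample.
evenness = [0.71, 0.65, 0.80, 0.77, 0.69, 0.74]
continuous_column = [15.2, 14.1, 21.5, 19.8, 16.0, 20.3]
print("Spearman rho:", round(spearman_rho(evenness, continuous_column), 3))
```

Because it works on ranks, this statistic does not need the normality assumption that a parametric test would.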

You might consider the following:

If your data is cross sectional, maybe you want to ignore dates and focus on your variable of interest (disease status?). You can use that drop down under "column" to get there. Maybe there are other confounders.
If your data is longitudinal, you might want to look into the q2-longitudinal plugin.

Best,
Justine

Liviacmg · July 5, 2023, 3:35pm

Thank you so much, Justine!!

Actually, I am using public data of anorexia nervosa and in the original paper they managed to do non-parametric multivariate analysis of variance (perMANOVA; p <0.05 for all levels) and redundancy analysis (RDA) (Microbiota in anorexia nervosa: The triangle between bacterial species, metabolites and psychological tests - PMC).

So, how could they do it (if the samples apparently don't show a clear division between two variables)?
Would I be able to replicate this result and run the PERMANOVA as well?

Also, what kinds of statistical tests would you recommend in this case (with fewer samples)? Can't I analyze alpha and beta diversity?

And if I used the metadata that they make available in the article, would that change anything?

jwdebelius · July 6, 2023, 12:22am

Hi @Liviacmg,

You're still more familiar with your research question and analysis than I am. The decisions about what variables you test, what variables you skip, and how to group them are questions about knowing your data.

PERMANOVA is a test for beta diversity which is in QIIME 2 as either beta-group-significance or adonis in the q2-diversity plugin. RDA is a modified beta PCA approach; I don't think there's an implementation here that wraps it specifically. So, you can try testing with the PERMANOVA. Just because you don't see "clear" clusters in PCoA space (or any ordination space) doesn't preclude separation. It's a good visual check, but not a guarantee. The box plots in beta-group-significance can be a helpful guide, though.
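
To make the PERMANOVA idea concrete, here is a minimal pure-Python sketch of the pseudo-F statistic with a label-permutation p-value for a one-factor design. It is not the QIIME 2 implementation; the distance matrix and the "patient"/"control" labels are toy values invented for illustration.

```python
import random
from itertools import combinations


def sum_sq(dist, members):
    """Sum of squared pairwise distances among members, divided by group size."""
    total = sum(dist[i][j] ** 2 for i, j in combinations(members, 2))
    return total / len(members)


def pseudo_f(dist, labels):
    """Pseudo-F: among-group over within-group variation in distance space."""
    n = len(labels)
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    ss_total = sum_sq(dist, range(n))
    ss_within = sum(sum_sq(dist, m) for m in groups.values())
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (pseudo-F, p-value) by shuffling the group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


def toy_matrix(labels, within=0.1, between=0.9):
    """Toy distances: small within a group, large between groups."""
    n = len(labels)
    return [[0.0 if i == j else (within if labels[i] == labels[j] else between)
             for j in range(n)] for i in range(n)]


labels = ["patient"] * 5 + ["control"] * 5
f_stat, p_value = permanova(toy_matrix(labels), labels)
print("pseudo-F:", round(f_stat, 2), "p:", p_value)
```

Group size matters here too. With two groups of 3 samples there are only 20 distinct ways to assign the labels, and two of them reproduce the observed split, so even perfectly separated groups cannot reach a p-value much below 0.1.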

I would recommend considering whether a statistical test is appropriate rather than whether you can find a statistical test that you can coerce your data to comply with. Depending on the age of the article, best practices have gotten somewhat more conservative. So, a re-analysis may not give you exactly what they started with.

Best,
Justine

Liviacmg · July 10, 2023, 11:12pm

Hi @jwdebelius ,

Thank you again!! So, after analyzing the paper, the two variables I am interested in are BMI and insulin. But, still, I'm pretty confused about aaall the metrics available.

Given your insight about Kruskal-Wallis, it is not the right test for the analysis here, so which one would be best? Then again, the original paper reports that the alpha analysis was not statistically significant, which gives me a clue that I should simply give up on this one and focus on the beta analysis.

Regarding beta analysis, given all these metrics (including Jaccard, which I uploaded below) - all the images I sent here are output from the command I used below: unweighted UniFrac, weighted UniFrac, Bray-Curtis and Jaccard -, how would I know what all these graphics mean? How could I interpret them, given that, for example, weighted UniFrac and Bray-Curtis, which are quantitative measures, show totally different distributions, and unweighted UniFrac and Jaccard, which are qualitative, also show different distributions? And how do I know these statistical metrics are the best options for my dataset if I don't know all of the options available?

The samples from the dataset were collected from January 2016 to June 2016, so is it considered longitudinal, or for such a short amount of time is it cross-sectional/transversal? Should I start the selection of statistical metrics from there?

Sorry about so many questions...

qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table table-dada2.qza \
  --p-sampling-depth 1164 \
  --m-metadata-file sample-metadata.tsv \
  --output-dir core-metrics-results

jwdebelius · July 11, 2023, 4:54pm

Hi @Liviacmg,

As a reminder, one of the items in our code of conduct asks you to be responsible for your own work. The level of support you're currently asking for is on par with what I typically contribute to projects where I'm a co-author.

No one else here knows your data. Whether it's a time series is something you should know, either by reading the original paper or, if you've just started on a project, by discussing the experimental design with the people who designed the experiment.
Knowing your data and your study is one of your responsibilities as an analyst.
Interpreting your results is one of your responsibilities as an analyst.

If you're bogged down in how to start looking, I'd start with the original paper, with the caveat that if it's published pre ~2015, your differential abundance methods may not be statistically correct.

A classical statistician can be a good resource for alpha diversity, since alpha diversity is well behaved for microbiome data and meets the independent identically distributed assumptions behind most statistical tests. The key issue I noticed for your dates was that you had fewer than 5 samples per group. (In some cases, you had 1 sample per group.) It's not that Kruskal-Wallis is always inappropriate, it's that in your case, you're violating one of the key assumptions.

I'll recommend the QIIME 2 videos as a starting place for beta diversity metrics. I don't remember if they go through them fully. You seem to have an initial grasp of the differences between qualitative and quantitative metrics; maybe the videos will help you identify an aspect you're missing. The qiime diversity beta and qiime diversity beta-phylogenetic commands both provide a full list of metrics, and there's a post somewhere here that describes the metrics. I think it's one of the pinned posts.
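
As a small illustration of why a qualitative and a quantitative metric can disagree on the same samples, here is a minimal pure-Python sketch of Jaccard (presence/absence) and Bray-Curtis (abundance) dissimilarities. The feature counts are made up, and the phylogenetic UniFrac metrics are not covered.

```python
def bray_curtis(u, v):
    """Quantitative: sum of absolute count differences over the total counts."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def jaccard(u, v):
    """Qualitative: 1 minus shared features over all observed features."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    return 1 - len(present_u & present_v) / len(present_u | present_v)


# Hypothetical counts for four features in three samples.
sample_a = [10, 0, 5, 1]
sample_b = [1, 0, 5, 10]   # same features as sample_a, different abundances
sample_c = [10, 1, 5, 1]   # one extra rare feature, otherwise like sample_a

for name, other in (("a vs b", sample_b), ("a vs c", sample_c)):
    print(name,
          "Jaccard:", round(jaccard(sample_a, other), 3),
          "Bray-Curtis:", round(bray_curtis(sample_a, other), 3))
```

In this toy case Jaccard treats samples a and b as identical because they share the same features, while Bray-Curtis separates them because the abundances differ. For a and c the ranking flips, so ordinations built on the two metrics need not look alike.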

Best,
Justine

Liviacmg · July 12, 2023, 12:41am

Hi @jwdebelius ,

Sorry for crossing the line! You are right!! Thank you for all insights, it already helped me so much!! I'll look into the videos and post you recommended.

Greetings,
Lívia

jwdebelius · July 12, 2023, 1:54pm

Hi @Liviacmg,

Thanks for understanding and being respectful!

If you've got general questions (e.g., "I'm struggling to understand the difference between unweighted and weighted UniFrac distance" or "why didn't pairwise testing show up?"), we're totally happy to help! The search function is also pretty good. The forum has been active since before 2018 and there's a lot of random content floating around. The best-of-the-forum and FAQs can also be good resources.

Best,
Justine
