What is the meaning of the point below the CK box? Is CK significantly different from the other two groups?

The Kruskal-Wallis (all groups) p-value is 0.018 < 0.05. Does that mean the group factor has a significant effect on the results? Or that this treatment differs significantly from both of the other two treatments?

I used ANOVA before. But I understand that ANOVA needs normalized data and Kruskal-Wallis doesn’t. If I use Kruskal-Wallis to compare differences in alpha diversity, how do I label the groups with a, b, and c (different letters indicate significant differences)?

The highlighted point is an outlier. I’m not sure whether every program that generates boxplots like this one uses the same default for what defines an outlier (a common rule is any value more than 1.5 × IQR below the first quartile or above the third quartile), but I can confirm that most programs draw individual points like this to mark outlier values (note that such a point can be above or below your box, not just below). Here’s a thread talking more about these types of figures that might help.
The boxplots do not tell you explicitly whether one group is different from any other, so no, you can’t infer that CK is significantly different from the other two groups from the boxplot alone. You need to run other tests to determine group differences, like what you did next with the Kruskal-Wallis test (or other tests).
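As a rough illustration of the 1.5 × IQR rule, here is a minimal Python sketch. The function name `tukey_outliers` and the example values are made up for illustration, and the quartile method is an assumption: plotting programs differ in how they compute quartiles, so the flagged points can differ slightly between tools.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" is one of several quartile conventions (an assumption here).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical Shannon values for one group; not the data from this thread.
ck_shannon = [5.1, 5.3, 5.2, 5.4, 5.0, 3.9]
print(tukey_outliers(ck_shannon))
```

A flagged point only says the value is far from the bulk of its own group; it says nothing about differences between groups.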

This significant KW test indicates there is evidence of some difference in the mean ranks of alpha diversity values among the three groups (CK, CF, OF). The KW test doesn’t tell you which groups are different from each other; it just states that at least one group is different from another group. To identify the particular groups you’d need to follow up with a pairwise test comparing each pair of groups; I stick with a Dunn’s test following a KW test (see here for a bit of background). However, I’d be wary of drawing large conclusions with these relatively small numbers of samples per group. For more of a background in the applications of the KW test, see this overview.
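To make the "global test" idea concrete, here is a small sketch of the KW H statistic written in plain Python. It is only an illustration under stated assumptions: the function names are mine, the p-value shortcut `exp(-H/2)` is valid only for exactly three groups (chi-square with 2 degrees of freedom), and the chi-square approximation is weak for small groups. In practice a library routine such as `scipy.stats.kruskal` would normally be used instead.

```python
import math
from collections import Counter

def average_ranks(values):
    """1-based ranks; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for exactly three groups, with the usual tie correction."""
    groups = [a, b, c]
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction
    # Chi-square survival function with df = 2 (three groups only).
    return h, math.exp(-h / 2)

# Hypothetical alpha diversity values; not the data from this thread.
h, p = kruskal_wallis_three_groups(
    [3.9, 4.1, 4.0, 4.2, 3.8],
    [5.0, 5.2, 5.1, 5.3, 4.9],
    [5.05, 5.25, 5.15, 5.35, 4.95],
)
print(f"H = {h:.3f}, p = {p:.4f}")
```

A small p here only says that some group differs; it does not say which one.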

You need to have normalized your data in some way for any alpha diversity measure that incorporates abundance (sequence count) information in its calculation. You don’t technically need to do this for binary/presence-absence metrics like “observed richness”, but if you’re going to think about your data in both a binary and a continuous way (say, using Faith’s PD and Shannon’s entropy), I find it fairer if both datasets are normalized so the inputs are identical before diversity is calculated.
The point here is that it’s not correct to think that an ANOVA requires normalization and KW does not; rather, certain alpha diversity metrics require normalization and others do not. How you subsequently compare those per-sample alpha values (by ANOVA, KW, or otherwise) has nothing directly to do with the normalization of the data.
Lettered group labels for pairwise differences are not something you get from the KW test itself (or ANOVA); they’re what you get after you run a post hoc test like Dunn’s.

Thanks for your reply.
On your third point, I used the rarefied data to calculate the alpha diversity. That is one way to normalize the data, right?
But I ran the ANOVA in SPSS before, and it does compare the differences among the three groups. And QIIME 2’s KW also did the pairwise comparisons, right? Why not use it directly?
As in the graph I posted, QIIME 2 first compares all three groups: p < 0.05, meaning one group differs from another. It then compares each pair of groups: there are significant differences between CK and CF and between CK and OF, but not between CF and OF. So, can I label CK, CF, and OF with a, b, and bc?

Data normalization (be it rarefying or another technique) is a separate process from estimating the effective number of species in a sample (alpha diversity values).

There is not one normalization technique, and different techniques are required depending on what you’re trying to do:

If you’re following along a classic tutorial, it’s likely that you first rarefied your data, then calculated alpha diversity values. That’s one way to go about normalization, though there are many others. See the recent SRS tool, for example. I would suggest that your linear model uses the same normalized data (in SPSS) if you’re comparing to your KW result.
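To show the "rarefy first, then calculate alpha diversity" order of operations, here is a minimal sketch in plain Python. Everything in it is illustrative: the function names, the counts, and the depth of 100 are assumptions, and the log base 2 for Shannon is simply one common convention. It is not how QIIME 2 implements these steps internally.

```python
import math
import random

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a vector of feature counts."""
    reads = [feature for feature, n in enumerate(counts) for _ in range(n)]
    if depth > len(reads):
        raise ValueError("sample has fewer reads than the requested depth")
    rarefied = [0] * len(counts)
    for feature in rng.sample(reads, depth):
        rarefied[feature] += 1
    return rarefied

def observed_features(counts):
    """Presence/absence metric: number of features with at least one read."""
    return sum(1 for n in counts if n > 0)

def shannon(counts, base=2):
    """Abundance-aware metric: Shannon entropy of the relative abundances."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total, base) for n in counts if n > 0)

# Two hypothetical samples with very different sequencing depths.
deep = [500, 300, 150, 40, 8, 2]
shallow = [50, 30, 15, 4, 1, 0]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
for name, sample in [("deep", deep), ("shallow", shallow)]:
    even = rarefy(sample, depth=100, rng=rng)
    print(name, observed_features(even), round(shannon(even), 3))
```

After rarefying, both samples enter the diversity calculation with the same number of reads, which is the sense in which the inputs are made comparable.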

Minor clarification: my earlier response assumed QIIME 2 alpha group significance was running one of a range of possible post hoc tests, but it appears it’s running KW for each pair of groups separately? I defer to the developers and users of the forum on the appropriateness of that approach, but would encourage you to apply a Dunn’s test to see if your pairwise comparisons match what is provided with the QIIME approach. In particular, any pairwise test needs to be mindful of multiple test corrections, which a post hoc test like Dunn’s applies (usually automatically). With just 3 groups, you have just 3 comparisons, so it might not matter. But if you have, say, 7 groups, that’s k(k-1)/2 = 21 tests…
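The arithmetic above, and what a multiple-testing correction does to a set of pairwise p-values, can be sketched in a few lines of Python. The function names and the three p-values are hypothetical, chosen only for illustration; Bonferroni and Benjamini-Hochberg are shown as two common choices, not as what any particular tool applies.

```python
import math

def n_pairwise_tests(k):
    """Number of pairwise comparisons among k groups: k(k-1)/2."""
    return math.comb(k, 2)

def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=pvalues.__getitem__)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(n_pairwise_tests(3), n_pairwise_tests(7))

# Hypothetical raw pairwise p-values for three comparisons.
raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

With 3 tests the adjustment is mild; with 21 tests the same raw p-values can easily move across the 0.05 line.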

The labels aren’t incorrect insofar as they reflect your observed pairwise p-values: the question is whether those p-values are derived from the appropriate pairwise test, and whether they should be adjusted for multiple testing.
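For turning a set of pairwise outcomes into letters, one simple approach is to give a letter to each largest set of groups in which no pair differs significantly, so that two groups share a letter exactly when they were not found to be different. The sketch below is a hypothetical helper (`compact_letters` is my name, not a QIIME 2 or SPSS function) and uses brute force, so it is only suitable for a handful of groups.

```python
from itertools import combinations
from string import ascii_lowercase

def compact_letters(groups, significant_pairs):
    """Map each group to letters; shared letter = not significantly different."""
    different = {frozenset(pair) for pair in significant_pairs}

    def no_pair_differs(subset):
        return all(frozenset(p) not in different for p in combinations(subset, 2))

    # Candidate sets, largest first, then keep only those not inside a larger one.
    candidates = [set(s) for size in range(len(groups), 0, -1)
                  for s in combinations(groups, size) if no_pair_differs(s)]
    maximal = []
    for c in candidates:
        if not any(c < m for m in maximal):
            maximal.append(c)

    # Order the sets so the first listed group receives "a".
    maximal.sort(key=lambda c: sorted(groups.index(g) for g in c))
    labels = {g: "" for g in groups}
    for letter, members in zip(ascii_lowercase, maximal):
        for g in groups:
            if g in members:
                labels[g] += letter
    return labels

# Pattern described earlier in the thread: CK differs from CF and from OF,
# while CF and OF do not differ.
print(compact_letters(["CK", "CF", "OF"], [("CK", "CF"), ("CK", "OF")]))
```

For that pattern the sketch assigns CK = a, CF = b, OF = b, so no third letter is needed. The letters are only as trustworthy as the pairwise p-values behind them.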

Thanks, Sir. Why can’t I just use Dunn’s directly rather than after KW? Or use ANOVA directly to compare the three groups? I just want to know the differences among the three groups.

Some alpha diversity metrics (e.g., Simpson, Shannon, Faith’s PD) need normalized data, and some (observed OTUs and Chao1) do not. Is that right? What about beta diversity?

Dunn’s can be run directly. But as a post hoc test, it is intended to be used only after your global test has first detected some group difference. So you run KW first, then, if significant, run the post hoc test to identify which particular groups are driving that group difference.
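The "global test first, post hoc second" order can be sketched as follows. This is a plain-Python illustration of Dunn’s z statistic on pooled ranks, not the code any particular package runs: the function names, the example values, and the `global_p` placeholder are all assumptions, and Bonferroni is used only because it is the simplest adjustment to show. A maintained implementation (for example `posthoc_dunn` in the scikit-posthocs package) would be the usual choice for real analyses.

```python
import math
from collections import Counter
from itertools import combinations

def average_ranks(values):
    """1-based ranks; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def dunn_test(groups):
    """Unadjusted two-sided Dunn p-values for every pair of named groups."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n_total = len(pooled)
    ranks = average_ranks(pooled)  # ranks are computed on ALL samples pooled
    mean_rank, start = {}, 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size]) / size
        start += size
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n_total * (n_total + 1) / 12 - ties / (12 * (n_total - 1))
    pvalues = {}
    for a, b in combinations(names, 2):
        se = math.sqrt(variance * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        pvalues[(a, b)] = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p
    return pvalues

# Hypothetical alpha diversity values; not the data from this thread.
example = {
    "CK": [3.9, 4.1, 4.0, 4.2, 3.8],
    "CF": [5.0, 5.2, 5.1, 5.3, 4.9],
    "OF": [5.05, 5.25, 5.15, 5.35, 4.95],
}
global_p = 0.01  # placeholder for the p-value of your own global KW test
if global_p < 0.05:
    raw = dunn_test(example)
    for pair, p in raw.items():
        print(pair, round(min(1.0, p * len(raw)), 4))  # Bonferroni-adjusted
```

Unlike running a separate KW test on each pair, Dunn’s test ranks all samples together, so every pairwise comparison uses the same ranking as the global test.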

I generally normalize my data table first, then perform all alpha/beta calculations on normalized data, regardless of whether the particular test uses counts or a binary transformed value.

Sorry, I’m not sure what you mean.
I also want to know: if the data are normally distributed I should select ANOVA, and if they are not I should select KW, is that right?

Now I have 4 groups of samples, each with 3 subgroups. I want to compare the 3 subgroups within each group, so I ran the Test of Homogeneity of Variances in SPSS and found that one group is not normally distributed but the other 3 are. So, do I choose KW for all of them for the sake of uniformity? Or use ANOVA for 3 groups and KW for the remaining one?

Thanks for the clarification @jwdebelius. My impression from this line of the script is that the corrected p-value is hardcoded to use the Benjamini-Hochberg method? Interesting that there isn’t an option to choose among the various false discovery rate corrections, given all the possibilities.

Circling back to sample sizes: the package used to calculate the KW statistic here suggests a minimum of five samples per group. Hopefully we’ve addressed the original questions in this post, clarifying what a boxplot outlier is and explaining where the statistical outputs are derived.

There is always a balance between simplifying the interface and providing the options that users really want/need. Users have not asked to have other FDR-correction options exposed, so in this case I think simplicity won out. But nothing is permanent, and with a couple of lines of code and a pull request, your wish could be granted…

Agreed. @YuZhang, if you have other questions, please open a separate topic. But for the sake of closing this topic, here you go:

Sure. But KW is also fine for non-normal data. q2-longitudinal has an ANOVA method that you can use if your data are normally distributed.