UniFrac beta-group-significance over-inflating n's


When I run the beta-group-significance script, it over-inflates the number of n's in each sample group. I know these numbers can't be right because my whole dataset only has 98 samples (I'm looking at a subset of 50 samples, the day 14 samples).

This is the script I ran:

qiime diversity beta-group-significance --i-distance-matrix no_D0_core_metrics/weighted_unifrac_distance_matrix.qza --m-metadata-file ~/Desktop/Madison_seq_1/Jackson/jackson_mapping.txt --m-metadata-category Diet_Treat --o-visualization no_D0_core_metrics/weighted_unifrac_diet_treat-sig-group.qzv --p-pairwise

I saw a previous post (now closed) where the poster was asked for the mapping file and the distance matrix, so I'm including both here. I'm expecting 2920X_control and 2920X_lisinopril to have 8 samples each and 2920X_captopril to have 10 samples. The HFD_captopril and HFD_lisinopril groups should have an n of 8, while HFD_control should have an n of 7 (one sample was removed in the sampling depth step). Instead, I'm seeing n's of 45 for 2920X_captopril, 70 for 2920X_control, 80 for 2920X_lisinopril, 80 for HFD_captopril, 70 for HFD_control, and 80 for HFD_lisinopril.

jackson_mapping.txt (9.3 KB)

weighted_unifrac_distance_matrix.qza (45.1 KB)


Hey @saatkinson,

❤️ Thanks! I was able to re-run your command to look at the results.

For the figures, n doesn't represent the number of samples, but rather the number of pairwise comparisons (distances) that make up the distribution each box-plot represents. For example, your 2920X_Control group has 7 samples, so the 2920X_Control vs 2920X_Control box is built from $\frac{7 \times (7 - 1)}{2} = 21$ possible pairwise comparisons, which matches what we see in the second figure. To put it another way, since distance metrics are computed between pairs of samples, a distribution/boxplot of those distances must be a distribution/boxplot of pairwise comparisons.
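As a rough check of this arithmetic, here is a minimal Python sketch that counts the distances behind each box in one group's panel. It assumes, as described above, that the within-group box holds every distinct pair of samples in that group and that each between-group box pairs every sample of the focal group with every sample of the other group. The group sizes are assumptions taken from the question, with 2920X_control set to 7 as in this reply; `within_group_pairs`, `between_group_pairs` and `boxplot_ns` are hypothetical helpers, not part of QIIME 2.

```python
from typing import Dict

# Assumed group sizes (hypothetical): taken from the question, with
# 2920X_control set to 7 as in the reply.
group_sizes = {
    "2920X_captopril": 10,
    "2920X_control": 7,
    "2920X_lisinopril": 8,
    "HFD_captopril": 8,
    "HFD_control": 7,
    "HFD_lisinopril": 8,
}


def within_group_pairs(n: int) -> int:
    """Distinct sample pairs inside one group: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def between_group_pairs(n_a: int, n_b: int) -> int:
    """Every sample in group A paired with every sample in group B."""
    return n_a * n_b


def boxplot_ns(focal: str, sizes: Dict[str, int]) -> Dict[str, int]:
    """Number of distances behind each box in the panel for `focal`."""
    n_focal = sizes[focal]
    counts = {}
    for group, n in sizes.items():
        if group == focal:
            counts[group] = within_group_pairs(n)  # within-group box
        else:
            counts[group] = between_group_pairs(n_focal, n)  # between-group box
    return counts


if __name__ == "__main__":
    for group, n in boxplot_ns("2920X_captopril", group_sizes).items():
        print(f"{group}: n = {n}")
```

With these assumed sizes, the panel for 2920X_captopril comes out as 45, 70, 80, 80, 70 and 80, the same n's listed in the question, which suggests those numbers were all read off that one figure.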

Also, in the pairwise PERMANOVA results at the bottom, the sample size is the number of samples compared (so 7 + 7 for the same 2920X_Control vs 2920X_Control).
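A similar sketch for the PERMANOVA table, assuming each pairwise test between two different groups A and B is run on the union of their samples, so the reported sample size is n_A + n_B. The group sizes and the `pairwise_sample_sizes` helper are hypothetical, for illustration only.

```python
from itertools import combinations
from typing import Dict, Tuple

# Assumed group sizes (hypothetical): from the question, 2920X_control = 7.
group_sizes = {
    "2920X_captopril": 10,
    "2920X_control": 7,
    "2920X_lisinopril": 8,
    "HFD_captopril": 8,
    "HFD_control": 7,
    "HFD_lisinopril": 8,
}


def pairwise_sample_sizes(sizes: Dict[str, int]) -> Dict[Tuple[str, str], int]:
    """Sample size for each two-group test: every sample from both groups."""
    return {
        (a, b): sizes[a] + sizes[b]
        for a, b in combinations(sizes, 2)  # each unordered pair of distinct groups
    }


if __name__ == "__main__":
    for (a, b), n in pairwise_sample_sizes(group_sizes).items():
        print(f"{a} vs {b}: sample size = {n}")
```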

Let me know if that makes sense!


Hey @ebolyen,

That does make sense! Thanks so much!


