Alpha group significance: different boxplots for different runs using the same input file

We are running the qiime diversity alpha-group-significance command for Faith's PD in a course, and different students are getting different results from the same input file. I have attached a couple of example files to this post. We would be grateful for any possible explanations for these differences. Thank you very much!

FaithPD_differences.zip (11.0 KB)

Hi @meghna_swayambhu,
Did these students use the same sampling depth for core-metrics-phylogenetic?
Using different sampling depths could explain these differences.

However, even if the students used exactly the same sampling depth, there is still a good chance that they would get slightly different results. The sampling depth sets how many sequences are randomly selected from each sample, so each student could end up with a slightly different set of sequences, which makes the diversity metrics look different! We hope the subsample is representative, but sometimes it isn't, especially if the sampling depth is set below the point where the diversity metrics level off.
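To make the random-subsampling point concrete, here is a minimal Python sketch (not QIIME 2's own code) that rarefies one sample's feature counts to a fixed depth by drawing reads without replacement. It assumes NumPy 1.18 or later for `Generator.multivariate_hypergeometric`; the helper names `rarefy` and `observed_features` and the toy counts are hypothetical. Two different seeds stand in for two students running the same command on the same file.

```python
import numpy as np

def rarefy(counts, depth, seed):
    """Subsample `depth` reads without replacement from one sample's feature counts.

    Hypothetical helper: it illustrates the idea behind rarefaction,
    not QIIME 2's exact implementation.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the sampling depth")
    rng = np.random.default_rng(seed)
    # Every read can be picked at most once, which is a multivariate hypergeometric draw.
    return rng.multivariate_hypergeometric(counts, depth)

def observed_features(counts):
    """Number of features with at least one read."""
    return int(np.count_nonzero(counts))

# Made-up sample (10,000 reads): three abundant features plus many rare ones.
sample = np.array([5000, 3000, 1200] + [1] * 400 + [2] * 200, dtype=np.int64)
depth = 5000

run_a = rarefy(sample, depth, seed=1)  # "student A"
run_b = rarefy(sample, depth, seed=2)  # "student B"
print(observed_features(run_a), observed_features(run_b))
```

Because each rare feature survives the subsample only by chance, the two printed counts will usually differ even though the input and the depth are identical.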

I would recommend having the students look at their alpha rarefaction plots to check whether their sampling depth is reasonable. If the alpha rarefaction plot shows that alpha diversity has not stabilized at the chosen sampling depth, you may see larger differences between the students' results!
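As a rough illustration of what an alpha rarefaction curve summarizes, the sketch below subsamples one sample at several depths, averages observed features over repeated draws, and applies a simple plateau check. This is an assumption-laden toy, not the alpha-rarefaction visualizer itself: `rarefaction_curve`, `looks_stable`, the 5% tolerance, and the example counts are all hypothetical.

```python
import numpy as np

def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Mean observed features at each depth, averaged over repeated subsamples."""
    counts = np.asarray(counts, dtype=np.int64)
    rng = np.random.default_rng(seed)
    curve = []
    for depth in depths:
        # Repeat the random subsample several times and average the result.
        values = [np.count_nonzero(rng.multivariate_hypergeometric(counts, depth))
                  for _ in range(iterations)]
        curve.append(float(np.mean(values)))
    return curve

def looks_stable(curve, tolerance=0.05):
    """Hypothetical plateau check: the last step changes the mean by less than `tolerance` (relative)."""
    previous, last = curve[-2], curve[-1]
    return abs(last - previous) / previous < tolerance

# Made-up sample: 300 features with 100 reads each (30,000 reads in total).
community = np.full(300, 100, dtype=np.int64)
depths = [100, 1_000, 5_000, 10_000, 20_000]
curve = rarefaction_curve(community, depths)
print([round(v, 1) for v in curve])
print("stable at the deepest step:", looks_stable(curve))
```

In a real plot you would eyeball the same thing: if the curve is still climbing at the chosen sampling depth, repeated subsamples at that depth can disagree noticeably.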

Hope that helps!
:turtle:


Hi @cherman2,
Thank you very much for the detailed explanation. The students used the same sampling depth of 25600, based on the lowest number of filtered sequences in a sample, and it looks like the samples are rarefied at this depth. Would you perhaps recommend another value? I am attaching both files. In addition, is there a way to run the core-metrics-phylogenetic command without the sampling depth parameter and use all reads from all samples?

Thank you very much!
Meghna

alpha-rarefaction-plot.qzv (451.2 KB)
filtered-table.qzv (525.3 KB)

Hi @meghna_swayambhu,

Looking at your files, this is a very reasonable sampling depth that allows you to keep all your samples.

I would not recommend running diversity metrics without a sampling depth.

I like to think of it as if I were exploring the rainforest and the desert and counting how many species were in each. Suppose I counted all the species in a 100 mile x 100 mile square of desert, then went to the rainforest and counted all the species in a 1 mile x 1 mile square. I might find that the desert has more species than the rainforest. In this example, I probably only observed higher diversity in the desert because I investigated a much larger area of desert than of rainforest.

Circling back to the microbiome, not choosing a sampling depth would mean comparing your deepest sample at 88,592 sequences with your shallowest sample at 25,603 sequences, so you would be investigating one sample roughly 3.5 times as deeply as the other. As in the rainforest/desert example, it wouldn't be surprising to see more diversity in the more deeply sequenced sample simply because there was more of it to investigate.
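The depth effect can be simulated directly. The minimal Python sketch below "sequences" one made-up community at the two depths from this thread (88,592 and 25,603 reads), compares observed features, and then rarefies both to 25,600. The log-normal abundance profile, the 2,000 features, and the seed are assumptions chosen only for illustration; real data will differ.

```python
import numpy as np

rng = np.random.default_rng(42)

# One made-up community: 2,000 features with a long-tailed (log-normal) abundance profile.
proportions = rng.lognormal(mean=0.0, sigma=2.0, size=2000)
proportions /= proportions.sum()

# Sequence the SAME community at the deepest and shallowest depths mentioned above.
deep = rng.multinomial(88_592, proportions)
shallow = rng.multinomial(25_603, proportions)

def observed(counts):
    """Number of features with at least one read."""
    return int(np.count_nonzero(counts))

# Without rarefaction, the deeper sample shows more features purely because of depth.
print("unrarefied:", observed(deep), observed(shallow))

# Rarefy both to a common depth (without replacement) before comparing.
depth = 25_600
deep_r = rng.multivariate_hypergeometric(deep, depth)
shallow_r = rng.multivariate_hypergeometric(shallow, depth)
print("rarefied:  ", observed(deep_r), observed(shallow_r))
```

Since both samples come from the same community, the large unrarefied gap is an artifact of sequencing effort, while the rarefied counts should land close to each other.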

If I went back to the rainforest and the desert but this time measured a 1 mile x 1 mile square of each, I would see that the rainforest has far more species than the desert (which we know is more accurate than my first observation). I wouldn't be investigating the desert as extensively as before, but I would be investigating both environments equally.

As for your data, you do see some variation between your alpha diversity plots, but I would point out that the overall pattern these plots show does not change from plot to plot. Your results are pretty much what is expected when you run the random subsampling multiple times, and it all comes down to the necessary evil of selecting a sampling depth.
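To illustrate "some variation, same overall pattern", here is a hedged Python sketch that repeats rarefaction on two made-up groups of samples and records the mean Shannon value per group in each run. The group names, the even toy communities, the helper names, and the choice of log base 2 are all assumptions; the base only rescales Shannon values and does not change their ordering.

```python
import numpy as np

def shannon(counts, base=2):
    """Shannon entropy of one sample's counts (the base only rescales the values)."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))

def group_means_per_run(groups, depth, iterations, seed=0):
    """For each rarefaction run, return the mean Shannon value of every group."""
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(iterations):
        # A fresh random subsample of every sample, as if the command were rerun.
        runs.append({
            name: float(np.mean([shannon(rng.multivariate_hypergeometric(s, depth))
                                 for s in samples]))
            for name, samples in groups.items()
        })
    return runs

# Made-up data: group "A" samples are richer and more even than group "B" samples.
data_rng = np.random.default_rng(7)
groups = {
    "A": [data_rng.multinomial(30_000, np.full(500, 1 / 500)) for _ in range(5)],
    "B": [data_rng.multinomial(30_000, np.full(50, 1 / 50)) for _ in range(5)],
}
runs = group_means_per_run(groups, depth=25_600, iterations=10)
for run in runs:
    print(round(run["A"], 3), round(run["B"], 3))
```

In this toy setup the per-run means typically shift only in the later decimals, while group A stays above group B in every run.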

Hope that helps!
:turtle:


Hi @cherman2,
Thank you very much for the detailed explanation and for confirming the sampling depth value. Regarding the differences in the Faith's PD box plots, it is true that the inference is pretty much the same. However, do you know why the other measures, such as observed features and Shannon, yield identical box plots while only Faith's PD differs? Thank you very much for all your help with the clarifications!

Best,
Meghna

@meghna_swayambhu,
Could you send the observed features and Shannon .qzv files so that I can take a look at them?
Thank you!
