Greetings QIIME 2 team,
I am a new user of QIIME 2, analyzing ITS amplicons in the q2-boots-amplicon-2025.7 conda environment. I am not experienced in working with alpha and beta diversity measures.
I hope that General Discussion is the right category to post this question about alpha diversity. Please let me know if otherwise.
I believe I understand the basic interpretation of the two types of graphs that are the outputs of the alpha-rarefaction action:
The top graph shows that a read depth of around 50,000 per sample was sufficient to capture the total Shannon diversity present in each sample.
The bottom graph shows that for this metadata column, "Season", a sampling depth of 50,000 would include all of my samples in the Shannon calculation. However, a sampling depth of 275,000 would exclude a few samples from each category (2 samples from the Wet Season category and 5 from the Dry Season category).
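To make the bottom-plot reasoning concrete, here is a minimal Python sketch that counts how many samples in each metadata group survive a given sampling depth. It assumes, as the bottom alpha-rarefaction plot does, that a sample whose total read count is below the depth is dropped. The sample IDs, read totals, and `Season` labels are invented for illustration, and `retained_per_group` is a hypothetical helper, not a QIIME 2 function.

```python
# Hypothetical per-sample total read counts (not real data).
sample_depths = {
    "S1": 52_000, "S2": 120_000, "S3": 300_000, "S4": 410_000,
    "S5": 260_000, "S6": 90_000, "S7": 330_000, "S8": 51_000,
}
# Hypothetical "Season" metadata column.
season = {
    "S1": "Wet", "S2": "Wet", "S3": "Wet", "S4": "Wet",
    "S5": "Dry", "S6": "Dry", "S7": "Dry", "S8": "Dry",
}


def retained_per_group(depths, groups, sampling_depth):
    """Count, per group, the samples whose total reads reach sampling_depth.

    Assumes samples with fewer reads than the depth are dropped, which is
    how the bottom alpha-rarefaction plot counts samples.
    """
    counts = {g: 0 for g in sorted(set(groups.values()))}
    for sample, total in depths.items():
        if total >= sampling_depth:
            counts[groups[sample]] += 1
    return counts


for depth in (50_000, 275_000):
    print(depth, retained_per_group(sample_depths, season, depth))
# 50000 {'Dry': 4, 'Wet': 4}
# 275000 {'Dry': 1, 'Wet': 2}
```

With real data, the per-sample totals could be taken from the per-sample frequency table in a `qiime feature-table summarize` visualization.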
My alpha diversity measures were calculated using the `kmer-diversity` action with a sampling depth of 275,000, like this:
```
qiime boots kmer-diversity \
  --i-table table.qza \
  --i-sequences asvs.qza \
  --m-metadata-file metadata.tsv \
  --p-sampling-depth 275000 \
  --p-n 10 \
  --p-replacement \
  --p-alpha-average-method median \
  --p-beta-average-method medoid \
  --p-alpha-metrics pielou_e \
  --p-alpha-metrics observed_features \
  --p-alpha-metrics shannon \
  --p-beta-metrics aitchison \
  --p-beta-metrics jaccard \
  --output-dir path/
```
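The `--p-n 10`, `--p-replacement`, and `--p-alpha-average-method median` options can be read roughly as: resample the table 10 times at the sampling depth, compute each alpha metric on every resample, and report the per-sample median. The sketch below illustrates that idea for a single sample in plain Python. It is a conceptual approximation, not the q2-boots implementation: it skips the step in which `kmer-diversity` turns the ASVs into k-mer features, it uses natural-log Shannon entropy (the log base QIIME 2 reports may differ), and the feature counts and helper names are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import median


def shannon(counts):
    """Shannon entropy (natural log) of a list of feature counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def observed_features(counts):
    """Number of features with a nonzero count."""
    return sum(1 for c in counts if c > 0)


def pielou_e(counts):
    """Pielou's evenness: Shannon entropy divided by ln(observed features)."""
    s = observed_features(counts)
    return shannon(counts) / math.log(s) if s > 1 else float("nan")


def bootstrap_alpha(counts, depth, n, metric, rng):
    """Median of `metric` over n resamples of `depth` reads drawn WITH replacement."""
    features = range(len(counts))
    values = []
    for _ in range(n):
        # Each read is drawn independently, weighted by the original counts.
        draw = Counter(rng.choices(features, weights=counts, k=depth))
        values.append(metric([draw[f] for f in features]))
    return median(values)


rng = random.Random(42)  # fixed seed so reruns give the same numbers
sample = [500, 300, 150, 40, 10]  # hypothetical feature counts for one sample
for name, metric in [("shannon", shannon),
                     ("observed_features", observed_features),
                     ("pielou_e", pielou_e)]:
    print(name, round(bootstrap_alpha(sample, 800, 10, metric, rng), 3))
```

Taking the median over several resamples damps the run-to-run noise that a single rarefied table would carry.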
I understand that the sampling depth chosen for `kmer-diversity` is independent of the depths explored by `alpha-rarefaction`. I set a maximum depth of 325,000 for `alpha-rarefaction`, like this:
```
qiime diversity alpha-rarefaction \
  --i-table table.qza \
  --p-max-depth 325000 \
  --m-metadata-file metadata.tsv \
  --o-visualization path/
```
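For comparison, `alpha-rarefaction` rarefies without replacement at a series of depths up to `--p-max-depth`, repeating each depth several times. The sketch below mimics that for a single sample, assuming evenly spaced depths from 1 to the maximum (loosely modeled on the action's `--p-min-depth`, `--p-steps`, and `--p-iterations` parameters). It returns `None` at depths the sample cannot reach, which is what shrinks the sample counts in the bottom plot. The data and function names are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean


def shannon(counts):
    """Shannon entropy (natural log) of a list of feature counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def rarefy(counts, depth, rng):
    """Subsample `depth` reads WITHOUT replacement; None if the sample is too shallow."""
    if sum(counts) < depth:
        return None
    reads = [f for f, c in enumerate(counts) for _ in range(c)]  # one entry per read
    tally = Counter(rng.sample(reads, depth))
    return [tally[f] for f in range(len(counts))]


def rarefaction_curve(counts, max_depth, steps, iterations, rng):
    """(depth, mean Shannon) at `steps` evenly spaced depths from 1 to max_depth."""
    curve = []
    for i in range(steps):
        depth = round(1 + i * (max_depth - 1) / (steps - 1))
        subsamples = [rarefy(counts, depth, rng) for _ in range(iterations)]
        if subsamples[0] is None:  # depth exceeds the sample's total reads
            curve.append((depth, None))
        else:
            curve.append((depth, mean([shannon(s) for s in subsamples])))
    return curve


sample = [60, 25, 10, 5]  # hypothetical sample with 100 reads in total
for depth, value in rarefaction_curve(sample, 150, 4, 10, random.Random(1)):
    print(depth, "dropped" if value is None else round(value, 3))
```

A plateau in the second column is what the top plot shows; the "dropped" rows are what the bottom plot counts.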
If I were to test for statistically significant differences in alpha diversity between the Wet and Dry seasons at this `kmer-diversity` sampling depth of 275,000, I could be confident that any difference (or lack of difference) detected in the Shannon index between these two groups would be based on most of my samples, and should therefore be representative of my library as a whole.
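For the Wet vs Dry comparison itself, `qiime diversity alpha-group-significance` reports Kruskal-Wallis tests. As an illustration of what that test computes, here is a stdlib-only sketch of the tie-corrected Kruskal-Wallis H statistic for two groups, with the p-value taken from a chi-square distribution with 1 degree of freedom. The Shannon values are made up, and the two-group restriction is a simplification of this sketch, not a limitation of the QIIME 2 action.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_two_groups(a, b):
    """Tie-corrected Kruskal-Wallis H and its chi-square (df = 1) p-value."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a, rank_sum_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (rank_sum_a**2 / len(a) + rank_sum_b**2 / len(b)) - 3 * (n + 1)
    ties = 1 - sum(t**3 - t for t in Counter(pooled).values()) / (n**3 - n)
    if ties == 0:
        raise ValueError("all values are identical; the test is undefined")
    h = max(h / ties, 0.0)  # guard against tiny negative rounding error
    # Survival function of chi-square with 1 df: P(X > h) = erfc(sqrt(h / 2)).
    return h, math.erfc(math.sqrt(h / 2))


# Hypothetical per-sample Shannon values (not real data).
wet = [3.1, 3.4, 2.9, 3.6, 3.3]
dry = [2.7, 2.8, 3.0, 2.6, 2.9]
h, p = kruskal_two_groups(wet, dry)
print(f"H = {h:.3f}, p = {p:.4f}")
```

Because the test works on ranks, it only sees samples that survived rarefaction, so the group sizes from the bottom plot directly limit its power.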
But what if that were not the case for a different metadata category, say Male vs Female? At a sampling depth of 50,000 most of my samples are included, but at 275,000 all but one Male sample drop out. Is the recommended practice then to re-run `kmer-diversity` with a sampling depth of 50,000? Or is it recommended to stick with the higher sampling depth and simply report an insufficient sample size for M vs F?
My confusion is about this part (from the tutorials): "When grouping samples by metadata, it is therefore essential to look at the bottom plot to ensure that the data presented in the top plot is reliable." I understand how to use the alpha-rarefaction plots to identify the sweet spot that maximizes both the number of samples retained and the measured diversity. But I'm confused about whether/how to implement that information so that our final alpha and beta diversity measures are based on that ideal sampling depth.
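One way to make the sample-count side of the "sweet spot" search explicit is to pick the largest candidate depth at which every group still keeps some minimum number of samples. This is a hypothetical heuristic, not a QIIME 2 recommendation: the threshold, candidate depths, read totals, and `Sex` labels below are assumptions, and the top plot still has to be checked to confirm that diversity has plateaued at the chosen depth.

```python
from collections import Counter


def largest_depth_keeping(depths, groups, candidates, min_per_group):
    """Largest candidate depth at which every group keeps >= min_per_group samples.

    Returns None if no candidate depth satisfies the threshold.
    """
    all_groups = set(groups.values())
    best = None
    for depth in sorted(candidates):
        kept = Counter(groups[s] for s, total in depths.items() if total >= depth)
        if all(kept[g] >= min_per_group for g in all_groups):
            best = depth
    return best


# Hypothetical read totals and "Sex" metadata (not real data).
sample_depths = {"S1": 60_000, "S2": 300_000, "S3": 410_000, "S4": 280_000,
                 "S5": 55_000, "S6": 90_000, "S7": 330_000, "S8": 120_000}
sex = {"S1": "F", "S2": "F", "S3": "F", "S4": "F",
       "S5": "M", "S6": "M", "S7": "M", "S8": "M"}
candidates = [50_000, 100_000, 200_000, 275_000]

print(largest_depth_keeping(sample_depths, sex, candidates, 3))  # 50000
print(largest_depth_keeping(sample_depths, sex, candidates, 2))  # 100000
```

The helper scans every candidate rather than stopping at the first failure, so it does not depend on retention decreasing with depth, although in practice it always does.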
My questions are:
1. Since my sampling depth for `kmer-diversity` was 275,000, does that mean the alpha diversity measures output by `kmer-diversity` exclude some of my Wet and Dry samples, as described above?
2. Does it also mean that I can/should rerun `kmer-diversity` with a sampling depth of 50,000, so that the new alpha diversity measures include all of my samples in the Wet and Dry categories?
3. From the Moving Pictures tutorial, it seems another use of the alpha-rarefaction plot is in choosing sampling depths for the `core-metrics-phylogenetic` action (e.g. the `faith_pd` alpha measure), which requires a phylogenetic tree. There's not currently a good method for building trees from ITS sequences of uneven lengths, so instead I would use the result of alpha-rarefaction as input to `kmer-diversity`, a tree-free method of measuring diversity for ITS, as described in question 2 above. Am I understanding the uses of the alpha-rarefaction curves correctly for the tree and k-mer approaches, i.e. that the output of rarefaction can be used to determine the input sampling depths for diversity measures?
4. Tutorials such as gut-to-soil and Moving Pictures calculate the diversity measures first and run `alpha-rarefaction` afterwards. But `alpha-rarefaction` does not require `kmer-diversity` output to run, for example. So why not run `alpha-rarefaction` first, so it can be used to choose a sampling depth for `kmer-diversity`?
The fact that rarefaction is not run first in the tutorials to determine the sampling depth for measuring diversity is really making me question whether I am understanding any of this correctly.
Thanks very much for any info or suggestions you could provide!
