I’m running QIIME 2 v2024.10 installed via Conda and am seeing unexpectedly high Shannon diversity values in my human 16S data. I would like guidance on whether this is a technical issue in QIIME 2.
Steps I took:
Ran qiime diversity alpha-rarefaction to choose an appropriate sequencing depth.
The Shannon index from QIIME 2 is higher than expected (range 6–9).
When I export the feature table and calculate Shannon diversity in R, values are as expected (below 6).
I tried relative abundance normalization in QIIME 2, but Shannon values remain high.
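Relative abundance normalization would not be expected to change Shannon values in any tool. The index is computed from each feature's proportion of the sample total, so scaling all counts by a constant leaves it unchanged. The minimal pure-Python sketch below illustrates this; the helper name `shannon` is hypothetical and is not a QIIME 2 function.

```python
import math

def shannon(counts, base=2.0):
    """Shannon index H = -sum(p_i * log_base(p_i)) over nonzero features.

    `counts` may be raw counts or relative abundances; only the
    proportions p_i = c_i / sum(c) enter the formula.
    """
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)

# Hypothetical counts for one sample (not real data).
raw = [120, 80, 40, 10, 0, 250]
relative = [c / sum(raw) for c in raw]

# Raw counts and relative abundances give the same value (up to float error).
print(shannon(raw), shannon(relative))
```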
I would like to understand whether this is expected behavior, a technical issue with my QIIME 2 commands, or a misunderstanding of the rarefaction process. Ideally, I want to calculate Shannon and other alpha/beta diversity metrics in QIIME 2 while keeping the results consistent with those obtained outside QIIME 2.
System information:
QIIME 2 version: 2024.10
Installation: Conda
I’ve searched the forum and reviewed the QIIME 2 glossary, but haven’t found similar reports. Any advice on troubleshooting this or recommended approaches would be greatly appreciated.
Hi @Sheylle_Green,
I'm following up on @colinbrislawn's reply, as we had a little internal discussion about this. We think the difference you're seeing between implementations of Shannon diversity across tools is indeed due to the logarithm base used in the calculation. We have not changed this in QIIME 2 recently, though, so results should be consistent across QIIME 2 versions. (We are planning a change, which is where the confusion came from, but it has not been implemented yet.)
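Changing the logarithm base only rescales the index by a constant factor: H in base b equals H in base 2 multiplied by log_b(2). Converting base-2 values (bits) to natural-log values (nats) therefore means multiplying by ln(2) ≈ 0.693. The sketch below assumes that QIIME 2 2024.10 reports Shannon in base 2 and that the R values came from a natural-log implementation, such as the default in vegan's `diversity()`. Both are assumptions to verify for your own setup. The function name is hypothetical.

```python
import math

def convert_shannon_base(value, from_base=2.0, to_base=math.e):
    """Rescale a Shannon value from one logarithm base to another.

    Because log_b(x) = ln(x) / ln(b), H in base `to_base` equals
    H in base `from_base` multiplied by ln(from_base) / ln(to_base).
    """
    return value * math.log(from_base) / math.log(to_base)

# Hypothetical base-2 values spanning the 6-9 range reported above.
for h_bits in (6.0, 7.5, 9.0):
    print(f"{h_bits:.2f} bits -> {convert_shannon_base(h_bits):.2f} nats")
```

With this factor, 6 to 9 bits corresponds to roughly 4.2 to 6.2 nats.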
In brief, the exact Shannon values may differ across tools, but the results should be highly correlated. There isn't a single correct choice of base, so as long as you're comparing values computed with the same base, your comparisons are valid. For now, this means you should avoid comparing values generated with different tools (aside from confirming that they are correlated, if you'd like to do that).
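If you do want to confirm that two tools agree apart from the base, a Pearson correlation on per-sample values, matched by sample ID, should be enough. When both tools start from the same rarefied table, a pure base change is a constant multiplier, so the correlation should be essentially 1. A noticeably lower value would point to other differences, such as how the data were rarefied. Below is a minimal standard-library sketch; the per-sample values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-sample Shannon values from two tools, keyed by sample ID.
qiime2_bits = {"S1": 6.4, "S2": 7.9, "S3": 8.6, "S4": 7.1}
r_nats = {"S1": 4.44, "S2": 5.48, "S3": 5.96, "S4": 4.92}

# Only compare samples present in both outputs, in a fixed order.
samples = sorted(qiime2_bits.keys() & r_nats.keys())
r = pearson([qiime2_bits[s] for s in samples], [r_nats[s] for s in samples])
print(f"Pearson r across {len(samples)} samples: {r:.4f}")
```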
Thank you very much for your helpful responses and for clarifying the log base differences in Shannon diversity calculations across tools; that makes sense.
I’ve reached a point where I’m unsure how to move forward. The researcher I’m working with prefers the lower Shannon values calculated in R, as these seem to better reflect their expectations for the dataset.
The challenge is that I also need to use all other alpha and beta diversity metrics (e.g., Faith’s PD, UniFrac) generated through the QIIME 2 pipeline to ensure consistency and reproducibility for the rest of the analysis. From my understanding (and as Greg mentioned), it wouldn’t be appropriate or academically correct to mix Shannon values generated in another tool with beta diversity metrics generated in QIIME 2.
Would anyone have suggestions on how to handle this situation?
Is there a recommended way to manually adjust the Shannon values calculated in QIIME 2 to match those from R (e.g., by applying a log base conversion)?
Or alternatively, is there a sensible, reproducible approach for calculating all alpha and beta diversity metrics outside of QIIME 2 if the Shannon values must come from R?
I’d be grateful for any advice or examples of how others have dealt with similar scenarios.
That’s fine. I think this also reflects @gregcaporaso’s advice: since your collaborator and others in the field are more familiar with values generated with a different log base, it makes sense to keep this consistent when publishing in that field so that the results are not misinterpreted (it is easy for readers to overlook differences caused by the log base).
No, I don’t think @gregcaporaso meant that you should exclusively use QIIME 2 (but correct me if I am wrong, Greg). Rather, it would be best not to compare Shannon values generated with different tools.
So I think it should be fine to calculate Shannon with another tool of your choice if that is what you prefer. HOWEVER, the main issue is that if that tool handles the data differently from QIIME 2 (e.g., rarefaction, rarefying, or bootstrapping done in a different way), the methods reporting gets a bit murky.
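To make the reporting concern concrete: rarefying draws a random subsample of reads, without replacement, down to a fixed depth. Two tools therefore only produce comparable tables if they use the same depth, and exact reproducibility also requires the same random draw. Below is a minimal standard-library sketch of rarefying one sample. The function name, depth, and seed are hypothetical, and this is not QIIME 2's implementation.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=None):
    """Subsample a dict of feature -> read count to exactly `depth` reads
    without replacement. Returns None if the sample has too few reads."""
    total = sum(counts.values())
    if total < depth:
        return None  # the sample would be dropped at this depth
    # Expand counts into one entry per read, then draw without replacement.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))

sample = {"ASV1": 500, "ASV2": 300, "ASV3": 150, "ASV4": 50}
print(rarefy(sample, depth=400, seed=42))
```

With the same seed this sketch returns the same subsample each time. Different tools use different random number generators, however, so per-sample values can still differ slightly even at the same depth.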
For this reason, what you propose is probably the most transparent and straightforward option: applying a log base conversion prior to plotting.
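Below is a minimal sketch of that conversion. It assumes the Shannon vector has been exported from QIIME 2 to a tab-separated file with sample IDs in the first column and the values in a column named `shannon_entropy`. Check the header of your own export, because the column name and the base-2 assumption both need confirming. The sketch adds a converted column alongside the original so that both remain visible in the methods trail. The function and column names are hypothetical.

```python
import csv
import io
import math

# Assumed column name in the exported QIIME 2 TSV; adjust to your header.
SHANNON_COLUMN = "shannon_entropy"
# Assumed conversion: base 2 (bits) to base e (nats). Multiply bits by ln(2).
FACTOR = math.log(2)

def add_nats_column(tsv_text, column=SHANNON_COLUMN):
    """Return TSV text with an extra '<column>_ln' column holding the
    values converted from base 2 to base e."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out_field = f"{column}_ln"
    fieldnames = list(reader.fieldnames) + [out_field]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[out_field] = f"{float(row[column]) * FACTOR:.6f}"
        writer.writerow(row)
    return out.getvalue()

# Hypothetical export contents (not real data).
example = "sample-id\tshannon_entropy\nS1\t6.4\nS2\t8.6\n"
print(add_nats_column(example))
```

For a real export, read the file contents (e.g., from a hypothetical `alpha-diversity.tsv`) and pass them to `add_nats_column`. Then report the conversion factor in the methods alongside the QIIME 2 version.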
We are discussing exposing a Shannon log base parameter in QIIME 2 so that users can adjust it. We might manage to add this in time for the upcoming release (due out late this month), though it may need to wait until the release after that. You can track that issue here:
That all aligns with my thoughts - thanks for the help @Nicholas_Bokulich!
Good luck @Sheylle_Green, and thanks for getting in touch about this! As @Nicholas_Bokulich mentioned, we should have a better solution soon, though maybe not in time for you this time around. If not, hopefully next time!