Shannon index variation and interpretation

Sue · April 28, 2025, 2:01pm

I'm really sorry to bother you again, but I have some concerns. I want to revisit my two posts:
shannon-entropy and low non chimeric reads
When I use the parameter --p-min-fold-parent-over-abundance to increase non-chimeric read counts, the Shannon index increases as well. I believe this is normal, because raising the threshold for considering a sequence to be chimeric inherently increases the number of reads retained. However, I’m concerned about whether adjusting --p-min-fold-parent-over-abundance is acceptable from a bioinformatics perspective, especially since I intend to publish these results in my first paper. What arguments can I use to justify this approach to reviewers if they question it? For example, is pointing out that my initial non-chimeric reads are around 10%, and that they increase significantly (for instance, with values of 8 or 10 or 16 for --p-min-fold-parent-over-abundance) enough of a rationale?

A second question: given that the Shannon index will rise after changing this parameter, should I avoid using it to compare diversity across different studies? Or is it acceptable to fix a single value (such as --p-min-fold-parent-over-abundance 8) and apply it uniformly to all my datasets (which come from different countries, regions, and times of the year), and then compare diversity based on the Shannon index derived from that single parameter setting?

Alternatively, should I avoid comparing studies from different regions and limit my comparisons to samples within the same study? (In some datasets I leave --p-min-fold-parent-over-abundance at its default and obtain Shannon indices between 2 and 5, whereas in other datasets I set --p-min-fold-parent-over-abundance to 8 and obtain Shannon indices between 6 and 8.) In this case, should I refrain from comparing these studies, or can I compare them?
Thank you for your time!

colinbrislawn · April 28, 2025, 5:17pm

Sure, I think that makes sense! I suspect the criticism will be that this extra diversity is just from extra chimeras getting through the more relaxed filter. Let's see what the refs say...
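To make that concern concrete, here is a minimal sketch of how a handful of extra low-abundance features can raise the Shannon index on their own. The counts are made up for illustration and do not come from any dataset in this thread; the base-2 logarithm is an assumption about the convention, so swap in another base if your tool reports a different one.

```python
import math


def shannon(counts, base=2.0):
    """Shannon index H = -sum(p_i * log(p_i)) over features with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


# Hypothetical sample: three dominant features survive a strict chimera filter.
strict_filter = [500, 300, 200]

# Same sample with 20 extra low-abundance features retained by a relaxed filter.
# Whether they are real taxa or chimeras, the index cannot tell the difference.
relaxed_filter = strict_filter + [10] * 20

print(f"strict:  H = {shannon(strict_filter):.2f}")
print(f"relaxed: H = {shannon(relaxed_filter):.2f}")
```

The index goes up in the relaxed case simply because more features carry a share of the reads, which is why a reviewer may ask for evidence that the retained sequences are genuine.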

I think this is okay. The counter-example is if you use different settings for different cohorts and introduce a batch effect.

Also wise. The challenge of using multiple regions has been covered on the forums, too, if you want to seek out those discussions!

Zooming out a little, this is why I try to stick to default settings, as reviewer three will ask/complain about changes. The QIIME 2 plugin developers take care to select reasonable defaults, so unless shown otherwise, these defaults are generally defensible and they are easy for me to use.

Sue · April 29, 2025, 7:25am

Thank you very much for taking the time to respond.

Just a small clarification: when I said “different regions,” I meant that the samples come from all over the world, but they all target the V3–V4 region.

As I understand it, in this case I should choose the optimal value for --p-min-fold-parent-over-abundance (either 8 or 10) and then rerun all my analyses using that same value to minimize batch effects. This way, I can justify to the reviewer that I applied an identical parameter across all studies.
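A rough sketch of what "the same value everywhere" could look like in practice. It assumes the parameter is being passed to `qiime dada2 denoise-paired` on paired-end reads, which this thread does not state explicitly; the study names, file paths, and the value 8.0 are hypothetical placeholders. The script only prints the commands so they can be checked before running.

```python
import shlex

# Assumption: one chimera setting, chosen once and reused for every study.
FIXED_MIN_FOLD = 8.0


def denoise_command(study, min_fold=FIXED_MIN_FOLD, trunc_len_f=0, trunc_len_r=0):
    """Build (but do not run) a denoising command for one study directory.

    Truncation lengths of 0 mean "no truncation" and are placeholders only;
    choose real values from your own quality plots.
    """
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", f"{study}/demux.qza",
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--p-min-fold-parent-over-abundance", str(min_fold),
        "--o-table", f"{study}/table.qza",
        "--o-representative-sequences", f"{study}/rep-seqs.qza",
        "--o-denoising-stats", f"{study}/denoising-stats.qza",
    ]


# Hypothetical study directories, one per dataset.
studies = ["study_a", "study_b", "study_c"]

for study in studies:
    print(shlex.join(denoise_command(study)))
```

Keeping the value in one constant makes it hard to end up with different settings for different cohorts by accident.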
