How to calculate a simple Jaccard similarity coefficient for two populations?

I am new at this, so my apologies if I have overlooked a simple answer that is already on the Forum. I have looked, but I cannot find whether there’s code to calculate a simple Jaccard similarity coefficient between two populations:

S = a / (a + b + c)

where S = Jaccard similarity coefficient,
a = number of species in both Sample A and Sample B (joint occurrences)
b = number of species in Sample B but not in Sample A
c = number of species in Sample A but not in Sample B
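
For illustration, here is a minimal Python sketch of the formula above. It assumes each population is given as a plain list of species names, so only presence matters and abundances are ignored. The function name and the species lists are hypothetical and do not come from QIIME 2.

```python
def jaccard_similarity(sample_a, sample_b):
    """Jaccard similarity S = a / (a + b + c) between two species lists.

    sample_a, sample_b: iterables of species names (presence only).
    """
    set_a, set_b = set(sample_a), set(sample_b)
    a = len(set_a & set_b)  # species in both samples (joint occurrences)
    b = len(set_b - set_a)  # species in Sample B but not in Sample A
    c = len(set_a - set_b)  # species in Sample A but not in Sample B
    if a + b + c == 0:
        raise ValueError("both samples are empty; S is undefined")
    return a / (a + b + c)


# Hypothetical species lists, for illustration only.
pop_a = ["Bacillus", "Pseudomonas", "Streptomyces", "Vibrio"]
pop_b = ["Bacillus", "Pseudomonas", "Escherichia"]
print(jaccard_similarity(pop_a, pop_b))  # a=2, b=1, c=2 -> 2/5 = 0.4
```

S ranges from 0, when the two populations share no species, to 1, when both contain exactly the same species.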

I know about the Jaccard-based plots available in Emperor, but I want something far more basic when I’m comparing only two populations (or species, or samples, or whatever).

Also, is it possible to get a simple calculation of Shannon’s H for each sample/population?
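
Shannon’s H for a single sample is H = -Σ p_i log(p_i), where p_i is the proportion of the sample’s observations that belong to species i. Below is a small Python sketch of that formula. The function name is hypothetical, and the log base is left as a parameter because tools differ: base 2 gives bits and base e gives nats. Because the base only rescales H by a constant factor, check which one a tool uses before comparing numbers across tools.

```python
import math


def shannon_h(counts, base=math.e):
    """Shannon's H = -sum(p_i * log(p_i)) for one sample's species counts.

    counts: per-species abundances for a single sample. Zero counts are
    skipped, since p * log(p) -> 0 as p -> 0.
    base: logarithm base (math.e for nats, 2 for bits).
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no observations")
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log(p, base)
    return h


# Four equally abundant species (hypothetical counts).
print(shannon_h([10, 10, 10, 10]))          # ln(4), roughly 1.386
print(shannon_h([10, 10, 10, 10], base=2))  # 2 bits, up to rounding
```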

Thank you for your time!

Best regards, Brian

Hi @l.brian.patrick,

Totally possible! Check out qiime diversity beta for jaccard distance and qiime diversity alpha for shannon.
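
Note that qiime diversity beta reports the Jaccard *distance*, not the similarity coefficient S from the question. The two are related by distance = 1 - S. The Moving Pictures tutorial describes Jaccard distance as a qualitative measure, meaning it uses only presence or absence. The Python sketch below is not QIIME 2’s implementation. It shows that relationship for two samples whose counts are listed over the same features, as in a feature table. The function name and the counts are hypothetical.

```python
def jaccard_distance(counts_a, counts_b):
    """Jaccard distance 1 - S between two samples' aligned count vectors.

    A feature counts as present when its count is > 0; abundances beyond
    that are ignored (presence/absence only).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("count vectors must list the same features in the same order")
    pairs = [(x > 0, y > 0) for x, y in zip(counts_a, counts_b)]
    a = sum(1 for in_a, in_b in pairs if in_a and in_b)      # in both samples
    b = sum(1 for in_a, in_b in pairs if in_b and not in_a)  # in B only
    c = sum(1 for in_a, in_b in pairs if in_a and not in_b)  # in A only
    if a + b + c == 0:
        raise ValueError("neither sample has any observed features")
    return 1 - a / (a + b + c)


# Hypothetical counts for four features in two samples.
d = jaccard_distance([3, 0, 7, 1], [5, 2, 0, 1])
similarity = 1 - d  # back to S = a / (a + b + c)
print(d, similarity)  # a=2, b=1, c=1 -> 0.5 0.5
```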



Okay, so I tried the following:

qiime diversity alpha --i-table bacteria-table.qza --p-metric shannon --o-alpha-diversity shannon_table

where bacteria-table.qza is my filtered table (with eukaryotes, archaea, and bacteria identified only to the domain level removed); it should be analogous to table.qza from the Moving Pictures tutorial.

I did generate shannon_table.qza, but I am unsure of the commands necessary to visualize the results! I know this should be easy, but I just cannot seem to get the correct syntax…

Also, is there a way to integrate my metadata file so that I can see the two groups?

Any help with the visualization is appreciated! Thank you!

Best regards, Brian

Hey there @l.brian.patrick!

Have you seen the alpha-group-significance visualizer?

The alpha-group-significance command takes your metadata file, so you should be all set!
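
alpha-group-significance compares alpha diversity values between metadata groups using Kruskal-Wallis tests. To make that test concrete, here is a minimal pure-Python sketch of the tie-corrected Kruskal-Wallis H statistic. It is not QIIME 2’s code. For simplicity, it computes a p-value only for exactly two groups, using the chi-square approximation with 1 degree of freedom. That approximation is rough for very small groups. The function name and the Shannon values are hypothetical.

```python
import math


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H for two or more groups.

    Returns (H, p); p is only computed for exactly two groups, using
    P(chi2_1 > H) = erfc(sqrt(H / 2)). Otherwise p is None.
    """
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    # Pool all values, remembering which group each came from, and sort.
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0  # sum of (t**3 - t) over runs of t tied values
    i = 0
    while i < n:
        # Find the run pooled[i..j] of tied values.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are tied; H is undefined")
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    p = math.erfc(math.sqrt(h / 2)) if len(groups) == 2 else None
    return h, p


# Hypothetical Shannon values for two groups of a "State" column.
h, p = kruskal_wallis([2.1, 2.4, 2.8], [3.0, 3.3, 3.6])
print(h, p)  # H = 27/7 (about 3.857), p about 0.05
```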



That worked brilliantly! Thank you so very much!


Okay, one last question (I think): is there a way to incorporate sampling depth into these analyses? As I currently have the commands structured, they use a table (specifically bacteria-table.qza, as described in my earlier post) that doesn’t account for sampling depth, so it appears that all of my samples are included in the analyses.

For Shannon’s H, I ran the following:

qiime diversity alpha --i-table bacteria-table.qza --p-metric shannon --o-alpha-diversity shannon_table

qiime diversity alpha-group-significance --i-alpha-diversity shannon_table.qza --m-metadata-file mdat.tsv --o-visualization shannon_table.qzv

For Jaccard’s similarity I ran the following:

qiime diversity beta --i-table bacteria-table.qza --p-metric jaccard --o-distance-matrix jaccard_table

qiime diversity beta-group-significance --i-distance-matrix jaccard_table.qza --m-metadata-file mdat.tsv --m-metadata-column State --o-visualization jaccard_table.qzv

where State is the categorical variable chosen from the metadata file.
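
By default, beta-group-significance tests for differences between the groups in the chosen metadata column using PERMANOVA. The Python sketch below illustrates that test on a small distance matrix. It computes the PERMANOVA pseudo-F statistic and a permutation p-value obtained by shuffling the group labels. It is a toy illustration, not the QIIME 2 or scikit-bio implementation. The function name, the distance matrix, and the labels are hypothetical.

```python
import math
import random


def permanova(dm, labels, permutations=999, seed=0):
    """PERMANOVA pseudo-F and permutation p-value.

    dm: square, symmetric distance matrix (list of lists, zero diagonal).
    labels: one group label per row of dm.
    """
    n = len(labels)
    groups = sorted(set(labels))
    g = len(groups)
    if len(dm) != n or g < 2 or g >= n:
        raise ValueError("need one label per sample and 2 <= groups < samples")

    def pair_ss(idx):
        # Sum of squared distances over all unordered pairs within idx.
        return sum(dm[i][j] ** 2 for a, i in enumerate(idx) for j in idx[a + 1:])

    s_total = pair_ss(list(range(n))) / n

    def pseudo_f(labs):
        # Within-group SS: each group's pair sum divided by its size.
        s_within = sum(
            pair_ss([i for i, lab in enumerate(labs) if lab == grp]) / labs.count(grp)
            for grp in groups
        )
        if s_within == 0:
            return math.inf
        s_among = s_total - s_within
        return (s_among / (g - 1)) / (s_within / (n - g))

    f_obs = pseudo_f(list(labels))
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)


# Hypothetical Jaccard distances: small within groups, large between them.
dm = [[0.0, 0.1, 0.9, 0.9],
      [0.1, 0.0, 0.9, 0.9],
      [0.9, 0.9, 0.0, 0.1],
      [0.9, 0.9, 0.1, 0.0]]
f, p = permanova(dm, ["A", "A", "B", "B"])
print(f, p)  # pseudo-F about 161; p near 1/3
```

With only four samples in two groups of two, just three distinct groupings exist, so the p-value cannot drop much below 1/3 even though the groups separate perfectly. This is a reminder that permutation tests need enough samples per group.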

I also run the following command to get the core metrics:

qiime diversity core-metrics-phylogenetic --i-phylogeny rooted-tree.qza --i-table bacteria-table.qza --p-sampling-depth #### --m-metadata-file mdat.tsv --output-dir core-metrics-results

where #### is determined using rarefaction.
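
core-metrics-phylogenetic rarefies the table to the --p-sampling-depth value before computing its metrics and writes that table out as rarefied_table.qza. Samples with fewer sequences than the depth are excluded. The Python sketch below illustrates that idea. It is not QIIME 2’s implementation. It subsamples each sample’s reads without replacement down to a fixed depth and drops samples that fall short. The dict-of-lists table layout, the function name, and the sample data are assumptions made for illustration.

```python
import random


def rarefy(table, depth, seed=0):
    """Subsample each sample's counts to `depth` reads without replacement.

    table: dict mapping sample ID -> list of per-feature counts, with the
    same feature order for every sample. Samples with fewer than `depth`
    reads are dropped from the result.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample_id, counts in table.items():
        if sum(counts) < depth:
            continue  # below the sampling depth: excluded
        # Expand counts into one entry per read, labeled by feature index.
        reads = [f for f, n in enumerate(counts) for _ in range(n)]
        new_counts = [0] * len(counts)
        for f in rng.sample(reads, depth):
            new_counts[f] += 1
        rarefied[sample_id] = new_counts
    return rarefied


# Hypothetical table: S2 has only 8 reads, so it is dropped at depth 30.
table = {"S1": [50, 30, 20], "S2": [5, 3, 0], "S3": [10, 10, 10]}
result = rarefy(table, 30)
print(sorted(result))  # ['S1', 'S3']
```

Expanding every read into a list is fine for a toy example, but it would be memory-hungry on a real feature table.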

Is there an output from the core-metrics command above that I could use so that only samples at or above the specified sampling depth are included?

I did find this article on the Forum, so I have the basics of incorporating the phylogeny into the Shannon and Jaccard analyses, if that is the way to go:

Thank you so very much for your help; I appreciate your time helping this noob figure out some of these specifics!

Best regards, Brian

Wait, after running qiime diversity core-metrics-phylogenetic, I just noticed that the output directory it created contains a shannon_vector.qza and a jaccard_distance_matrix.qza. Are these based on the sampling depth I set in that same command? If so, have I answered my own question?

I’m pretty sure my face should be red with embarrassment at the moment…

Best regards, Brian


Hi @l.brian.patrick,

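Yep! Those are based on the sampling depth you set there: core-metrics-phylogenetic rarefies the table to that depth (saved as rarefied_table.qza in the output directory) and drops any samples with fewer sequences, so those results include only samples at or above that depth.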

