Is there a way/plugin to calculate the percentage of features present between samples?

Hey everyone, new to Qiime and microbiome studies here!

This seems to be a dumb question but I couldn’t find an answer anywhere, so I figured it might be worth asking: is there a plugin/command that can calculate what percentage of the features in sample A are present in sample B? Basically I’m looking for a simple diversity metric that only takes presence/absence into account, for example what percentage of an inoculum stuck in a gnotobiotic mouse.

I’ve looked into the moving-window tutorial as well as some community posts, for example this one: Alpha and Beta Diversity Explanations and Commands, but all the metrics seem to deal with more than what I need…

Thanks in advance for your help! (Or please point me to the right post if my searching skills have been terrible :X)

Hi @Focussash,

Welcome to the :qiime2: forum!

It sounds like you're looking for binary Jaccard distance! It's 1 - (shared organisms / total organisms), so it gives you the fraction of organisms that are not shared between the two samples. (You could use 1 - Jaccard for the similarity.)
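
To make the definition concrete, here is a minimal Python sketch of binary Jaccard distance for two samples. The function name `binary_jaccard_distance`, the feature IDs, and the counts are all made up for illustration; this is not how QIIME 2 computes it internally, just the same arithmetic on sets.

```python
def binary_jaccard_distance(counts_a, counts_b):
    """Return 1 - shared/total, using presence/absence only.

    counts_a and counts_b map feature IDs to counts (hypothetical data).
    """
    # A feature is "present" if its count is greater than zero.
    present_a = {feature for feature, count in counts_a.items() if count > 0}
    present_b = {feature for feature, count in counts_b.items() if count > 0}

    total = present_a | present_b   # features seen in either sample
    shared = present_a & present_b  # features seen in both samples

    if not total:
        # Two empty samples: treat them as identical (an assumption).
        return 0.0
    return 1 - len(shared) / len(total)


# Hypothetical counts, not real data.
inoculum = {"ASV1": 120, "ASV2": 30, "ASV3": 5, "ASV4": 0}
mouse = {"ASV1": 80, "ASV2": 0, "ASV3": 12, "ASV5": 7}

distance = binary_jaccard_distance(inoculum, mouse)
similarity = 1 - distance
print(distance, similarity)
```

Here two features are shared out of four seen in total, so both the distance and the similarity come out to 0.5.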

There are a few caveats to Jaccard distance. It tends to be sensitive to sequencing depth, since it places equal weight on singletons and more common organisms. You may want to look into multiple rarefaction or depth correction in modeling if this is a concern. (I often add depth or alpha diversity as a term in my adonis modeling.) You may also find that the metric saturates easily, so a lot of values may hit 1 quickly.

Best,
Justine

Hi Justine! Thanks for the help! Of course the solution to my problem is something I already generated but failed to understand the meaning :joy: :joy:

That said, on second thought I realized the Jaccard distance is slightly different from what I want to check: the way I understand it, Jaccard is a “two-way” distance in that it takes bacterial presence/absence in both samples into account. As an example, let’s say I have samples A and B. While Jaccard would tell me what fraction of all the features in A and B combined are present in both, what I want to check is what percentage of the features that were in A are found in B (without caring whether features found in B are present in A or not). Is this also possible? :joy:

Many thanks for your help!!

Hi @Focussash,

As far as I know, there’s not a unidirectional metric. It’s something you could potentially build if you’re good at R, Python, or some other programming language. In your case, though, I might still go with Jaccard. Non-overlap from your mice could be due to contamination in the cage, but might also be explained by reagent contamination and well-to-well contamination. You could also try filtering ultra-low-abundance organisms, like singletons or doubletons, or things present with fewer than X counts, but filtering may complicate your data.

Best,
Justine

Hi @jwdebelius

Thanks for your reply! I think I’ll export the data and use Python to analyze it haha. I’m only using the mouse as an example; my PI suggested that I compare my data against gnotobiotic mice, so I guess I will show both the Jaccard and the unidirectional comparison.
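
For anyone landing here later, a rough sketch of the unidirectional comparison in Python, assuming the feature table has already been exported and loaded as per-sample dictionaries of counts. The function name `fraction_retained`, the `min_count` parameter, and the example data are hypothetical and not part of any QIIME 2 plugin.

```python
def fraction_retained(source_counts, target_counts, min_count=1):
    """Fraction of features present in the source that also appear in the target.

    A feature counts as present when its count is at least min_count,
    which allows optional filtering of very low-abundance features.
    """
    source = {f for f, c in source_counts.items() if c >= min_count}
    target = {f for f, c in target_counts.items() if c >= min_count}

    if not source:
        # Undefined when the source sample has no features.
        return float("nan")
    return len(source & target) / len(source)


# Hypothetical counts, not real data.
inoculum = {"ASV1": 120, "ASV2": 30, "ASV3": 5, "ASV4": 0}
mouse = {"ASV1": 80, "ASV2": 0, "ASV3": 12, "ASV5": 7, "ASV6": 3}

inoculum_in_mouse = fraction_retained(inoculum, mouse)
mouse_in_inoculum = fraction_retained(mouse, inoculum)
print(inoculum_in_mouse, mouse_in_inoculum)
```

Unlike Jaccard, the result depends on the direction: in this made-up example two of the three inoculum features are found in the mouse, but only two of the four mouse features are found in the inoculum. Raising `min_count` is one way to try the low-abundance filtering mentioned above, with the same caveat that filtering changes what counts as present.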

Hi @Focussash,
How about a simple Venn diagram?
Best,
Isabel

Hi Isabel, essentially that is what I want. That's why I was asking whether Qiime has a plugin that can do it, or whether I have to process the data manually outside of Qiime haha. It seems I'll have to do it manually.
