Finding most significant correlations in OTU datasets across multiple groups?

emmzee · May 18, 2019, 11:43am

Hi all,

Is there a tool to help me find correlations between different types of samples? I have a large OTU dataset (~9k OTUs) and two main study groups. I would like to test for correlations between those groups at the OTU level and determine which ones are the most significant. The question I'm attempting to answer is: will an increase in a particular OTU in one group lead to an increase in the same OTU in the other group?

jwdebelius · May 20, 2019, 8:03am

Hi @emmzee,

A little more information about your design might be helpful. Are you trying to determine differential/non-differential OTUs (i.e. the opposite of significance testing), look at an overlap, or do some kind of replication? Because the best approach for each varies.

Best,
Justine

emmzee · May 24, 2019, 7:07am

Thanks for your response, @jwdebelius. I'm attempting to answer questions regarding microbial vertical transmission during birth.

I believe a good approach is studying the core microbiota to determine if any OTUs are shared. Using differential OTU abundance, I found significant differences in many OTUs, but that's because I have pairs of samples consisting of parent and offspring with different microbiota. Is there an easy way to look for overlap of OTUs among pairs from those two groups (parents, offspring) and, if possible, determine if any OTU from one group has an effect on the abundance of an OTU from the other group? If you could share a user-friendly application, I would greatly appreciate it.

jwdebelius · May 24, 2019, 7:28am

Hi @emmzee,

In this case, where you have parent/child pairs, you should look into paired analysis. You'll likely get the best mileage out of the q2-longitudinal plugin. There's a nice tutorial that deals with this type of analysis.
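As a rough illustration of what a paired, per-OTU comparison can look like outside of QIIME 2, here is a minimal plain-Python sketch. It is not what q2-longitudinal does internally. It assumes the table has already been split into parent and offspring abundances with the pairs in the same order; the OTU names and values are made up, and `average_ranks` and `spearman` are helper names invented for this example. It only computes a rank correlation per OTU. With ~9k OTUs you would still need a significance test and a multiple-comparison correction on top of this.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation (Pearson correlation of the ranks), or None if undefined."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return None  # the OTU is constant in one group, so no correlation exists
    return cov / (var_x * var_y) ** 0.5


# Hypothetical relative abundances; position i in each list is the same family.
parent = {"OTU_1": [0.10, 0.20, 0.30, 0.40], "OTU_2": [0.40, 0.10, 0.30, 0.20]}
child = {"OTU_1": [0.05, 0.15, 0.20, 0.35], "OTU_2": [0.10, 0.40, 0.20, 0.30]}

# One correlation per OTU, computed across the parent/child pairs.
paired_rho = {otu: spearman(parent[otu], child[otu]) for otu in parent}
for otu, rho in paired_rho.items():
    print(otu, rho)
```

A positive value means the OTU tends to be higher in the offspring when it is higher in the matching parent; a negative value means the opposite.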

You could maybe look at co-occurrence networks for shared pairs, but I'm not totally sure how you'd work through that. I also recently saw a paper on social networks (saved on my other computer, of course!), so perhaps you could look at their methods as well? I haven't read the paper, so I don't know the full scope of their implementation, but again, it may be worth looking into.

Best,
Justine

emmzee · May 24, 2019, 10:34am

I will definitely check the plugin out, and will give the social networks a look as well. I'm new to QIIME 2, and have constructed OTU clusters and tables in QIIME 1. Can my data still be used with this plugin?

Thank you again, @jwdebelius!

jwdebelius · May 24, 2019, 10:36am

Hi @emmzee,

I'd recommend going through the tutorial for moving from QIIME 1 to QIIME 2, then. You can find it on the tutorials page. As long as you import your table, you should be fine.

Best,
Justine

Nicholas_Bokulich · May 24, 2019, 12:35pm

Use Jaccard distance; for each pair of samples, this is the fraction of OTUs observed in either sample that are not shared by both.
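To make the definition concrete, here is a minimal plain-Python sketch of Jaccard distance on presence/absence data. The sample names, OTU IDs, and counts are made up, `jaccard_distance` and `observed_otus` are helper names invented for this example, and returning 0.0 for two empty samples is a convention chosen here, not something QIIME 2 prescribes.

```python
def jaccard_distance(otus_a, otus_b):
    """Fraction of OTUs observed in either sample that are NOT shared by both."""
    a, b = set(otus_a), set(otus_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch: two empty samples are identical
    return 1 - len(a & b) / len(union)


def observed_otus(counts):
    """OTU IDs with a non-zero count in one sample."""
    return {otu for otu, n in counts.items() if n > 0}


# Hypothetical OTU counts for one mother/infant pair.
mother = {"OTU_1": 12, "OTU_2": 0, "OTU_3": 5, "OTU_4": 7}
infant = {"OTU_1": 3, "OTU_2": 9, "OTU_3": 0, "OTU_4": 1}

d = jaccard_distance(observed_otus(mother), observed_otus(infant))
print(d)  # 2 of the 4 observed OTUs are shared, so the distance is 0.5
```

Abundance is ignored entirely: an OTU counts as present whether it has 1 read or 1,000, which is why this metric suits questions about shared membership.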

The q2-longitudinal tutorial uses birth cohort data, so it explores questions similar to the ones you may be interested in. You can even look at shared OTUs between mother/infant pairs across time by using first-distances on a Jaccard distance matrix (generated by qiime diversity core-metrics). There is an example of how to do this in the q2-longitudinal publication (see the Jupyter notebook that the publication links to for a code example).
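If you want a feel for the underlying comparison before running the plugin, here is a minimal plain-Python sketch. It is not the first-distances implementation; it only assumes you already have a Jaccard distance matrix and sample metadata in hand. The sample IDs, the `family` and `role` fields, the distance values, and the `distance` helper are all hypothetical. It separates the mother/infant distances within each family from those between unrelated mothers and infants.

```python
from statistics import mean

# Hypothetical sample metadata: which family each sample belongs to and its role.
metadata = {
    "m1": {"family": "fam1", "role": "mother"},
    "i1": {"family": "fam1", "role": "infant"},
    "m2": {"family": "fam2", "role": "mother"},
    "i2": {"family": "fam2", "role": "infant"},
}

# Hypothetical Jaccard distances, keyed by unordered sample pair.
distances = {
    frozenset(("m1", "i1")): 0.35,
    frozenset(("m1", "i2")): 0.80,
    frozenset(("m2", "i1")): 0.90,
    frozenset(("m2", "i2")): 0.40,
}


def distance(a, b):
    """Look up a distance regardless of the order of the two sample IDs."""
    return distances[frozenset((a, b))]


mothers = [s for s, md in metadata.items() if md["role"] == "mother"]
infants = [s for s, md in metadata.items() if md["role"] == "infant"]

within = {}   # family ID -> distance between that family's mother and infant
between = []  # distances between mothers and unrelated infants

for m in mothers:
    for i in infants:
        if metadata[m]["family"] == metadata[i]["family"]:
            within[metadata[m]["family"]] = distance(m, i)
        else:
            between.append(distance(m, i))

print("within-pair distances:", within)
print("mean within:", mean(within.values()))
print("mean between:", mean(between))
```

If true pairs share more OTUs than unrelated mothers and infants do, the within-pair distances should sit below the between-pair ones. Whether that difference is meaningful still needs a proper test on real data.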

This sounds fairly complicated. If the OTU is found in the parent and not the child, then traditional co-occurrence networks will not help (but they are still probably a place to start for understanding parent-child connections).
