q2-SCNIC: A tool for making correlation networks, finding modules of observations and summarizing them

michael.shaffer · September 20, 2018, 12:00am

q2-SCNIC: Community Tutorial

SCNIC (Sparse Cooccurrence Network Investigation for Compositional data) is a tool for building correlation networks from feature tables, finding modules in those networks and summarizing those modules. All of this functionality is available to qiime 2 users via the q2-SCNIC plugin.

The SCNIC method serves three main purposes:

Making it easy for qiime 2 users to generate correlation networks using a variety of metrics.
Increasing statistical power by summarizing non-independent features into modules.
Detecting modules of features which may be of biological interest.

q2-SCNIC: GitHub - lozuponelab/q2-SCNIC: A QIIME2 plugin for running SCNIC

SCNIC: GitHub - lozuponelab/SCNIC: Sparse Cooccurence Network Investigation for Compositional data

Installing q2-SCNIC

If you are using qiime 2 2018.8 or later then you must first force an update of your blas version:

conda install -c conda-forge blas=1.1

q2-SCNIC is available via bioconda so installing is easy. Just activate your qiime 2 conda environment and use this command:

conda install -c lozuponelab q2-SCNIC

That's it.

Note: This adds a few additional packages into your qiime 2 environment. Let us know if this affects your usage of qiime 2 by raising an issue here or by posting on the qiime 2 forum. You also can install a new qiime 2 environment and install q2-SCNIC there to avoid any conflicts with already installed plugins.

Getting data for q2-SCNIC

To run q2-SCNIC you need to start with a feature table. You can do this tutorial with one of your own that you have imported or generated with qiime 2, or with a sample one. If you already have a feature table to start with, you can skip to Running q2-SCNIC.

Downloading an example feature table

Use this command to download a sample feature table for analysis with q2-SCNIC.

wget https://github.com/shafferm/q2-SCNIC/raw/master/tests/data/fake_data.biom

Then import this table into qiime 2 using this command.

qiime tools import \
  --input-path fake_data.biom \
  --type 'FeatureTable[Frequency]' \
  --input-format BIOMV210Format \
  --output-path fake_data.qza

Now you have a .qza file of your feature table to run q2-SCNIC on. Filtering happens in the first step below.

Running q2-SCNIC

SCNIC can be broken up into three main steps:

Filtering your data so that it is useful for correlation analysis
Making a correlation table and network
Finding modules in the correlation network

We will run through these steps with the fake_data.qza generated above but you can run it with any feature table by changing the name of fake_data.qza to whatever your qza is called.

1. Filtering your data

Correlational analyses are hampered by having large numbers of zeroes. Therefore we are first going to remove sparse samples and features from our data. The q2-SCNIC plugin has a method called sparcc-filter that does this based on the parameters used in Friedman et al. This method removes all samples with a feature abundance total below 500 and all features with an average abundance less than 2 across all samples. You do not need to use these parameters and can use any method you choose to do this. Other methods for filtering feature tables are outlined here.

To use the sparcc filter use this command:

qiime SCNIC sparcc-filter \
  --i-table fake_data.qza \
  --o-table-filtered fake_data-filtered.qza
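
As a rough illustration of what these two thresholds mean, here is a minimal Python sketch. It is not the plugin's code: the function name, the dict-based table layout and the order of the two steps (samples first, then features) are assumptions made only for this example.

```python
# Illustrative sketch only; NOT the q2-SCNIC implementation.
# table maps sample ID -> {feature ID -> count}. All names are hypothetical.

def sparcc_style_filter(table, min_sample_total=500, min_feature_mean=2):
    # Step 1 (order assumed): drop samples whose total count is below the cutoff.
    kept = {sample: counts for sample, counts in table.items()
            if sum(counts.values()) >= min_sample_total}
    if not kept:
        return {}
    # Step 2: drop features whose mean count across the remaining samples
    # is below the cutoff.
    features = sorted({f for counts in kept.values() for f in counts})
    n_samples = len(kept)
    keep_features = [
        f for f in features
        if sum(counts.get(f, 0) for counts in kept.values()) / n_samples
        >= min_feature_mean
    ]
    return {sample: {f: counts.get(f, 0) for f in keep_features}
            for sample, counts in kept.items()}


example = {
    "S1": {"otu_a": 400, "otu_b": 300, "otu_c": 1},
    "S2": {"otu_a": 350, "otu_b": 250, "otu_c": 2},
    "S3": {"otu_a": 10, "otu_b": 5, "otu_c": 0},  # total of 15, below 500
}
filtered = sparcc_style_filter(example)
print(filtered)
```

In this toy table S3 is dropped for having too few counts, and otu_c is dropped because its mean across the remaining samples is 1.5.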

2. Calculating correlations and making your network

With your filtered data you can calculate your correlation table and make a network to visualize your correlations.

Generating a correlation table

To calculate all pairwise correlations between features in your filtered table use the following command:

qiime SCNIC calculate-correlations \
  --i-table fake_data-filtered.qza \
  --p-method sparcc \
  --o-correlation-table fake_correls.qza

Here we use the sparCC metric for measuring the strength of our correlations. This metric is recommended when your data is in the form of OTUs or ASVs (Weiss et al. 2017). You may also use Pearson, Spearman or Kendall tau correlation.

(Optional) Making a correlation network

From fake_correls.qza we can generate a network based on a minimum R value cutoff. A co-occurrence network (i.e. a network with only positive edges) will also be generated when finding modules. If you only want to make a network and not find modules, or you want to build a network with both positive and negative correlations, you can use this command:

qiime SCNIC build-correlation-network-r \
  --i-correlation-table fake_correls.qza \
  --p-min-val .35 \
  --o-correlation-network fake_net.qza

The --p-min-val parameter sets the minimum R value required to call a correlation between two features significant and therefore draw an edge between them. In this example we used a minimum value of .35. This is a common cutoff used with the sparCC correlation metric when used with 16S data.
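
To make the idea of a minimum R value cutoff concrete, here is a small Python sketch of turning a table of pairwise correlations into a list of edges. It is not the plugin's code. The function name and the `use_abs` switch are hypothetical, and whether the plugin compares the absolute value of R to the cutoff is an assumption here, not something stated in this tutorial.

```python
# Illustrative sketch only; NOT how q2-SCNIC builds its network.
# correls maps (feature1, feature2) -> R value. Names are hypothetical.

def edges_above_cutoff(correls, min_val=0.35, use_abs=True):
    edges = []
    for (f1, f2), r in sorted(correls.items()):
        # Assumption: with use_abs=True, strong negative correlations also
        # count as edges; with use_abs=False only positive ones do.
        strength = abs(r) if use_abs else r
        if strength >= min_val:
            edges.append((f1, f2, r))
    return edges


correls = {
    ("otu_a", "otu_b"): 0.62,
    ("otu_a", "otu_c"): -0.41,
    ("otu_b", "otu_c"): 0.10,
}
both_signs = edges_above_cutoff(correls)
positive_only = edges_above_cutoff(correls, use_abs=False)
print(both_signs)
print(positive_only)
```

With `use_abs=False` only the otu_a/otu_b edge survives, which corresponds to the positive-edges-only network described above.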

If you want to make a correlation network based on a maximum significant p-value instead, use the build-correlation-network-p method.

3. Detecting and summarizing modules of features

Areas of a network which are strongly interconnected are called modules. With this step we detect these modules and summarize the features in them. The summarization is a simple sum, within each sample, of the abundances of all features in a module. This means that per-sample abundance totals remain the same after summarization, so this table can be used for further statistical tests like ANCOM for testing for differential abundance.
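
The summing step can be sketched in a few lines of Python. This is only an illustration, not the plugin's code: the function name, the `module_0` label and the dict-based table are made up for the example. Features that are not in any module are kept under their own names, as described later in this thread.

```python
# Illustrative sketch only; NOT the q2-SCNIC implementation.
# table: sample ID -> {feature ID -> count}
# modules: module name -> list of member feature IDs (names are hypothetical)

def collapse_modules(table, modules):
    feature_to_module = {f: name
                         for name, members in modules.items()
                         for f in members}
    collapsed = {}
    for sample, counts in table.items():
        row = {}
        for feature, count in counts.items():
            # Features outside every module keep their original ID.
            key = feature_to_module.get(feature, feature)
            row[key] = row.get(key, 0) + count
        collapsed[sample] = row
    return collapsed


table = {
    "S1": {"otu_a": 400, "otu_b": 300, "otu_c": 50},
    "S2": {"otu_a": 350, "otu_b": 250, "otu_c": 20},
}
modules = {"module_0": ["otu_a", "otu_b"]}
collapsed = collapse_modules(table, modules)
print(collapsed)

# Per-sample totals are unchanged by the collapse.
totals_match = all(sum(table[s].values()) == sum(collapsed[s].values())
                   for s in table)
print(totals_match)
```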

To detect and summarize modules use this command:

qiime SCNIC make-modules-on-correlations \
  --i-correlation-table fake_correls.qza \
  --i-feature-table fake_data.qza \
  --p-min-r .35 \
  --o-collapsed-table fake_data.collapsed.qza \
  --o-correlation-network fake_net.modules.qza \
  --o-module-membership fake_membership.qza

The fake_data.collapsed.qza is a feature table you can use with any further non-phylogenetic analysis. fake_net.modules.qza is a network that is annotated with correlation information as well as module membership and can be exported from the .qza to visualize with tools such as Cytoscape.

The fake_membership.qza is viewable as metadata and can be turned into a visualization via this command:

qiime metadata tabulate \
  --m-input-file fake_membership.qza \
  --o-visualization fake_membership.qzv

This visualization can then be used to see what features are in each module.

With that you have run SCNIC and have a feature table with fewer features, giving you more power for further analyses, and a correlation network to investigate correlations between features in your community of interest.

eandersk · October 2, 2018, 8:27pm

Error: QIIME 2 plugin 'SCNIC' has no action 'make-modules-on-correlation-table'

make-modules-on-correlation-table is supposed to be make-modules-on-correlations

Error: No such command "meta".

qiime meta tabulate is supposed to be qiime metadata tabulate.

michael.shaffer · October 4, 2018, 8:32pm

Hey @eandersk,

Thanks for these comments. I forgot to update this documentation when updating my code. I have submitted edits for this document.

Mike

nhuda6 · October 4, 2018, 11:22pm

Hello @michael.shaffer,

Thank you so much for developing this package. Since bacteria either co-exist or fight each other for survival, I do like the idea of determining the modules and then running ANCOM on the modules. I have two questions.

1. We used the R package WGCNA for module analysis of microarray or metabolomics data to find the modules. How is SCNIC different from WGCNA? Is SCNIC better for compositional data?
2. Do you have any plans to make an R package for SCNIC?

Regards,

Nazmul

michael.shaffer · October 5, 2018, 4:14pm

Hey @nhuda6,

Be wary of assuming that correlations necessarily imply cooperation or competition! Two microbes that are competing over resources can be positively correlated if the abundance of those resources is going up across samples. Something to keep in mind when interpreting correlation networks.
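
A toy example of that point in Python, with made-up numbers: two microbes split a shared resource between them, so they are competing for it, yet both track the resource as it increases across samples. The split fractions and resource levels are invented for illustration, and this uses plain Pearson correlation on absolute abundances rather than sparCC on compositional data.

```python
from math import sqrt

# Toy illustration with made-up numbers; not real data and not sparCC.

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Resource level in six samples, and the fraction of it that microbe A wins.
resource = [1, 2, 3, 4, 5, 6]
share_a = [0.70, 0.60, 0.65, 0.75, 0.70, 0.60]

# Whatever A does not take, B takes: a direct competition for the resource.
microbe_a = [s * r for s, r in zip(share_a, resource)]
microbe_b = [(1 - s) * r for s, r in zip(share_a, resource)]

r = pearson(microbe_a, microbe_b)
print(round(r, 2))
```

The correlation comes out clearly positive even though every unit A gains is a unit B loses.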

First, SCNIC is designed for compositional data while WGCNA does nothing to control for compositionality. Both WGCNA and SCNIC can be broken up into two main steps: correlation calculation and module finding.

For correlation calculation, WGCNA uses biweight midcorrelation by default, with Pearson and Spearman available as options. None of these metrics take compositionality into account. SCNIC uses the sparCC correlation metric, which is designed for use with 16S sequencing data and takes compositionality into account. Additionally, a recent study by Sophie Weiss et al. found that sparCC was the best correlation metric in most situations encountered in microbiome studies. We believe SCNIC is better than WGCNA for building correlation networks because we use metrics designed for compositional data.

For module detection, WGCNA uses an algorithm that looks for hub nodes in a scale-free network, and a module is made up of all the strong connections to a hub. This assumes that your network is scale free. We find that microbe-microbe correlation networks fall slightly outside WGCNA's values for what the degree distribution of a scale-free network should be when you use sparCC as the correlation metric, and fall entirely outside them when you use the built-in WGCNA correlation metrics.

SCNIC uses a custom algorithm for finding modules. This is done by converting the sparCC correlation value into a distance metric where an R value of 1 is a distance of zero and an R value of -1 is a distance of 1. Then a single linkage hierarchical clustering is done to build a tree from these correlations, and modules are detected as groups of microbes in the same clade where all pairs have a correlation value above the user-provided R value (we recommend .3 or .35, as this is about where the multiple-testing-corrected p-value for the sparCC correlation is less than .1 or .05 respectively). This guarantees that all microbes in each module are positively correlated with each other with a correlation at least as strong as the user-provided R value. We believe this is better than WGCNA's clustering algorithm because we guarantee that all pairwise correlations within a module will be above a certain strength, and we do not assume a scale-free network structure, which we do not believe will always be present in microbiome data.
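
Two pieces of this can be sketched in Python: the correlation-to-distance conversion and the "all pairs above the cutoff" check. This is only an illustration and not SCNIC's code. The function names are hypothetical, the linear form of the conversion is an assumption (the description above only fixes the two endpoints), and the tree-building step is left out entirely.

```python
from itertools import combinations

# Illustrative sketch only; NOT SCNIC's implementation. Names are hypothetical.

def r_to_distance(r):
    # Assumed linear map that matches the stated endpoints:
    # R = 1 -> distance 0, R = -1 -> distance 1.
    return (1 - r) / 2


def all_pairs_above(members, correls, min_r):
    # A candidate group only qualifies as a module if every pair of members
    # has a correlation of at least min_r.
    for f1, f2 in combinations(sorted(members), 2):
        r = correls.get((f1, f2), correls.get((f2, f1)))
        if r is None or r < min_r:
            return False
    return True


correls = {
    ("otu_a", "otu_b"): 0.62,
    ("otu_a", "otu_c"): 0.40,
    ("otu_b", "otu_c"): 0.20,
}
print(r_to_distance(1), r_to_distance(-1))
pair_ok = all_pairs_above(["otu_a", "otu_b"], correls, 0.35)
trio_ok = all_pairs_above(["otu_a", "otu_b", "otu_c"], correls, 0.35)
print(pair_ok, trio_ok)
```

In the toy correlations otu_c is linked to otu_a above the cutoff but not to otu_b, so the trio fails the check while the otu_a/otu_b pair passes. This is the situation the pairwise guarantee is meant to rule out.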

We are working on the paper for SCNIC now, which will show that the modules detected by SCNIC are enriched in both phylogeny and predicted gene content and will include comparisons to other module detection methods including WGCNA. Generally we find other methods tend to make modules that are too big, and you end up with situations where pairs of microbes in the same module are negatively correlated or associated with outside metadata of interest (such as disease state) in opposite directions.

There are currently no plans to port SCNIC to R but I would be happy to help anyone write an R wrapper for SCNIC if they were interested. Also SCNIC can be used here in qiime2 or on the command line to get sparCC correlations and make modules and these can then be loaded into R for further analysis.

I know this is a lot but I hope it was helpful and please let me know if you have any more questions.

Mike

nhuda6 · October 5, 2018, 5:10pm

Hello @michael.shaffer,

Thank you so much for your detailed answer with references. Those are really helpful.

I did some analysis with SCNIC and found modules. Now I need to perform the downstream statistical analysis on the modules. I have a quick question. Some of the bacteria were not assigned to any module. I am wondering whether you suggest excluding or including them in the downstream analysis. I assume they are very minor in the microbial community.

Thank you so much for developing this plugin. I am looking forward to reading your SCNIC paper.

Regards,

Nazmul

michael.shaffer · October 5, 2018, 5:24pm

The output qza you get of the collapsed table includes all bacteria inside and outside of modules. All microbes not included in a module are still there with their original names and abundances while microbes in modules have been removed and replaced with their modules whose abundance is the summation of all abundances of microbes within the modules per sample.

I would suggest using the collapsed.qza for further analyses which do not include any phylogenetic component.

nhuda6 · October 5, 2018, 5:49pm

@michael.shaffer,

Thank you so much.

Regards,
Nazmul

vrbana · October 9, 2018, 7:15pm

Hi @michael.shaffer,

Is SCNIC appropriate for correlating microbiome and metabolome data? I essentially have two feature tables of abundances, both fairly sparse. I noticed on the SCNIC github page you have an option for "between" pairwise correlations using two feature tables. Is this available in the qiime2 plugin?

michael.shaffer · October 11, 2018, 6:06pm

Hello @vrbana

You can use SCNIC to do these between-data-type correlations. To do this I am just allowing the user to use Spearman, Pearson or Kendall tau correlation and making a network and correlations file, so nothing too fancy. I have not implemented this functionality in q2-SCNIC at this point. I plan to in the future, but it won't be in the next month. If you want to use it in SCNIC then you can export the biom tables from your .qza's and then use those with SCNIC_analysis.py between.

Mike

vrbana · October 11, 2018, 10:45pm

Sounds great, thanks! I'll give it a try outside of q2

llenzi · November 12, 2018, 9:59am

Dear Michael,

I'm trying to install the q2-SCNIC plug in with qiime2-2018.11, but I got the following error:

Solving environment: failed

UnsatisfiableError: The following specifications were found to be in conflict:

q2-phylogeny=2018.11.0
q2-quality-control=2018.11.0
q2-scnic
r-snow=0.4_3
Use "conda info <package>" to see the dependencies for each package.

What I'm doing is running the following commands within the activated env:

conda config --add channels bioconda
conda config --add channels conda-forge
conda install -c conda-forge blas=1.1
conda install -c lozuponelab q2-SCNIC

Is there anything I can do to fix it?
I still have it working in a qiime2-2018.8 env, so it's not a big deal at the moment, but I wanted to let you know about this just in case.
Best,
Luca

thermokarst · November 13, 2018, 2:46pm

Hey there @llenzi! I will let @michael.shaffer reply to this, but in the meantime, I took a look at the q2-SCNIC recipe (I am not a developer of this plugin), and it looks like the recipe indicates that it is only compatible with 2018.8 (at least as far as conda is concerned). There is probably no real reason for that, maybe @michael.shaffer can update the recipe to indicate 2018.8 as a minimum version. Anyway, until you hear back from @michael.shaffer, I would suggest just continuing with your 2018.8 env, no worries about having more than one QIIME 2 env on a machine.

lca123 · November 13, 2018, 3:48pm

Wow, thanks for bringing us such a nice tool. I will certainly give it a try as soon as I can!

michael.shaffer · November 13, 2018, 6:53pm

Hello @llenzi,

@thermokarst is right. I will be pushing out a big update to q2-SCNIC, hopefully in the next week or two, that will include an update to the qiime 2 base version. For now I'd recommend continuing to use q2-SCNIC in your 2018.8 environment.

Mike

thermokarst · November 17, 2018, 12:25am

An off-topic reply has been split into a new topic: Can't run q2-SCNIC

Please keep replies on-topic in the future.

michael.shaffer · December 11, 2018, 8:39pm

q2-SCNIC has now been updated for 2018.11. An additional feature update will be released soon.

thermokarst · January 22, 2019, 4:21pm

An off-topic reply has been split into a new topic: Q2-scnic packagenotfounderror

Please keep replies on-topic in the future.

thermokarst · January 30, 2019, 5:02pm

An off-topic reply has been split into a new topic: Issues with q2-scnic

Please keep replies on-topic in the future.

Nicholas_Bokulich · May 6, 2019, 12:30pm

An off-topic reply has been split into a new topic: Plugin error from SCNIC: No p or p_adj in correls

Please keep replies on-topic in the future.