as my understanding is that anything found in fewer than three samples would be removed, leaving the features observed in 4 or more samples. But when I do this, I get a feature summary with nothing...
I am sorry to keep bugging you all through this forum, but I need the help, and you guys have been great!
I was able to successfully run SCNIC and create a correlation network through Cytoscape, but now I get errors when I try to get the correlation table in SCNIC from a different table in the same location. I can run qiime SCNIC sparcc-filter without issue, but when I take that output and run qiime SCNIC calculate-correlations, I get this error:
Plugin error from SCNIC:
File b'/var/folders/j4/v9r3b0m50bz90vbc0qyxvm2m0000gn/T/fastsparzudgosv4/correl_table.tsv' does not exist
Debug info has been saved to /var/folders/j4/v9r3b0m50bz90vbc0qyxvm2m0000gn/T/qiime2-q2cli-err-m3cubjg1.log
Is there something I can do to fix this? Thanks so much... again!
There appears to only be a plugin error on any modified table. That is, if I use the table.qza then I have no issues; however, if I exclude a species from that table, or collapse the table, then I get this error. It is important for my analysis to remove the dominant species from the data and then run correlations. Is there a way to get around this? Thank you so much!!!
I have tried to run SCNIC after just collapse or just exclude, or on a table where I have done both. I get the plugin error every time… unless I'm using the table.qza, of course. Thanks!
Sorry for being slow to get back. This seems like a weird one. Can you let me know what the log with debug info says? Also, I would be cautious about removing any organisms before calculating correlations using sparCC. SparCC assumes that only rare organisms have been removed from your data and bases its distributions on the rest of the data. I'd be afraid that you are removing all the organisms which are abundant enough to calculate correlations on.
Ahh, that could be true. I'm not sure how to check that; how would I know if something was too scarce for effective correlation analysis?
Here is the log file
Correlating with sparcc
Input triggered condition to perform clr correlation, this is not yet implemented
Starting FastSpar
Running SparCC iterations
Running iteration: 1
Traceback (most recent call last):
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "", line 2, in calculate_correlations
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in callable_executor
    output_views = self._callable(**view_args)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_SCNIC/_SCNIC_methods.py", line 24, in calculate_correlations
    correls = ca.fastspar_correlation(table, verbose=True, nprocs=n_procs)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/SCNIC/correlation_analysis.py", line 60, in fastspar_correlation
    cor = pd.read_table(path.join(temp, 'correl_table.tsv'), index_col=0)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 402, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 718, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'/var/folders/j4/v9r3b0m50bz90vbc0qyxvm2m0000gn/T/fastspar8htf1zay/correl_table.tsv' does not exist
Yes, so what is happening is that there are not enough features, so sparCC is failing. Can you tell me how many samples made it through your filtering? I'd also try rerunning without those abundant features filtered out, as you can always filter them out after the correlation analysis.
I have included in q2-SCNIC a function called sparcc-filter, which will filter your data based on the filter suggested in the sparCC manuscript: it removes all OTUs with an average read count per sample of less than two. Alternatively, you can use other filters available in qiime feature-table filter-features, such as --p-min-samples, to get rid of features with too many zero counts across samples. For this, our lab usually sets --p-min-samples to 80% of the sample size when calculating correlations.
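To make the two filters concrete, here is a minimal pandas sketch of what they each do. This is not the actual q2-SCNIC or QIIME 2 implementation — the function name, thresholds, and toy table are made up for illustration:

```python
import pandas as pd

def sparcc_style_filter(table, min_mean=2, min_sample_frac=0.8):
    """Filter a feature table (features x samples) two ways:
    drop features whose average count per sample is below min_mean
    (the sparCC-manuscript filter), and drop features observed in
    fewer than min_sample_frac of samples (what an 80%-of-sample-size
    --p-min-samples setting would do)."""
    keep_mean = table.mean(axis=1) >= min_mean
    min_samples = int(min_sample_frac * table.shape[1])
    keep_prevalence = (table > 0).sum(axis=1) >= min_samples
    return table[keep_mean & keep_prevalence]

# toy table: 3 features (rows) x 5 samples (columns)
toy = pd.DataFrame(
    {"s1": [10, 0, 1], "s2": [12, 0, 0], "s3": [8, 1, 0],
     "s4": [9, 0, 2], "s5": [11, 0, 0]},
    index=["ASV_a", "ASV_b", "ASV_c"])

# ASV_a passes both filters; ASV_b and ASV_c fail the mean-count filter
print(sparcc_style_filter(toy).index.tolist())  # ['ASV_a']
```

If a collapsed or species-excluded table leaves almost nothing after filters like these, that would explain the empty feature summary and the downstream sparCC failure.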
I was reading in the forums that to create a table file from QIIME 2 with the taxonomy, I will need to use qiime taxa collapse to get a collapsed table that contains the taxonomy. However, this then causes issues in SCNIC.
So my question is, how can I create a table.qza with the taxonomy so that I can run SCNIC and the network will have taxonomic identifications instead of the variant unique ID? would this even be possible?
I'm a bit confused about why sparCC is failing. I have a data set that includes a microbiome with mostly rhizobia, which makes sense. I also have a lot of other bacteria present in the samples. What I want to do is remove the rhizobia and run the correlations, as this is what we are interested in.
I have included the bar plot, just so you can see what I'm seeing here, and this is what I would like to run the correlations on, with the taxonomy... does that make sense??
I don't quite understand what you are trying to do. If you want to collapse by taxonomy and then use SCNIC, you can definitely do that: use qiime taxa collapse and then use the output of that as the input to SCNIC. If you want to have taxonomy as a column in your node data when you view a network generated by SCNIC in Cytoscape, then you can take the taxonomy.qza that you generated in any way you chose and run qiime tools export on it to get a tsv. This can then be used as node data in Cytoscape, where you can load your network and then use the import table from file function to add the taxonomy column to the node metadata.
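If it helps, here is a small pandas sketch of turning an exported taxonomy tsv into a node metadata table. The "Feature ID"/"Taxon"/"Confidence" headers reflect a typical QIIME 2 export, and renaming the ID column to "name" (a common Cytoscape key column) is an assumption — check both against your own files:

```python
import io
import pandas as pd

# stand-in for the taxonomy.tsv produced by `qiime tools export`;
# the column headers here are an assumption about the typical layout
tsv = io.StringIO(
    "Feature ID\tTaxon\tConfidence\n"
    "asv1\tk__Bacteria; p__Proteobacteria\t0.99\n"
    "asv2\tk__Bacteria; p__Firmicutes\t0.97\n")
tax = pd.read_csv(tsv, sep="\t")

# Cytoscape matches imported node tables on a key column; calling it
# "name" so it lines up with node names, with Taxon/Confidence
# imported as node attributes
node_meta = tax.rename(columns={"Feature ID": "name"})
print(node_meta.columns.tolist())  # ['name', 'Taxon', 'Confidence']
```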
SparCC is not a traditional correlation metric where you give it two lists of numbers and it tells you how strongly correlated the two lists are. SparCC tries to calculate correlations while taking into account the compositionality of the data (i.e., the fact that the abundances we observe are relative to the total count of observations in a sample). This means that to calculate the correlations between your ASVs, sparCC needs to know the total number of reads you have per sample. By getting rid of high abundance observations, you are making it so that sparCC doesn't get a realistic idea of what the relative abundances of your ASVs are, so it can't calculate the correlations. If you want to get rid of your Rhizobia, then you can do that after you run SCNIC and interpret as you wish.
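A tiny numerical example of why removing a dominant taxon distorts the composition (toy counts, nothing to do with the real data):

```python
import numpy as np

# one dominant taxon (think Rhizobia) plus two minor taxa in a sample
counts = np.array([900.0, 50.0, 50.0])

# relative abundances with the full composition: minor taxa at 5% each
rel_full = counts / counts.sum()

# drop the dominant taxon and renormalize: the same two taxa
# now look like 50% of the community each
rel_dropped = counts[1:] / counts[1:].sum()

print(rel_full[1:])   # [0.05 0.05]
print(rel_dropped)    # [0.5 0.5]
```

The raw counts of the minor taxa never changed, but their apparent relative abundances did by a factor of ten, which is exactly the information sparCC builds its correlations from.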
I'd also recommend looking into your data a bit more. It is a bit concerning that some of your samples are 100% mitochondria and more are 50%+ mitochondria. You might want to filter out those ASVs before running SCNIC, since they are not bacteria and are therefore not part of the bacterial composition of your samples.
I have seen that with the mitochondria and have since removed them from the data set; I noticed that this morning.
Previously we talked about how, when I do qiime taxa collapse and then run SCNIC, I get the plugin error, as there aren't enough features and so sparCC is failing.
I will try this all again, and see how it goes, thanks so much for the reply, and all the help! I will let you know how it goes!!
This all worked great! thank you so much! I was confused about some stuff, but I got it now!
I am curious about the 0.35. I read that this r value is converted from the SparCC correlation value, and that 0.35 corresponds to roughly a p value of 0.05. Is there a way to calculate this specifically? That is, if I wanted 0.01, could I do a calculation and put in 0.55 or something? Or if I needed exactly 0.05, as opposed to about 0.05, could I calculate this somehow?
Thank you so much for everything here! this forum is the best!
Just wanted to let you know that @michael.shaffer is currently pretty tied up writing his thesis.
Where did you read this? Typically r-values measure the strength of an effect and have nothing really to do with how likely seeing such a strength is (the p-value, roughly). So mapping r-values to p-values doesn't quite make sense to me, but perhaps the SparCC correlation is special in this regard?
This is based on my experience using sparCC across many data sets, as well as the experiences of other people in my lab. It's also the correlation cutoff that was recommended in the original sparCC paper. You can definitely test this yourself: SCNIC uses the fastspar package, which is available through bioconda, and in the future q2-SCNIC will be updated to generate p-values for sparCC.
SparCC uses a permutation test to determine p-values: data labels are shuffled and correlations are recalculated to create a null distribution of r values, and the p-value comes from comparing the observed correlation to that distribution. Our finding that .35 pretty much always works means that this null distribution is very similar across data sets. I work only with gut data, though, so this could change in other microbiomes.
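The permutation idea can be sketched in a few lines. This uses a plain Pearson correlation rather than sparCC's compositional machinery, and the data are made up, but the shuffle-then-compare logic is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_pvalue(x, y, n_perm=999):
    """Two-sided permutation p-value for a correlation: shuffle one
    variable to build a null distribution of r, then count how often
    the null |r| is at least as extreme as the observed |r|."""
    observed = np.corrcoef(x, y)[0, 1]
    null = np.array([np.corrcoef(rng.permutation(x), y)[0, 1]
                     for _ in range(n_perm)])
    # +1 in numerator and denominator: the observed value counts
    # as one draw from the null, so p is never exactly zero
    return (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)

# strongly correlated toy data, so the p-value comes out small
x = np.arange(20, dtype=float)
y = x + rng.normal(scale=2.0, size=20)
print(permutation_pvalue(x, y))
```

Fixing a single r cutoff like 0.35 amounts to assuming this null distribution looks about the same in every data set, which is why it is a rule of thumb rather than an exact conversion to p = 0.05.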