Correlation analysis problem

@llenzi and @Nicholas_Bokulich

On the effects of filtering on compositionality: I agree it is an interesting question. SCNIC uses the SparCC correlation metric by default, so the filtering I provide naively applies the filtering parameters that were used in the original SparCC manuscript. I know that SparCC uses a Dirichlet multinomial distribution to estimate true relative abundances, and I think this process at least tries to account for this. I’d recommend digging into the "Estimation of component fractions" section of the original SparCC paper (https://doi.org/10.1371/journal.pcbi.1002687); it’s pretty interesting. Ideally, the features being removed should be so low in abundance that they do not affect this estimation step.
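For anyone curious what that estimation step looks like, here is a minimal sketch of my reading of it: fractions are drawn from a Dirichlet posterior built from the observed counts plus a pseudocount. This is only an illustration in plain Python, not the SparCC or FastSpar code. The function name `sample_fractions`, the pseudocount of 1, and the example counts are all assumptions made for the example.

```python
import random


def sample_fractions(counts, pseudocount=1.0, rng=random):
    """Draw one set of fractions from Dirichlet(counts + pseudocount).

    A Dirichlet draw can be built by normalizing independent gamma draws,
    which keeps this sketch inside the standard library.
    """
    draws = [rng.gammavariate(c + pseudocount, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical read counts for four features in one sample.
example_counts = [120, 0, 30, 850]
fractions = sample_fractions(example_counts, rng=random.Random(42))
print(fractions)
```

Note that the zero count still gets a small positive fraction because of the pseudocount, which is why very rare features should contribute little to this step.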

Hello again @swillyb,

Let’s try adding one more channel. Do these two commands:

conda config --add channels conda-forge
conda config --add channels defaults

And then try the conda install again.

This worked! thank you so much!!!

I am now having an issue with the filtering (of course I am). I am able to run the filtering command, something along these lines:

qiime feature-table filter-features \
  --i-table table.qza \
  --p-min-samples 2 \
  --p-max-samples 18 \
  --m-metadata-file sample-metadata.tsv \
  --o-filtered-table table-filtered2max18-sample.qza

However, when I generate a bar plot from this, there is no data there. I have tried many iterations of this, without a max, with minimum samples as 1, just to see, and no matter what I do, I get a bar plot with nothing. For example:

taxa-bar-plots-table-no-Rhizobium-no-mito-filtered5-freq.qzv (316.2 KB)
taxa-bar-plots-table-filtered1nomax-feature.qzv (312.1 KB)
taxa-bar-plots-table-filtered2max18-feature.qzv (312.1 KB)

am I running this command incorrectly? Again, thank you so much for all your help!! you guys are the best!

Scott

Maybe you should feature-table summarize the post-filtered table, to see what the filtering is doing to the table.

this is great advice, thank you!

When I look at my unfiltered table.qzv, it says that features are only observed in 1 or 2 samples, so clearly if I was filtering things not in 5 everything would go away, and it did. However, when I filter everything observed in at least 1, just to check how the filtering command is working, everything also goes away, which confuses me.

I suppose I’m also confused about how a feature is determined. I know from my bar plots that there are many species found in all samples, or in 10 samples, for example. Are these species not features? I’m very confused by this, sorry for the basic question.

And lastly, how do I convert the output of q2-SCNIC, the corr_net.qza or the membership.qza, for Cytoscape?

Sorry for all the questions, and thank you so much for the help!

Scott

If you filter out any feature observed in at least one sample, that would remove everything! Because if a feature is observed in 0 samples, well, it would not exist in your feature table (and current default behavior for the filtering commands is to drop samples/features with 0 frequency after filtering).
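To make those semantics concrete, here is a toy sketch in plain Python. It is not the q2-feature-table implementation; the function name `filter_features` and the dictionary layout are made up for this example. It keeps a feature only if the number of samples in which it is observed falls between the minimum and maximum.

```python
def filter_features(table, min_samples=0, max_samples=None):
    """Keep features observed (count > 0) in min_samples..max_samples samples.

    `table` maps feature ID -> {sample ID: count}. This is a toy stand-in
    for a feature table, not how QIIME 2 stores or filters data.
    """
    kept = {}
    for feature_id, counts in table.items():
        n_observed = sum(1 for c in counts.values() if c > 0)
        if n_observed < min_samples:
            continue
        if max_samples is not None and n_observed > max_samples:
            continue
        kept[feature_id] = counts
    return kept


toy_table = {
    "sv1": {"s1": 10, "s2": 0, "s3": 0},  # observed in 1 sample
    "sv2": {"s1": 4, "s2": 7, "s3": 0},   # observed in 2 samples
    "sv3": {"s1": 1, "s2": 2, "s3": 3},   # observed in 3 samples
}

at_least_two = filter_features(toy_table, min_samples=2)
at_most_zero = filter_features(toy_table, max_samples=0)
print(sorted(at_least_two))  # ['sv2', 'sv3']
print(sorted(at_most_zero))  # []
```

The second call is the "remove anything observed in at least one sample" case: every feature in the table is observed somewhere, so nothing survives.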

Great question — see this post. Feature can mean any kind of observation in a feature table... in your case I assume feature = sequence variant, but it can also mean taxa if you are using a feature table where taxa are the observations.

No clue — you should consult the cytoscape docs for more details, but maybe @michael.shaffer knows?

You can use:

qiime tools export \
  --input-path corr_net.qza \
  --output-path corr_net

In the corr_net directory there will be a file called network.gml, which can be used as an input for Cytoscape.

this is all very helpful! thank you so much!!

If the features are sequence variants, then based on my bar plots I should have many features in each sample, not just 1 or 2… is that correct?

Is there a way to specifically build a table that uses sequence variants (or genera or species, something like this)

Thanks for the help!!

The bar plots collapse feature variants based on taxonomic affiliation — so the bar plots have no relation to the total # of SVs.

A feature table of SVs (or OTUs, depending on how you processed your data) is what will be produced by default by dada2, deblur, or OTU clustering methods. You can use qiime taxa collapse to collapse these features based on taxonomic affiliation, producing a new feature table where the features are unique taxonomic groups at whatever taxonomic rank (level) you select.
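As a rough picture of what collapsing does, here is a small sketch in plain Python, not the q2-taxa code. The function name `collapse`, the toy taxonomy strings, and the counts are all invented for illustration, and the sketch ignores details such as how unassigned ranks are handled. Features that share the same first N taxonomy ranks have their counts summed.

```python
from collections import defaultdict


def collapse(table, taxonomy, level):
    """Sum counts of features sharing the same first `level` taxonomy ranks."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, counts in table.items():
        ranks = [r.strip() for r in taxonomy[feature_id].split(";")]
        key = ";".join(ranks[:level])
        for sample_id, count in counts.items():
            collapsed[key][sample_id] += count
    return {k: dict(v) for k, v in collapsed.items()}


# Hypothetical sequence variants: each is seen in only one sample.
sv_table = {
    "sv1": {"s1": 5, "s2": 0},
    "sv2": {"s1": 0, "s2": 3},
    "sv3": {"s1": 2, "s2": 0},
}
sv_taxonomy = {
    "sv1": "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rhizobiales",
    "sv2": "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rhizobiales",
    "sv3": "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales",
}

level4 = collapse(sv_table, sv_taxonomy, level=4)
for taxon, counts in sorted(level4.items()):
    print(taxon, counts)
```

In this toy case no single SV appears in more than one sample, yet the collapsed Rhizobiales feature appears in both, which is the same pattern as SVs seen in 1 or 2 samples while the bar plots show taxa shared across many samples.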

Good luck!

thank you!! i was able to use the qiime taxa collapse successfully!

when I now summarize my table, I see features in numerous samples, which is perfect!

I am unsure about something, if you look at this table table-no-Rhizobium-no-mito-collapselevel4.qz.qzv (332.4 KB)

there are features in a number of samples, so I tried then to filter with this ....

qiime feature-table filter-features \
  --i-table table-no-Rhizobium-no-mito-collapselevel4.qza \
  --p-min-samples 3 \
  --p-max-samples 18 \
  --m-metadata-file sample-metadata.tsv \
  --o-filtered-table table-no-Rhizobium-no-mito-collapselevel4filtered.qza

as my understanding is that anything found in fewer than three samples would be removed, leaving the features observed in 3 or more samples, but when I do this, I get a feature summary with nothing....

table-no-Rhizobium-no-mito-collapselevel4filtered.qza (39.4 KB)

I see this same result when I try to filter based on frequency. Clearly I am doing something wrong, I just can’t see what it is.

thank you so much for the help, I am so close to getting what I need from qiime2!!!

Scott

I am sorry to keep bugging you all through this forum, but I need the help, and you guys have been great!

I was able to successfully run SCNIC and create a correlation network in Cytoscape. But now, when I try to get the correlation table in SCNIC (same location, just from a different table), I get errors. I can run qiime SCNIC sparcc-filter without issue, but when I take that output and run qiime SCNIC calculate-correlations, I get this error…

Plugin error from SCNIC:

File b'/var/folders/j4/v9r3b0m50bz90vbc0qyxvm2m0000gn/T/fastsparzudgosv4/correl_table.tsv' does not exist

Debug info has been saved to /var/folders/j4/v9r3b0m50bz90vbc0qyxvm2m0000gn/T/qiime2-q2cli-err-m3cubjg1.log

is there something I can do to fix this? thanks so much!..again

Scott

There appears to be a plugin error only on modified tables. That is, if I use table.qza then I have no issues; however, if I exclude a species from that table, or collapse the table, then I get this error. It is important for my analysis to remove the dominant species from the data and then run correlations. Is there a way to get around this? Thank you so much!!!

Scott

Hey @swillyb,

How exactly are you filtering/collapsing your table?

for exclusions of bacterial species I have used this (for example)

qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-exclude Rhizobium \
  --o-filtered-table table-no-Rhizobium.qza

and to collapse I have used this

qiime taxa collapse \
  --i-table table-no-Rhizobium.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 4 \
  --o-collapsed-table table-no-Rhizobium-collapselevel4.qza

I have tried to run SCNIC after just collapse or just exclude, or on a table where I have done both. I get the plugin error every time… unless I’m using the table.qza, of course. Thanks!

Scott

Thanks for the info @swillyb, I’m afraid I don’t know what’s going on either, but hopefully @michael.shaffer can explain (or look into it). Thanks!

Hey @swillyb

Sorry for being slow to get back. This seems like a weird one. Can you let me know what the log with debug info says? Also, I would be cautious about removing any organisms before calculating correlations using SparCC. SparCC assumes that only rare organisms have been removed from your data and bases its distributions on the rest of the data. I’d be afraid that you are removing all the organisms which are abundant enough to calculate correlations on.

Mike

Ahh, that could be true. I’m not sure how to check that. How would I know if something was too scarce for effective correlation analysis?

Here is the log file

Correlating with sparcc
Input triggered condition to perform clr correlation, this is not yet implemented
Starting FastSpar
Running SparCC iterations
Running iteration: 1
Traceback (most recent call last):
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "", line 2, in calculate_correlations
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in callable_executor
    output_views = self._callable(**view_args)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_SCNIC/_SCNIC_methods.py", line 24, in calculate_correlations
    correls = ca.fastspar_correlation(table, verbose=True, nprocs=n_procs)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/SCNIC/correlation_analysis.py", line 60, in fastspar_correlation
    cor = pd.read_table(path.join(temp, 'correl_table.tsv'), index_col=0)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 402, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 718, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'/var/folders/j4/v9r3b0m50bz90vbc0qyxvm2m0000gn/T/fastspar8htf1zay/correl_table.tsv' does not exist

Yes, so what is happening is that there are not enough features, so SparCC is failing. Can you tell me how many samples made it through your filtering? I'd also try rerunning without those abundant features filtered out, as you can always filter them out after the correlation analysis.

I have included in q2-SCNIC a function called sparcc-filter, which filters your data using the filter suggested in the SparCC manuscript: it removes all OTUs with an average read count per sample of less than two. Alternatively, you can use other filters available in qiime feature-table filter-features, such as --p-min-samples, to get rid of features with too many zero counts across samples. For this, our lab usually sets --p-min-samples to 80% of the sample size when calculating correlations.
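Here is a small sketch of both rules in plain Python, in case it helps to see them spelled out. It is an illustration only, not the q2-SCNIC source. The function names, the toy counts, the 18-sample example, and the choice to round the 80% threshold up are all assumptions.

```python
import math


def mean_count_filter(table, min_mean=2.0):
    """Drop features whose average read count per sample is below min_mean.

    `table` maps feature ID -> {sample ID: count}.
    """
    return {
        feature_id: counts
        for feature_id, counts in table.items()
        if sum(counts.values()) / len(counts) >= min_mean
    }


def min_samples_for(n_samples, fraction=0.8):
    """Smallest whole number of samples covering `fraction` of n_samples."""
    return math.ceil(n_samples * fraction)


toy_table = {
    "otu1": {"s1": 0, "s2": 1, "s3": 2},  # mean 1.0, removed
    "otu2": {"s1": 5, "s2": 0, "s3": 4},  # mean 3.0, kept
    "otu3": {"s1": 2, "s2": 2, "s3": 2},  # mean 2.0, kept (not less than two)
}

filtered = mean_count_filter(toy_table)
threshold = min_samples_for(18)
print(sorted(filtered))  # ['otu2', 'otu3']
print(threshold)         # 15
```

With a hypothetical 18 samples, the 80% rule would translate to something like --p-min-samples 15 under this rounding.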