On the effects of filtering on compositionality: I agree it is an interesting question. SCNIC uses the SparCC correlation metric by default, so the filtering I provide naively uses the filtering parameters from the original SparCC manuscript. I know that SparCC uses a Dirichlet multinomial distribution to estimate true relative abundances, and I think this process at least tries to account for this. I'd recommend digging into the "Estimation of component fractions" section of the original SparCC paper (https://doi.org/10.1371/journal.pcbi.1002687); it's pretty interesting. Ideally the features being removed should be so low in abundance that they do not affect this estimation step.
However, when I generate a bar plot from this, there is no data there. I have tried many iterations of this (without a max, with minimum samples set to 1, just to see), and no matter what I do, I get a bar plot with nothing in it. For example...
When I look at my unfiltered table.qzv, it says that features are only observed in 1 or 2 samples, so clearly if I filtered out things not in 5 samples, everything would go away, and it did. However, when I filter everything observed in at least 1, just to check how the filtering command is working, everything also goes away, which confuses me.
I suppose I'm also confused about how a feature is determined. I know from my bar plots that there are many species found in all samples, or in 10 samples, for example. Are these species not features? I'm very confused by this, sorry for the basic question.
And lastly, how do I convert the output of q2-SCNIC, the corr_net.qza or the membership.qza, for Cytoscape?
Sorry for all the questions, and thank you so much for the help!
If you filter out any feature observed in at least one sample, that would remove everything! Because if a feature is observed in 0 samples, well, it would not exist in your feature table (and current default behavior for the filtering commands is to drop samples/features with 0 frequency after filtering).
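To make the filtering semantics concrete, here is a minimal pandas sketch. It is not the QIIME 2 implementation: the toy table, the function name `filter_features`, and its `min_samples`/`max_samples` arguments are all made up for illustration. It only mimics the idea of keeping features by the number of samples they are observed in and then dropping anything left with zero total frequency.

```python
import pandas as pd

# Hypothetical feature table: rows are features, columns are samples.
table = pd.DataFrame(
    {"s1": [5, 0, 3], "s2": [0, 7, 2], "s3": [0, 0, 4]},
    index=["featA", "featB", "featC"],
)


def filter_features(table, min_samples=0, max_samples=None):
    """Keep features observed (count > 0) in min_samples..max_samples samples.

    Illustrative helper only; the names are assumptions, not a QIIME 2 API.
    """
    n_observed = (table > 0).sum(axis=1)
    keep = n_observed >= min_samples
    if max_samples is not None:
        keep &= n_observed <= max_samples
    kept = table[keep]
    # Mimic dropping features and samples whose total frequency is zero.
    return kept.loc[kept.sum(axis=1) > 0, kept.sum(axis=0) > 0]


# Every feature in the table is already observed in at least one sample,
# so requiring "at least 1" keeps all of them.
at_least_one = filter_features(table, min_samples=1)

# Keeping only features observed in at most 0 samples leaves nothing.
at_most_zero = filter_features(table, max_samples=0)

# Only featC is observed in all three samples.
at_least_three = filter_features(table, min_samples=3)
```

If a "minimum of 1 sample" filter empties your table, it is worth double-checking whether the value was passed as a minimum or as a maximum, since the two behave very differently.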
Great question — see this post. Feature can mean any kind of observation in a feature table... in your case I assume feature = sequence variant, but it can also mean taxa if you are using a feature table where taxa are the observations.
No clue — you should consult the cytoscape docs for more details, but maybe @michael.shaffer knows?
The bar plots collapse feature variants based on taxonomic affiliation — so the bar plots have no relation to the total # of SVs.
A feature table of SVs (or OTUs, depending on how you processed your data) is what will be produced by default by dada2, deblur, or OTU clustering methods. You can use qiime taxa collapse to collapse these features based on taxonomic affiliation, producing a new feature table where the features are unique taxonomic groups at whatever taxonomic rank (level) you select.
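As a rough illustration of why a taxon can show up in every sample of a bar plot while each individual SV is seen in only one or two samples, here is a small pandas sketch. The table, the SV names, and the taxonomy labels are invented, and the groupby is only a stand-in for the idea behind collapsing by taxonomy, not the plugin's actual code.

```python
import pandas as pd

# Hypothetical table of sequence variants (rows) by samples (columns).
sv_table = pd.DataFrame(
    {"s1": [10, 0, 0, 4], "s2": [0, 8, 0, 0], "s3": [0, 0, 6, 0]},
    index=["sv1", "sv2", "sv3", "sv4"],
)

# Hypothetical taxonomy assignment for each sequence variant.
taxonomy = pd.Series(
    {
        "sv1": "g__Bacteroides",
        "sv2": "g__Bacteroides",
        "sv3": "g__Bacteroides",
        "sv4": "g__Prevotella",
    }
)

# Number of samples in which each individual SV is observed.
samples_per_sv = (sv_table > 0).sum(axis=1)

# Sum the counts of all SVs that share a taxonomic label.
collapsed = sv_table.groupby(taxonomy).sum()

# Number of samples in which each collapsed taxon is observed.
samples_per_taxon = (collapsed > 0).sum(axis=1)
```

In this toy example no SV is seen in more than one sample, yet the collapsed g__Bacteroides feature is present in all three. A sample-count filter applied to the SV table and to the collapsed table can therefore give very different results.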
as my understanding is that anything found in fewer than three samples would be removed, leaving the features observed in 4 or more samples, but when I do this, I get a feature summary with nothing...
I am sorry to keep bugging you all through this forum, but I need the help, and you guys have been great!
I was able to successfully run SCNIC and create a correlation network in Cytoscape, but now when I try to get the correlation table in SCNIC (same location, just from a different table), I get errors. I can run qiime SCNIC sparcc-filter without issue, but when I take that output and run qiime SCNIC calculate-correlations, I get this error:
Plugin error from SCNIC:
File b'/var/folders/j4/v9r3b0m50bz90vbc0qyxvm2m0000gn/T/fastsparzudgosv4/correl_table.tsv' does not exist
Debug info has been saved to /var/folders/j4/v9r3b0m50bz90vbc0qyxvm2m0000gn/T/qiime2-q2cli-err-m3cubjg1.log
Is there something I can do to fix this? Thanks so much, again!
There appears to be a plugin error only on modified tables. That is, if I use the table.qza then I have no issues; however, if I exclude a species from that table, or collapse the table, then I get this error. It is important for my analysis to remove the dominant species from the data and then run correlations. Is there a way to get around this? Thank you so much!
I have tried to run SCNIC after just collapsing, after just excluding, and on a table where I have done both. I get the plugin error every time, unless I'm using the table.qza of course. Thanks!
Sorry for being slow to get back. This seems like a weird one. Can you let me know what the log with debug info says? Also, I would be cautious about removing any organisms before calculating correlations with SparCC. SparCC assumes that only rare organisms have been removed from your data, and it bases its distributions on the rest of the data. I'd be afraid that you are removing all the organisms that are abundant enough to calculate correlations on.
Ahh, that could be true. I'm not sure how to check that. How would I know if something was too scarce for effective correlation analysis?
Here is the log file
Correlating with sparcc
Input triggered condition to perform clr correlation, this is not yet implemented
Starting FastSpar
Running SparCC iterations
Running iteration: 1
Traceback (most recent call last):
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "", line 2, in calculate_correlations
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in callable_executor
    output_views = self._callable(**view_args)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_SCNIC/_SCNIC_methods.py", line 24, in calculate_correlations
    correls = ca.fastspar_correlation(table, verbose=True, nprocs=n_procs)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/SCNIC/correlation_analysis.py", line 60, in fastspar_correlation
    cor = pd.read_table(path.join(temp, 'correl_table.tsv'), index_col=0)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/Echo_Base/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 402, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 718, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'/var/folders/j4/v9r3b0m50bz90vbc0qyxvm2m0000gn/T/fastspar8htf1zay/correl_table.tsv' does not exist
Yes, so what is happening is that there are not enough features, so SparCC is failing. Can you tell me how many samples made it through your filtering? I'd also try rerunning without filtering out those abundant features, as you can always filter them out after the correlation analysis.
I have included in q2-SCNIC a function called sparcc-filter, which filters your data using the filter suggested in the SparCC manuscript: it removes all OTUs with an average read count per sample of less than two. Alternatively, you can use the other filters available in qiime feature-table filter-features, such as --p-min-samples, to get rid of features with too many zero counts across samples. For this, our lab usually sets --p-min-samples to 80% of the sample size when calculating correlations.
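For anyone who wants to see how the two filters described above differ, here is a minimal pandas sketch. It is not the q2-SCNIC or QIIME 2 code: the toy table and the helper names `mean_count_filter` and `prevalence_filter` are assumptions made for illustration, based only on the description of a mean count of at least two per sample and a minimum number of samples.

```python
import pandas as pd

# Hypothetical feature table: rows are features, columns are 5 samples.
table = pd.DataFrame(
    {
        "s1": [50, 1, 0, 12, 0],
        "s2": [40, 0, 3, 9, 0],
        "s3": [60, 2, 0, 15, 0],
        "s4": [55, 0, 0, 11, 12],
        "s5": [45, 1, 0, 0, 0],
    },
    index=["f1", "f2", "f3", "f4", "f5"],
)


def mean_count_filter(table, min_mean=2):
    """Keep features whose average count per sample is at least min_mean."""
    return table[table.mean(axis=1) >= min_mean]


def prevalence_filter(table, min_samples):
    """Keep features observed (count > 0) in at least min_samples samples."""
    return table[(table > 0).sum(axis=1) >= min_samples]


by_mean = mean_count_filter(table)

# 80% of the 5 samples in this toy table is 4 samples.
by_prevalence = prevalence_filter(table, min_samples=4)

# Checking how much is left before running correlations.
n_features, n_samples = by_prevalence.shape
```

Here f5 passes the mean-count filter because of a single large count, but it fails the prevalence filter since it is present in only one sample. Checking the shape of the filtered table before calculating correlations is a quick way to spot the "not enough features" situation described above.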