Hello - apologies in advance for the length of this question, I think things are all related so I thought it was best if I posted this together. I'm working with an experimental system that has low biomass, ASVs that are predominantly unclassified at the order level or higher (many unclassified at the Phylum level), and a severely left-tail feature distribution (most ASVs are only observed a few times).
My experimental design fits a stratified factorial within subjects model: I have 5 different deployments (at this time I am not doing a longitudinal analyses and treating these instead as 5 separate experiments), testing to see if flow rate and choice of physical substrate influence community structure. In each of the 5 experiments I have either fast or slow flow conditions and in each of those conditions I have bead or crush substrate. In each condition there are 1 - 3 replicates (the design is unbalanced in that there are different numbers of replicates in each group do to constraints of the system). I also have an initial test system: again due to constraints of the system, the sample device in the slow system is made from plastic and the sample device in the fast system is made from glass. In three of the deployments I also set up plastic and glass devices in the fast system to evaluate if the material of the sampling device influenced the community.
1:glass vs plastic - difference between PERMANOVA and univariate Adonis
edit - solved - found an errant space in the PERMANOVA command
For each deployment I did a pair-wise test between the glass and plastic containers. I thought using qiime diversity beta-group-significance
would be the most straightforward. I ran the code below, but after 16 hours I finally terminated it. I then tried using qiime diversity adonis
and it worked fine. I can't figure out why qiime diversity beta-group-significance
didn't work as I thought they were different implementations of the same test:
PERMANOVA
qiime diversity beta-group-significance \
--i-distance-matrix braycurtis_distance.qza \
--m-metadata-file metadata_103020.tsv \
--m-metadata-column container \
--o-visualization braycurtis_container_pernova.qzv
ADONIS
qiime diversity adonis \
--i-distance-matrix braycurtis_distance.qza \
--m-metadata-file metadata_103020.tsv \
--o-visualization manifold_crush_braycurtis_container_adonis.qzv \
--p-formula container
2 ADONIS formula
The univariate adonis (I think that's what it would be?) indicated that there was not a significant difference between the Bray-Curtis dissimilarities of community compositions between container material. Great. Moving on I wanted to use Adonis to assess the effect each level of both factors (container=fast or slow; fill=bead or crush) including interaction had and if there was a significant difference in any of the six groups. I had a couple of questions about how qiime diversity adonis
calculates the total number of samples in each group and if running additional stratified test was valid.
2a: number of samples calculation
Looking at the original Anderson 2001 paper in the section 'Calculating the statistic' the total number of observations are calculated by:
"Let A designate factor 1 with a levels (treatments or groups) and B designate factor 2 with b levels, with n replicates in each of the ab combinations of the two factors. The total number of observations is N = abn."
If my experimental design is unbalanced such that the number of replicate in each of the groups is different wiht N = (an)+(bn'), is Adonis still valid and does the implementation in qiime diversity adonis
allow for this in calculating the total number of observations?
2b) Using a factorial design I could establish that there was no significant interaction effect between factors. I would now like to look at difference between levels of a factor with the factor themselves stratified: ie: is there a significant effect of fill (bead vs crush) in each of the two containers (fast and slow) and vice versa. Is it valid to break down a factorial design like this once it's established that interaction between factors is not significant?
3 permdisp
If Adonis is valid, then it looks like there there was a significant difference between fill and container and there was not a significant interaction in each deployment. I know that Adonis can find a significant difference if there is a significant difference in dispersion in addition to a difference in the means. So permdisp
can be used to see if there is a significant difference in dispersion between groups of factors. I ran the follow that produced the following error:
qiime diversity beta-group-significance \
--i-distance-matrix braycurtis_distance.qza \
--m-metadata-file metadata_103020.tsv \
--m-metadata-column fill \
--o-visualization braycurtis_fill_permdisp.qzv \
--p-method permdisp
Plugin error from diversity: too many indices for array: array is 1-dimensional, but 2 were indexed Debug info has been saved to /export/data1/tmp/qiime2-q2cli-err-wq7d8kxv.log
Traceback (most recent call last): File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__ results = action(**arguments) File "<decorator-gen-408>", line 2, in beta_group_significance File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable output_types, provenance) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 452, in _callable_executor_ ret_val = self._callable(output_dir=temp_dir, **view_args) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_diversity/_beta/_visualizer.py", line 153, in beta_group_significa$ permutations=permutations) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/skbio/stats/distance/_permdisp.py", line 233, in permdisp permutations) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/skbio/stats/distance/_base.py", line 1085, in _run_monte_carlo_stats stat = test_stat_function(grouping) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/skbio/stats/distance/_permdisp.py", line 247, in _compute_groups centroids = samples.groupby('grouping').aggregate(_config_med) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 971, in aggregate result = self._aggregate_multiple_funcs([func], _axis=self.axis) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/base.py", line 526, in _aggregate_multiple_funcs new_res = colg.aggregate(arg) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 246, in aggregate ret = self._aggregate_multiple_funcs(func) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 319, in _aggregate_multiple_fun$ results[base.OutputKey(label=name, position=idx)] = obj.aggregate(func) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 261, in aggregate func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1083, in _python_agg_general result, counts = self.grouper.agg_series(obj, f) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/ops.py", line 644, in agg_series return self._aggregate_series_fast(obj, func) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/ops.py", line 669, in _aggregate_series_fast result, counts = grouper.get_result() File "pandas/_libs/reduction.pyx", line 256, in pandas._libs.reduction.SeriesGrouper.get_result File "pandas/_libs/reduction.pyx", line 74, in pandas._libs.reduction._BaseGrouper._apply_to_group File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1060, in <lambda> f = lambda x: func(x, *args, **kwargs) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/skbio/stats/distance/_permdisp.py", line 264, in _config_med X = x.values[:, :-1] IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
I didn't find much in the forum for this error.
4: ANCOM
I moved forward with ANCOM to identify features that were differentially expressed between the levels of the factors that Adonis indicated had significantly different Bray-Curtis dissimilarities. I've run into a couple problems here and I think it's mostly due to the frequency distribution of my features. I'm not expecting many features to change between conditions (unless dada2 was off and my singleton and low abundance ASVs are noise, but I thought Data2 and the error models were such that low abundance ASVs could be treated as less error prone than OTUs).
When I first ran ANCOM without filtering low abundance ASVs (since I think these are real and driving the differences Adnois found) I ended up with results I was skeptical of - long lists of hundreds of ASVs that I could reject H_0 for. Since Adonis is operating on the dissimilarity matrix and ANCOM is operating on the featureTable[frequency] I know they may have different resolutions. ie ANCOM performs a CLR that requires a pseudocount to be added, and since my low abundance ASVs were likely observed at a similar magnitude as the added pseudocount, I decided filter. Filtering to include only ASVs that accounted for 0.02% of the observed features (173) and increasing that to only ASVs observed at least 10 times both failed to identify any significantly different ASVs (consistent with my hypothesis that it's the low abundance ASVs driving the difference between conditions). When I increased the filter to at least 2 observations I ended up with some results that looked as I would expect (1 - 5 ASVs identified) and other results that look like the below (where anything with W=0 or higher was identified, and a cut off of 0 seems suspect. So I had three questions I wasn't sure how to answer:
4a) W=0 seems like a suspect cut-off and that my low abundance ASVs may be difficult to resolve from a pseudocount. In the volcano plot - I thought that CLR vs W was plotted for all features in the comparison, but in at least one of my plots there are clearly only a few of features plotted.
4b) If I filter the feature table for ANCOM, should I also filter the feature table to the same level before calculating the Bray-Curtis dissimilarity used for Adonis? Or just state that ANCOM, because of the need to add a pseudocount, will have a lower resolution and may not identify all ASVs that led to the significant difference in BC dissimilarity via Adnois?
4c) Can I adjust the value of the pseudocount being added in ANCOM, or is there a better method for low abundance ASV discrimination that I should be using?
Thank you for all of the forum support!