How to analyse data using ANOVA in qiime2?

Bhagwan · August 7, 2019, 9:04am

I have to perform an ANOVA test on my samples and would like to know how to run it in QIIME 2. What are the commands, and how does one choose a value for --p-formula?
Please see the following docstring from the QIIME 2 tutorial.
Usage: qiime longitudinal anova [OPTIONS]

Perform an ANOVA test on any factors present in a metadata file and/or
metadata-transformable artifacts. This is followed by pairwise t-tests to
examine pairwise differences between categorical sample groups.

Parameters:
  --m-metadata-file METADATA... (multiple arguments will be merged)
      Sample metadata containing formula terms. [required]
  --p-formula TEXT
      R-style formula specifying the model. All terms must be present in
      the sample metadata or metadata-transformable artifacts and can be
      continuous or categorical metadata columns. Formulae will be in the
      format "a ~ b + c", where "a" is the metric (dependent variable) and
      "b" and "c" are independent covariates. Use "+" to add a variable;
      "+ a:b" to add an interaction between variables a and b; "*" to
      include a variable and all interactions; and "-" to subtract a
      particular term (e.g., an interaction term). See "How formulas work"
      (patsy 0.5.1+dev documentation) for full documentation of valid
      formula operators. Always enclose formulae in quotes to avoid
      unpleasant surprises. [required]
  --p-sstype TEXT Choices('I', 'II', 'III')
      Type of sum of squares calculation to perform (I, II, or III).
      [default: 'II']
Outputs:
  --o-visualization VISUALIZATION
      [required]

How do I construct the formula above using the R-style syntax for specifying the model?
Please help.

colinbrislawn · August 13, 2019, 2:06pm

Good morning @Bhagwan,

The formula lets you compare categories in your metadata file. So if your metadata looked like this:

sample-id	barcode-sequence	body-site	year	month	day	subject	reported-antibiotic-usage	days-since-experiment-start
#q2:types	categorical	categorical	numeric	numeric	numeric	categorical	categorical	numeric
L1S8	AGCTGACTAGTC	gut	2008	10	28	subject-1	Yes	0
L1S57	ACACACTATGGC	gut	2009	1	20	subject-1	No	84
L1S76	ACTACGTGTGGT	gut	2009	2	17	subject-1	No	112
L1S105	AGTGCGATGCGT	gut	2009	3	17	subject-1	No	140
L2S155	ACGATGCGACCA	left palm	2009	1	20	subject-1	No	84
L2S175	AGCTATCCACGA	left palm	2009	2	17	subject-1	No	112
L2S204	ATGCAGCTCAGT	left palm	2009	3	17	subject-1	No	140
L2S222	CACGTGACATGT	left palm	2009	4	14	subject-1	No	168

Your formula might look like this: --p-formula subject+body-site
This will use the ANOVA test to compare the effects of subject and body site.

What columns do you have in your metadata?
Colin

Bhagwan · August 13, 2019, 5:03pm

Dear Colin, please have a look at my metadata file and command.

SampleID	BarcodeSequence	LinkerprimerSequence	Area	Place	SampleType	Date	DepthInCM	TemperatureAtTheTimeofSampling	pH	SampleNumber	SampleNumber-1	DistancefromSeashoreInKMS	DistancefromStartingPointInKMS	SamplingPointParallelTo	SerialNumber	SerialNumber-1	Description
AEMK01	AAAAAAAAT	YATGCTGCCTCCCGTAGGAGT	AndhraPradeshCoast	Ethamukkala	Soil	22 Oct. 2017	10	30	7	AEMK01	AEMK_One	Within 1 km	0	GMDM06	One	1	Blackish white
AEMK02	AAAAAAAAC	YATGCTGCCTCCCGTAGGAGT	AndhraPradeshCoast	Ethamukkala	Soil	22 Oct. 2017	10	32	8	AEMK02	AEMK_Two	Within 1 km	0	GMDM05	Two	2	Black
AEMK03	AAAAAAATT	YATGCTGCCTCCCGTAGGAGT	AndhraPradeshCoast	Ethamukkala	Soil	22 Oct. 2017	10	32	8	AEMK03	AEMK_Three	Within 1 km	0	GMDM04	Three	3	Black
AEMK04	AAAAAAATC	YATGCTGCCTCCCGTAGGAGT	AndhraPradeshCoast	Ethamukkala	Soil	22 Oct. 2017	10	32	7	AEMK04	AEMK_Four	Within 1 km	0	GMDM03	Four	4	Black
AEMK05	AAAAAAAGT	YATGCTGCCTCCCGTAGGAGT	AndhraPradeshCoast	Ethamukkala	Soil	22 Oct. 2017	10	30	7	AEMK05	AEMK_Five	Within 1 km	0	GMDM02	Five	5	Black
AEMK06	AAAAAAAAG	YATGCTGCCTCCCGTAGGAGT	AndhraPradeshCoast	Ethamukkala	Soil	22 Oct. 2017	10	32	7.5	AEMK06	AEMK_Six	Within 1 km	0	GMDM01	Six	6	Black
AKPT01	AAAAAAAAT	YATGCTGCCTCCCGTAGGAGT	AndhraPradeshCoast	Kothapatnam	Soil	22 Oct. 2017	10	25	8	AKPT01	AKPT_One	Within 1 km	20	GCLG06	Seven	7	Black

I have the above columns in my metadata file.

I am trying to run this command:
qiime longitudinal anova --m-metadata-file sample-metadata.tsv --p-formula SampleNumber+Area --p-sstype II --o-visualization anova.qzv

I get this error: (1/1) Got unexpected extra arguments (/Users/rekadwad Area)

Could you suggest the formula and the correct command, please?

colinbrislawn · August 14, 2019, 12:10pm

Thank you for sharing your metadata!

I made a mistake with the example I provided. I think the best way to perform this analysis is shown here:
https://docs.qiime2.org/2019.7/tutorials/pd-mice/#diversity-analysis

So they start with
qiime diversity core-metrics-phylogenetic
then move on to
qiime longitudinal anova

Based on this example, your formula might look like
--p-formula 'faith_pd ~ SampleNumber+Area'
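
The quotes around the formula matter. Here is a minimal sketch, using Python's standard shlex module to mimic how a POSIX shell splits a command line into arguments. It is only an illustration, not something QIIME 2 runs. A real shell would also expand a bare ~ to the home directory, which may be where a path like /Users/rekadwad in the earlier error came from. That is an assumption, since the exact command behind that error isn't shown.

```python
import shlex

# Shortened, illustrative command lines (other required options omitted).
unquoted = "qiime longitudinal anova --p-formula faith_pd ~ SampleNumber+Area"
quoted = "qiime longitudinal anova --p-formula 'faith_pd ~ SampleNumber+Area'"


def formula_args(command_line):
    """Return every token that follows --p-formula after shell-style splitting."""
    tokens = shlex.split(command_line)
    return tokens[tokens.index("--p-formula") + 1:]


# Unquoted: the formula is broken into three separate arguments, so only
# 'faith_pd' is taken as the formula and the rest become extra arguments.
print(formula_args(unquoted))

# Quoted: the whole formula reaches the program as a single argument.
print(formula_args(quoted))
```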

Let me know if this new workflow works well for you!
Colin

Bhagwan · August 14, 2019, 5:26pm

Dear Colin,
I appreciate your promptness. Thank you for directing me to the new workflow.

I ran the core-metrics analysis successfully with the following command:
qiime diversity core-metrics-phylogenetic \
  --i-table ./table.qza \
  --i-phylogeny ./rooted-tree.qza \
  --m-metadata-file ./sample-metadata.tsv \
  --p-sampling-depth 300 \
  --output-dir ./core-metrics-results

Later I tried to run qiime longitudinal anova, but I got an error.
Please see the following command and error.

(qiime2-2019.7) Bhagwans-MacBook-Pro:anova rekadwad$ qiime longitudinal anova --m-metadata-file core-metrics-results/faith_pd_vector.qza --m-metadata-file sample-metadata.tsv --p-formula 'faith_pd ~ SampleNumber+Area' --p-sstype II --o-visualization core-metrics-results/faiths_pd_anova.qzv

Plugin error from longitudinal:

r_matrix performs f_test for using dimensions that are asymptotically non-normal

Debug info has been saved to /var/folders/1f/dgymh0l92xz6q91mqt0l2y800000gn/T/qiime2-q2cli-err-v69k9a90.log

Please suggest what to do.

Bhagwan

colinbrislawn · August 14, 2019, 8:54pm

This error is a little cryptic. Before we get into the statistical test and what 'asymptotically non-normal' means, let's take a look at the formula.

Your formula is faith_pd ~ SampleNumber+Area
You could say, "How is faith_pd affected by changes in SampleNumber and Area?"
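
To make the structure of the formula concrete, here is a rough sketch in plain Python. The function formula_terms and the column set are hypothetical and for illustration only; the real parsing is done by patsy, which supports far more syntax than this toy version. It splits a formula into the metric on the left of "~" and the column names on the right, then checks that each name exists in the merged metadata.

```python
import re


def formula_terms(formula):
    """Split 'a ~ b + c' into the dependent variable and the right-hand columns.

    Toy parser: handles only the +, *, : and - operators between plain
    column names (no hyphenated names, no numeric terms such as '- 1').
    """
    left, sep, right = formula.partition("~")
    if not sep or not left.strip():
        raise ValueError("formula needs a metric on the left of '~'")
    predictors = [t.strip() for t in re.split(r"[+*:\-]", right) if t.strip()]
    return left.strip(), predictors


# Hypothetical merged column set: faith_pd would come from faith_pd_vector.qza,
# the other names from sample-metadata.tsv.
columns = {"faith_pd", "SampleNumber", "Area", "Place"}

metric, predictors = formula_terms("faith_pd ~ SampleNumber+Area")
missing = [name for name in [metric] + predictors if name not in columns]
print(metric, predictors, missing)
```

A formula with nothing on the left of "~", such as SampleNumber+Area on its own, is rejected by this sketch because there is no metric to explain.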

So... let's take a look at your SampleNumber and Area categories:

SampleNumber: 7 different levels (all levels are different)
Area: 1 level (all levels are the same)

"How do changes in Area affect faith_pd?" We don't know because Area never changes.
"How do changes in SampleNumber affect faith_pd?" SampleNumber is always changing, so I guess faith_pd is always changing too!

Take a look at this metadata table from a small project:

SampleID	Treatment	WeeksAfterInfection
s1	A	1
s2	A	2
s3	A	3
s4	A	4
s5	B	1
s6	B	2
s7	B	3
s8	B	4

Based on this table, we could ask "How do changes in Treatment and time (WeeksAfterInfection) affect faith_pd?" and write this formula:
faith_pd ~ Treatment+WeeksAfterInfection
Because our values are not all the same or all different, we can answer this question.
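
The same check can be sketched in plain Python (standard library only; the helper names count_levels and residual_df are made up for this example). It counts the distinct values in each column and then works out how many samples are left over after fitting an intercept plus one main effect per column. When a categorical column has as many levels as there are samples, nothing is left to estimate the error variance from. That is one plausible reason for the "asymptotically non-normal" error, though this is an assumption and not something confirmed in this thread.

```python
import csv
import io

# The small table above, as tab-separated text.
TABLE = (
    "SampleID\tTreatment\tWeeksAfterInfection\n"
    "s1\tA\t1\ns2\tA\t2\ns3\tA\t3\ns4\tA\t4\n"
    "s5\tB\t1\ns6\tB\t2\ns7\tB\t3\ns8\tB\t4\n"
)


def count_levels(tsv_text, id_column="SampleID"):
    """Return (number of samples, {column: number of distinct values})."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    levels = {
        column: len({row[column] for row in rows})
        for column in rows[0]
        if column != id_column
    }
    return len(rows), levels


def residual_df(n_samples, categorical_levels, n_numeric=0):
    """Samples left over after fitting an intercept plus main effects.

    A categorical column with k levels uses k - 1 parameters; a numeric
    column uses 1. Interactions are ignored in this sketch.
    """
    n_params = 1 + sum(k - 1 for k in categorical_levels) + n_numeric
    return n_samples - n_params


n_samples, levels = count_levels(TABLE)
print(n_samples, levels)

# Treatment is categorical (2 levels); WeeksAfterInfection is treated as numeric.
print(residual_df(n_samples, [levels["Treatment"]], n_numeric=1))

# Counts from earlier in the thread: 7 samples, SampleNumber with 7 levels, Area with 1.
print(residual_df(7, [7, 1]))
```

The small table leaves 5 residual degrees of freedom, while the SampleNumber+Area formula leaves 0.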

Which metadata categories would you like to test with your samples?

Bhagwan · August 15, 2019, 7:36am

Yep. You gave the correct suggestion. I changed the faith_pd column and got results. Thank you, Colin, for your help and kind support.

Sincerely, Bhagwan
