I need to perform an ANOVA test on my samples and would like to know how to run the analysis in QIIME 2. What are the commands, and how does one choose a value for --p-formula?
Please see the following docstring from the QIIME 2 tutorial.
Usage: qiime longitudinal anova [OPTIONS]
Perform an ANOVA test on any factors present in a metadata file and/or
metadata-transformable artifacts. This is followed by pairwise t-tests to
examine pairwise differences between categorical sample groups.
Parameters:
--m-metadata-file METADATA... (multiple arguments will be merged)
Sample metadata containing formula terms. [required]
--p-formula TEXT R-style formula specifying the model. All terms must
be present in the sample metadata or
metadata-transformable artifacts and can be continuous
or categorical metadata columns. Formulae will be in
the format "a ~ b + c", where "a" is the metric
(dependent variable) and "b" and "c" are independent
covariates. Use "+" to add a variable; "+ a:b" to add
an interaction between variables a and b; "*" to
include a variable and all interactions; and "-" to
subtract a particular term (e.g., an interaction term).
See How formulas work — patsy 0.5.1+dev documentation
for full documentation of valid formula operators.
Always enclose formulae in quotes to avoid unpleasant
surprises. [required]
--p-sstype TEXT Choices('I', 'II', 'III')
Type of sum of squares calculation to perform (I, II,
or III). [default: 'II']
Outputs:
--o-visualization VISUALIZATION
[required]
How do I construct the formula above using the R-style syntax for specifying the model?
Please help.
The formula lets you compare categories in your metadata file. So if your metadata looked like this:
| sample-id | barcode-sequence | body-site | year | month | day | subject | reported-antibiotic-usage | days-since-experiment-start |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| #q2:types | categorical | categorical | numeric | numeric | numeric | categorical | categorical | numeric |
| L1S8 | AGCTGACTAGTC | gut | 2008 | 10 | 28 | subject-1 | Yes | 0 |
| L1S57 | ACACACTATGGC | gut | 2009 | 1 | 20 | subject-1 | No | 84 |
| L1S76 | ACTACGTGTGGT | gut | 2009 | 2 | 17 | subject-1 | No | 112 |
| L1S105 | AGTGCGATGCGT | gut | 2009 | 3 | 17 | subject-1 | No | 140 |
| L2S155 | ACGATGCGACCA | left palm | 2009 | 1 | 20 | subject-1 | No | 84 |
| L2S175 | AGCTATCCACGA | left palm | 2009 | 2 | 17 | subject-1 | No | 112 |
| L2S204 | ATGCAGCTCAGT | left palm | 2009 | 3 | 17 | subject-1 | No | 140 |
| L2S222 | CACGTGACATGT | left palm | 2009 | 4 | 14 | subject-1 | No | 168 |
Your formula might look like this: --p-formula "a ~ subject + body-site", where "a" is the metric (dependent variable) column you want to test.
This will use the ANOVA test to compare the effects of subject and body site on that metric.
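As a minimal sketch of the "a ~ b + c" format described in the docstring, the formula string and a shell-safe command line could be assembled as below. The metric name `faith_pd`, the file `faith_pd_vector.qza`, the renamed column `body_site`, and both helper functions are assumptions made for illustration. Because the docstring lists "-" as a formula operator, a column name containing a hyphen (such as body-site) may be read as a subtraction, so this sketch simply rejects such names rather than guessing how they are parsed.

```python
import shlex

# Characters the docstring lists as formula operators, plus "~" itself.
OPERATOR_CHARS = set("~+-*:")


def build_formula(metric, covariates):
    """Return an R-style formula string such as 'a ~ b + c'."""
    for name in [metric, *covariates]:
        if OPERATOR_CHARS & set(name) or " " in name:
            raise ValueError(
                f"column name {name!r} contains operator characters or spaces"
            )
    return f"{metric} ~ {' + '.join(covariates)}"


def anova_command(metadata_files, formula, out="anova.qzv"):
    """Build a quoted command line (hypothetical helper, not part of QIIME 2)."""
    args = ["qiime", "longitudinal", "anova"]
    for path in metadata_files:
        # The docstring says multiple metadata arguments will be merged.
        args += ["--m-metadata-file", path]
    args += ["--p-formula", formula, "--o-visualization", out]
    # shlex.join quotes the formula so the shell passes it as one argument.
    return shlex.join(args)


formula = build_formula("faith_pd", ["subject", "body_site"])
command = anova_command(["sample-metadata.tsv", "faith_pd_vector.qza"], formula)
print(command)
```

The printed command wraps the formula in quotes, which follows the docstring's advice to always enclose formulae in quotes.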
Dear Collin, please have a look at my metadata file and command.
| SampleID | BarcodeSequence | LinkerprimerSequence | Area | Place | SampleType | Date | DepthInCM | TemperatureAtTheTimeofSampling | pH | SampleNumber | SampleNumber-1 | DistancefromSeashoreInKMS | DistancefromStartingPointInKMS | SamplingPointParallelTo | SerialNumber | SerialNumber-1 | Description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AEMK01 | AAAAAAAAT | YATGCTGCCTCCCGTAGGAGT | AndhraPradeshCoast | Ethamukkala | Soil | 22 Oct. 2017 | 10 | 30 | 7 | AEMK01 | AEMK_One | Within 1 km | 0 | GMDM06 | One | 1 | Blackish white |
| AEMK02 | AAAAAAAAC | YATGCTGCCTCCCGTAGGAGT | AndhraPradeshCoast | Ethamukkala | Soil | 22 Oct. 2017 | 10 | 32 | 8 | AEMK02 | AEMK_Two | Within 1 km | 0 | GMDM05 | Two | 2 | Black |
| AEMK03 | AAAAAAATT | YATGCTGCCTCCCGTAGGAGT | AndhraPradeshCoast | Ethamukkala | Soil | 22 Oct. 2017 | 10 | 32 | 8 | AEMK03 | AEMK_Three | Within 1 km | 0 | GMDM04 | Three | 3 | Black |
| AEMK04 | AAAAAAATC | YATGCTGCCTCCCGTAGGAGT | AndhraPradeshCoast | Ethamukkala | Soil | 22 Oct. 2017 | 10 | 32 | 7 | AEMK04 | AEMK_Four | Within 1 km | 0 | GMDM03 | Four | 4 | Black |
| AEMK05 | AAAAAAAGT | YATGCTGCCTCCCGTAGGAGT | AndhraPradeshCoast | Ethamukkala | Soil | 22 Oct. 2017 | 10 | 30 | 7 | AEMK05 | AEMK_Five | Within 1 km | 0 | GMDM02 | Five | 5 | Black |
| AEMK06 | AAAAAAAAG | YATGCTGCCTCCCGTAGGAGT | AndhraPradeshCoast | Ethamukkala | Soil | 22 Oct. 2017 | 10 | 32 | 7.5 | AEMK06 | AEMK_Six | Within 1 km | 0 | GMDM01 | Six | 6 | Black |
| AKPT01 | AAAAAAAAT | YATGCTGCCTCCCGTAGGAGT | AndhraPradeshCoast | Kothapatnam | Soil | 22 Oct. 2017 | 10 | 25 | 8 | AKPT01 | AKPT_One | Within 1 km | 20 | GCLG06 | Seven | 7 | Black |
I have the columns above in my metadata file.
I am trying to run this command:
qiime longitudinal anova --m-metadata-file sample-metadata.tsv --p-formula SampleNumber+Area --p-sstype II --o-visualization anova.qzv
I get this error: (1/1) Got unexpected extra arguments (/Users/rekadwad Area)
Could you suggest a formula and the correct command, please?
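One plausible reading of the "unexpected extra arguments (/Users/rekadwad Area)" message, offered as an assumption since the command as posted does not show it: if a formula is typed without quotes and with spaces around "~", the shell splits it into separate words and replaces the bare "~" with the home directory. The sketch below imitates that behaviour with Python's `shlex`; the function `simulate_shell` and the example formula `SampleNumber ~ Area` are hypothetical, and the imitation covers only whitespace splitting, quoting, and a leading tilde.

```python
import shlex

HOME = "/Users/rekadwad"  # home directory that appears in the error message


def simulate_shell(command_line, home=HOME):
    """Roughly imitate shell word splitting and '~' expansion."""
    words = []
    # posix=False keeps the surrounding quotes on quoted tokens, so quoted
    # and unquoted words can be told apart.
    for token in shlex.split(command_line, posix=False):
        if token == "~" or token.startswith("~/"):
            token = home + token[1:]  # unquoted tilde: expanded by the shell
        elif len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
            token = token[1:-1]  # quoted: passed through as one argument
        words.append(token)
    return words


unquoted = simulate_shell("qiime longitudinal anova --p-formula SampleNumber ~ Area")
quoted = simulate_shell('qiime longitudinal anova --p-formula "SampleNumber ~ Area"')

print(unquoted[4:])  # option value followed by two stray arguments
print(quoted[4:])    # the whole formula arrives as a single argument
```

Under this assumption, the two stray words after the option value match the ones listed in the error, and quoting the formula keeps it intact.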
This error is a little cryptic. Before we get into the statistical test and what 'asymptotically non-normal' means, let's take a look at the formula.
Your formula is faith_pd ~ SampleNumber+Area
You could say, "How is faith_pd affected by changes in SampleNumber and Area?"
So... let's take a look at your SampleNumber and Area categories:
SampleNumber: 7 different levels (all levels are different)
Area: 1 level (all levels are the same)
"How do changes in Area affect faith_pd?" We don't know, because Area never changes.
"How do changes in SampleNumber affect faith_pd?" SampleNumber is always changing, so I guess faith_pd is always changing too!
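The level check described above can be sketched in a few lines of Python. The values are copied from the Area, SampleNumber, and Place columns of the metadata posted earlier; the function name and the wording of the labels are assumptions for illustration, and a column flagged as "potentially usable" may still have groups too small for a meaningful test.

```python
# Values copied from the metadata table posted earlier in the thread.
metadata = {
    "Area": ["AndhraPradeshCoast"] * 7,
    "SampleNumber": [
        "AEMK01", "AEMK02", "AEMK03", "AEMK04", "AEMK05", "AEMK06", "AKPT01",
    ],
    "Place": ["Ethamukkala"] * 6 + ["Kothapatnam"],
}


def describe_levels(values):
    """Classify a categorical column by how many distinct levels it has."""
    n_levels = len(set(values))
    if n_levels == 1:
        return "constant: no variation to test"
    if n_levels == len(values):
        return "all unique: one sample per level"
    return f"{n_levels} levels: potentially usable"


for column, values in metadata.items():
    print(f"{column}: {describe_levels(values)}")
```

Running a check like this before writing the formula shows which categorical columns vary enough to be worth including.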
Take a look at this metadata table from a small project:
| SampleID | Treatment | WeeksAfterInfection |
| --- | --- | --- |
| s1 | A | 1 |
| s2 | A | 2 |
| s3 | A | 3 |
| s4 | A | 4 |
| s5 | B | 1 |
| s6 | B | 2 |
| s7 | B | 3 |
| s8 | B | 4 |
Based on this table, we could ask "How do changes in Treatment and time (WeeksAfterInfection) affect faith_pd?" and write this formula: faith_pd ~ Treatment+WeeksAfterInfection
Because our values are not all the same or all different, we can answer this question.
Which metadata categories would you like to test with your samples?