How to analyse data using ANOVA in qiime2?

I need to perform an ANOVA test on my samples and would like to know how to do this in qiime2. What are the commands, and how does one choose a value for --p-formula?
Please see the following docstring from the qiime2 tutorial.
Usage: qiime longitudinal anova [OPTIONS]

Perform an ANOVA test on any factors present in a metadata file and/or
metadata-transformable artifacts. This is followed by pairwise t-tests to
examine pairwise differences between categorical sample groups.

Parameters:
--m-metadata-file METADATA...
  (multiple arguments will be merged)
                   Sample metadata containing formula terms. [required]
--p-formula TEXT   R-style formula specifying the model. All terms must
                   be present in the sample metadata or
                   metadata-transformable artifacts and can be continuous
                   or categorical metadata columns. Formulae will be in
                   the format "a ~ b + c", where "a" is the metric
                   (dependent variable) and "b" and "c" are independent
                   covariates. Use "+" to add a variable; "+ a:b" to add
                   an interaction between variables a and b; "*" to
                   include a variable and all interactions; and "-" to
                   subtract a particular term (e.g., an interaction term).
                   See
                   https://patsy.readthedocs.io/en/latest/formulas.html
                   for full documentation of valid formula operators.
                   Always enclose formulae in quotes to avoid unpleasant
                   surprises. [required]
--p-sstype TEXT Choices('I', 'II', 'III')
                   Type of sum of squares calculation to perform (I, II,
                   or III). [default: 'II']
Outputs:
--o-visualization VISUALIZATION
                   [required]

How do I construct the formula above using the R-style syntax for specifying the model?
Please help.

Good morning @Bhagwan,

The formula lets you compare categories in your metadata file. So if your metadata looked like this:

sample-id  barcode-sequence  body-site    year     month    day      subject      reported-antibiotic-usage  days-since-experiment-start
#q2:types  categorical       categorical  numeric  numeric  numeric  categorical  categorical                numeric
L1S8       AGCTGACTAGTC      gut          2008     10       28       subject-1    Yes                        0
L1S57      ACACACTATGGC      gut          2009     1        20       subject-1    No                         84
L1S76      ACTACGTGTGGT      gut          2009     2        17       subject-1    No                         112
L1S105     AGTGCGATGCGT      gut          2009     3        17       subject-1    No                         140
L2S155     ACGATGCGACCA      left palm    2009     1        20       subject-1    No                         84
L2S175     AGCTATCCACGA      left palm    2009     2        17       subject-1    No                         112
L2S204     ATGCAGCTCAGT      left palm    2009     3        17       subject-1    No                         140
L2S222     CACGTGACATGT      left palm    2009     4        14       subject-1    No                         168

Your formula might look like this --p-formula subject+body-site
This will use the ANOVA test to compare the effect of subject number and body site locations.

What columns do you have in your metadata?
Colin


Dear Colin, please have a look at my metadata file and command.

SampleID BarcodeSequence LinkerprimerSequence Area Place SampleType Date DepthInCM TemperatureAtTheTimeofSampling pH SampleNumber SampleNumber-1 DistancefromSeashoreInKMS DistancefromStartingPointInKMS SamplingPointParallelTo SerialNumber SerialNumber-1 Description
AEMK01 AAAAAAAAT YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Ethamukkala Soil 22 Oct. 2017 10 30 7 AEMK01 AEMK_One Within 1 km 0 GMDM06 One 1 Blackish white
AEMK02 AAAAAAAAC YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Ethamukkala Soil 22 Oct. 2017 10 32 8 AEMK02 AEMK_Two Within 1 km 0 GMDM05 Two 2 Black
AEMK03 AAAAAAATT YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Ethamukkala Soil 22 Oct. 2017 10 32 8 AEMK03 AEMK_Three Within 1 km 0 GMDM04 Three 3 Black
AEMK04 AAAAAAATC YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Ethamukkala Soil 22 Oct. 2017 10 32 7 AEMK04 AEMK_Four Within 1 km 0 GMDM03 Four 4 Black
AEMK05 AAAAAAAGT YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Ethamukkala Soil 22 Oct. 2017 10 30 7 AEMK05 AEMK_Five Within 1 km 0 GMDM02 Five 5 Black
AEMK06 AAAAAAAAG YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Ethamukkala Soil 22 Oct. 2017 10 32 7.5 AEMK06 AEMK_Six Within 1 km 0 GMDM01 Six 6 Black
AKPT01 AAAAAAAAT YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Kothapatnam Soil 22 Oct. 2017 10 25 8 AKPT01 AKPT_One Within 1 km 20 GCLG06 Seven 7 Black

My metadata file has the columns above.

I am trying to run this command:
qiime longitudinal anova --m-metadata-file sample-metadata.tsv --p-formula SampleNumber+Area --p-sstype II --o-visualization anova.qzv

It fails with this error: (1/1) Got unexpected extra arguments (/Users/rekadwad Area)

Could you suggest a formula and the correct command, please?

Thank you for sharing your metadata!

I made a mistake with the example I provided. I think the best way to perform this analysis is shown here:
https://docs.qiime2.org/2019.7/tutorials/pd-mice/#diversity-analysis

So they start with
qiime diversity core-metrics-phylogenetic
then move on to
qiime longitudinal anova

Based on this example, your formula might look like
--p-formula 'faith_pd ~ SampleNumber+Area'
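
A note on the quotes, since the help text says to always enclose formulae in them. Here is a minimal sketch in plain shell (no QIIME 2 needed) of what the shell does to a formula before any command sees it. count_args is a made-up helper for illustration only. The path in your earlier "unexpected extra arguments" error looks like a home directory, which is what an unquoted ~ would turn into, but that is my guess and not something I can confirm from the command you posted.

```shell
# Hypothetical helper: report how many separate arguments it received.
count_args() { echo "$#"; }

# Quoted: the whole formula arrives as a single argument.
quoted=$(count_args 'faith_pd ~ SampleNumber+Area')

# Unquoted: the spaces split the formula into three words, and a bare "~"
# word is replaced by the home directory before the command runs.
unquoted=$(count_args faith_pd ~ SampleNumber+Area)
expanded=$(printf '%s' ~)

echo "quoted formula:   $quoted argument(s)"
echo "unquoted formula: $unquoted argument(s)"
echo "a bare ~ becomes: $expanded"
```

So in the anova command the formula should be passed as one quoted string, using plain straight quotes rather than the curly ones that forum software sometimes substitutes.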

Let me know if this new workflow works well for you!
Colin


Dear Colin,
I appreciate your promptness. Thank you for pointing me to the new workflow.

I ran the core-metrics analysis successfully with the following command.
qiime diversity core-metrics-phylogenetic \
  --i-table ./table.qza \
  --i-phylogeny ./rooted-tree.qza \
  --m-metadata-file ./sample-metadata.tsv \
  --p-sampling-depth 300 \
  --output-dir ./core-metrics-results

Later I tried to run qiime longitudinal anova and got an error.
Please see the command and error below.

(qiime2-2019.7) Bhagwans-MacBook-Pro:anova rekadwad$ qiime longitudinal anova --m-metadata-file core-metrics-results/faith_pd_vector.qza --m-metadata-file sample-metadata.tsv --p-formula 'faith_pd ~ SampleNumber+Area' --p-sstype II --o-visualization core-metrics-results/faiths_pd_anova.qzv

Plugin error from longitudinal:

r_matrix performs f_test for using dimensions that are asymptotically non-normal

Debug info has been saved to /var/folders/1f/dgymh0l92xz6q91mqt0l2y800000gn/T/qiime2-q2cli-err-v69k9a90.log

Please suggest what to do.

Bhagwan

This error is a little cryptic. Before we get into the statistical test and what ‘asymptotically non-normal’ means, let’s take a look at the formula.

Your formula is faith_pd ~ SampleNumber+Area
You could say, “How is faith_pd affected by changes in SampleNumber and Area?”

So… let’s take a look at your SampleNumber and Area categories:

  • SampleNumber: 7 different levels (all levels are different)
  • Area: 1 level (all levels are the same)

“How do changes in Area affect faith_pd?” We don’t know because Area never changes.
“How do changes in SampleNumber affect faith_pd?” SampleNumber is always changing, so I guess faith_pd is always changing too! :man_shrugging:


Take a look at this metadata table from a small project:

SampleID Treatment WeeksAfterInfection
s1 A 1
s2 A 2
s3 A 3
s4 A 4
s5 B 1
s6 B 2
s7 B 3
s8 B 4

Based on this table, we could ask “How do changes in Treatment and time affect faith_pd?” and write this formula:
faith_pd ~ Treatment+WeeksAfterInfection
Because our values are not all the same or all different, we can answer this question.
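
To check a column before putting it in a formula, one option is to count its levels and compare that with the number of samples. Below is a rough sketch in plain shell and awk, using the small table above as a stand-in metadata file. count_levels is a hypothetical helper written for this post, not a QIIME 2 command, and it assumes a tab-separated file with column names in the first row.

```shell
# Build a tab-separated stand-in for the example metadata table.
meta=$(mktemp)
printf '%s\t%s\t%s\n' \
  SampleID Treatment WeeksAfterInfection \
  s1 A 1  s2 A 2  s3 A 3  s4 A 4 \
  s5 B 1  s6 B 2  s7 B 3  s8 B 4 > "$meta"

# Hypothetical helper: count distinct values in one named column.
# Usage: count_levels FILE COLUMN
count_levels() {
  awk -F '\t' -v col="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
    /^#q2:types/ { next }   # skip the optional QIIME 2 types row
    c && $c != "" { if (!($c in seen)) { seen[$c] = 1; levels++ }; n++ }
    END { printf "%s: %d levels across %d samples\n", col, levels, n }
  ' "$1"
}

sample_summary=$(count_levels "$meta" SampleID)
treatment_summary=$(count_levels "$meta" Treatment)
weeks_summary=$(count_levels "$meta" WeeksAfterInfection)

echo "$sample_summary"
echo "$treatment_summary"
echo "$weeks_summary"

rm -f "$meta"
```

A column with a single level (like Area in your file) or with as many levels as samples (like SampleNumber, or SampleID here) gives the test nothing to compare, while Treatment and WeeksAfterInfection sit usefully in between.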


Which metadata categories would you like to test with your samples?


Yes, your suggestion was correct. I changed the faith_pd column and got results. Thank you, Colin, for your help and kind support.

Sincerely, Bhagwan
