How to analyse data using ANOVA in qiime2?

I have to perform an ANOVA test on my samples and would like to know how to do this in QIIME 2. What are the commands, and how does one choose a value for --p-formula?
Please see the following docstring from the QIIME 2 tutorial.
Usage: qiime longitudinal anova [OPTIONS]

Perform an ANOVA test on any factors present in a metadata file and/or
metadata-transformable artifacts. This is followed by pairwise t-tests to
examine pairwise differences between categorical sample groups.

--m-metadata-file METADATA... (multiple arguments will be merged)
    Sample metadata containing formula terms. [required]
--p-formula TEXT
    R-style formula specifying the model. All terms must be present in the
    sample metadata or metadata-transformable artifacts and can be continuous
    or categorical metadata columns. Formulae will be in the format
    "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c"
    are independent covariates. Use "+" to add a variable; "+ a:b" to add an
    interaction between variables a and b; "*" to include a variable and all
    interactions; and "-" to subtract a particular term (e.g., an interaction
    term). for full documentation of valid formula operators. Always enclose
    formulae in quotes to avoid unpleasant surprises. [required]
--p-sstype TEXT Choices('I', 'II', 'III')
    Type of sum of squares calculation to perform (I, II, or III).
    [default: 'II']
--o-visualization VISUALIZATION

How do I construct the formula above using the R-style syntax it describes?
Please help.

Good morning @Bhagwan,

The formula lets you compare categories in your metadata file. So if your metadata looked like this:

sample-id barcode-sequence body-site year month day subject reported-antibiotic-usage days-since-experiment-start
#q2:types categorical categorical numeric numeric numeric categorical categorical numeric
L1S8 AGCTGACTAGTC gut 2008 10 28 subject-1 Yes 0
L1S57 ACACACTATGGC gut 2009 1 20 subject-1 No 84
L1S76 ACTACGTGTGGT gut 2009 2 17 subject-1 No 112
L1S105 AGTGCGATGCGT gut 2009 3 17 subject-1 No 140
L2S155 ACGATGCGACCA left palm 2009 1 20 subject-1 No 84
L2S175 AGCTATCCACGA left palm 2009 2 17 subject-1 No 112
L2S204 ATGCAGCTCAGT left palm 2009 3 17 subject-1 No 140
L2S222 CACGTGACATGT left palm 2009 4 14 subject-1 No 168

Your formula might look like this: --p-formula subject+body-site
This will use the ANOVA test to compare the effect of subject number and body site locations.

What columns do you have in your metadata?


Dear Collin, please have a look at my metadata file and command.

SampleID BarcodeSequence LinkerprimerSequence Area Place SampleType Date DepthInCM TemperatureAtTheTimeofSampling pH SampleNumber SampleNumber-1 DistancefromSeashoreInKMS DistancefromStartingPointInKMS SamplingPointParallelTo SerialNumber SerialNumber-1 Description
AEMK01 AAAAAAAAT YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Ethamukkala Soil 22 Oct. 2017 10 30 7 AEMK01 AEMK_One Within 1 km 0 GMDM06 One 1 Blackish white
AEMK02 AAAAAAAAC YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Ethamukkala Soil 22 Oct. 2017 10 32 8 AEMK02 AEMK_Two Within 1 km 0 GMDM05 Two 2 Black
AEMK03 AAAAAAATT YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Ethamukkala Soil 22 Oct. 2017 10 32 8 AEMK03 AEMK_Three Within 1 km 0 GMDM04 Three 3 Black
AEMK04 AAAAAAATC YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Ethamukkala Soil 22 Oct. 2017 10 32 7 AEMK04 AEMK_Four Within 1 km 0 GMDM03 Four 4 Black
AEMK05 AAAAAAAGT YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Ethamukkala Soil 22 Oct. 2017 10 30 7 AEMK05 AEMK_Five Within 1 km 0 GMDM02 Five 5 Black
AEMK06 AAAAAAAAG YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Ethamukkala Soil 22 Oct. 2017 10 32 7.5 AEMK06 AEMK_Six Within 1 km 0 GMDM01 Six 6 Black
AKPT01 AAAAAAAAT YATGCTGCCTCCCGTAGGAGT AndhraPradeshCoast Kothapatnam Soil 22 Oct. 2017 10 25 8 AKPT01 AKPT_One Within 1 km 20 GCLG06 Seven 7 Black

I have the above columns in my metadata file.

I am trying to run this command:
qiime longitudinal anova --m-metadata-file sample-metadata.tsv --p-formula SampleNumber+Area --p-sstype II --o-visualization anova.qzv

I get this error: (1/1) Got unexpected extra arguments (/Users/rekadwad Area)

Could you suggest a formula and the correct command, please?

Thank you for sharing your metadata!

I made a mistake with the example I provided. I think the best way to perform this analysis is shown here:

So they start with
qiime diversity core-metrics-phylogenetic
then move on to
qiime longitudinal anova

Based on this example, your formula might look like
--p-formula 'faith_pd ~ SampleNumber+Area'
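A side note on the quotes, since the docstring insists on them. The path in your earlier error message (/Users/rekadwad) looks like what the shell substitutes for a bare ~, so my guess (not something I have confirmed) is that the formula reached QIIME 2 unquoted and was split into several arguments. A minimal sketch of the effect, using a hypothetical helper count_args that is not part of QIIME 2:

```shell
# Hypothetical helper (not part of QIIME 2): print how many arguments
# the shell actually hands to a command.
count_args() {
  printf '%s\n' "$#"
}

# Unquoted: the shell splits the formula on spaces into three arguments,
# and a bare ~ undergoes tilde expansion (it becomes your home directory).
count_args faith_pd ~ SampleNumber+Area     # prints 3

# Quoted: the whole formula arrives as one argument, with ~ left untouched.
count_args 'faith_pd ~ SampleNumber+Area'   # prints 1
```

With single quotes, --p-formula receives the formula as one value, which is what the command expects.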

Let me know if this new workflow works well for you!


Dear Collin,
I appreciate your prompt reply. Thank you for directing me to the new workflow.

I performed the core-metrics analysis successfully with the following command:
qiime diversity core-metrics-phylogenetic \
  --i-table ./table.qza \
  --i-phylogeny ./rooted-tree.qza \
  --m-metadata-file ./sample-metadata.tsv \
  --p-sampling-depth 300 \
  --output-dir ./core-metrics-results

Later I tried to run qiime longitudinal anova, but I got an error.
Please see the following command and error.

(qiime2-2019.7) Bhagwans-MacBook-Pro:anova rekadwad$ qiime longitudinal anova --m-metadata-file core-metrics-results/faith_pd_vector.qza --m-metadata-file sample-metadata.tsv --p-formula 'faith_pd ~ SampleNumber+Area' --p-sstype II --o-visualization core-metrics-results/faiths_pd_anova.qzv

Plugin error from longitudinal:

r_matrix performs f_test for using dimensions that are asymptotically non-normal

Debug info has been saved to /var/folders/1f/dgymh0l92xz6q91mqt0l2y800000gn/T/qiime2-q2cli-err-v69k9a90.log

Please suggest what to do.


This error is a little cryptic. Before we get into the statistical test and what ‘asymptotically non-normal’ means, let’s take a look at the formula.

Your formula is faith_pd ~ SampleNumber+Area
You could say, “How is faith_pd affected by changes in SampleNumber and Area?”

So… let’s take a look at your SampleNumber and Area categories:

  • SampleNumber: 7 different levels (all levels are different)
  • Area: 1 level (all levels are the same)

“How do changes in Area affect faith_pd?” We don’t know because Area never changes.
“How do changes in SampleNumber affect faith_pd?” SampleNumber is always changing, so I guess faith_pd is always changing too! :man_shrugging:

Take a look at this metadata table from a small project:

SampleID Treatment WeeksAfterInfection
s1 A 1
s2 A 2
s3 A 3
s4 A 4
s5 B 1
s6 B 2
s7 B 3
s8 B 4

Based on this table, we could ask “How do changes in Treatment and time affect faith_pd?” and write this formula:
faith_pd ~ Treatment+WeeksAfterInfection
Because our values are not all the same or all different, we can answer this question.
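If it helps, here is a rough way to check this for your own metadata before choosing formula terms. It is only a sketch: count_levels is a hypothetical helper (not a QIIME 2 command), and it assumes a tab-separated file with a header row and, optionally, a #q2:types row. It prints each column name, its number of distinct values, and the number of samples.

```shell
# Hypothetical helper: for each column of a tab-separated metadata file, print
# the column name, the number of distinct values, and the number of samples.
count_levels() {
  awk -F '\t' '
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; ncol = NF; next }
    /^#q2:types/ { next }   # skip the optional QIIME 2 types row
    {
      n++
      for (i = 1; i <= ncol; i++) {
        if (!((i, $i) in seen)) { seen[i, $i] = 1; levels[i]++ }
      }
    }
    END { for (i = 1; i <= ncol; i++) printf "%s\t%d\t%d\n", name[i], levels[i], n }
  ' "$1"
}

# Example: the small Treatment table, written to a temporary file.
tmp=$(mktemp)
printf 'SampleID\tTreatment\tWeeksAfterInfection\n' > "$tmp"
printf 's1\tA\t1\ns2\tA\t2\ns3\tA\t3\ns4\tA\t4\n' >> "$tmp"
printf 's5\tB\t1\ns6\tB\t2\ns7\tB\t3\ns8\tB\t4\n' >> "$tmp"
count_levels "$tmp"
rm -f "$tmp"
```

For that table, Treatment has 2 distinct values and WeeksAfterInfection has 4, across 8 samples. A column with a single distinct value (like your Area) or with as many distinct values as samples (like your SampleNumber) is the kind that cannot answer the question.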

Which metadata categories would you like to test with your samples?


Yes, your suggestion was correct. I changed the faith_pd column and got results. Thank you, Collin, for your help and kind support.

Sincerely, Bhagwan

