q2-longitudinal: How do I set up the metadata file properly?

Hi everyone.

I would like to run a longitudinal analysis but cannot get it to work. I am starting with the plugin's pairwise difference comparisons.

I want to know how my samples (mice) differ across ages, measured in weeks.

What changes should I make to my metadata file? In the near future I also want to try all the available q2-longitudinal methods, so I would like a metadata file that is robust enough for every analysis.

This is the command that failed:

qiime longitudinal pairwise-differences \
  --m-metadata-file metadata.tsv \
  --m-metadata-file core-metrics-results/shannon_vector.qza \
  --p-metric shannon_entropy \
  --p-group-column genotype \
  --p-state-column age \
  --p-state-1 8 \
  --p-state-2 15 \
  --p-individual-id-column batch \
  --p-replicate-handling random \
  --o-visualization pairwise-differences.qzv

Here is part of my metadata:

SampleID subject group genotype age batch subjectwhen
#q2:types categorical categorical categorical categorical categorical categorical
W01E08 W01 Wt08 Wildtype 8 1 W01Age8
W02E08 W02 Wt08 Wildtype 8 1 W02Age8
W03E08 W03 Wt08 Wildtype 8 1 W03Age8
W04E15 W04 Wt15 Wildtype 15 1 W04Age15
W05E15 W05 Wt15 Wildtype 15 1 W05Age15
W06E15 W06 Wt15 Wildtype 15 1 W06Age15
W01E16 W01 Wt16 Wildtype 16 2 W01Age16
W02E16 W02 Wt16 Wildtype 16 2 W02Age16
W03E16 W03 Wt16 Wildtype 16 2 W03Age16
W04E23 W04 Wt23 Wildtype 23 2 W04Age23
W05E23 W05 Wt23 Wildtype 23 2 W05Age23
W06E23 W06 Wt23 Wildtype 23 2 W06Age23
T01E08 T01 Tg08 Transgenic 8 4 T01Age8
T02E08 T02 Tg08 Transgenic 8 4 T02Age8
T03E08 T03 Tg08 Transgenic 8 4 T03Age8
T04E15 T04 Tg15 Transgenic 15 1 T04Age15
T05E15 T05 Tg15 Transgenic 15 1 T05Age15
T06E15 T06 Tg15 Transgenic 15 1 T06Age15
T01E16 T01 Tg16 Transgenic 16 2 T01Age16
T02E16 T02 Tg16 Transgenic 16 2 T02Age16
T03E16 T03 Tg16 Transgenic 16 2 T03Age16
T04E23 T04 Tg23 Transgenic 23 2 T04Age23
T05E23 T05 Tg23 Transgenic 23 2 T05Age23
T06E23 T06 Tg23 Transgenic 23 2 T06Age23
T07E70 T07 Tg70 Transgenic 70 2 T07Age70
T08E70 T08 Tg70 Transgenic 70 2 T08Age70
T09E70 T09 Tg70 Transgenic 70 2 T09Age70
T10E70 T10 Tg70 Transgenic 70 2 T10Age70
T11E70 T11 Tg70 Transgenic 70 2 T11Age70
T12E70 T12 Tg70 Transgenic 70 2 T12Age70

Thank you everyone

Hello again!
Did you try running the same command with subject as the individual ID column?
Since you are testing longitudinal data, the plugin expects the IDs of individual subjects that were sampled repeatedly across the time points.
If you do not have enough samples, you can also broaden the age states a little by grouping ages, for example 1-10, 11-20 and so on (just an example; only you know how to group them, or whether you need to group at all).
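
A minimal sketch of that grouping, assuming a tab-separated metadata.tsv whose age column is named age and holds whole weeks, and assuming a bin width of 10. add_age_group is a hypothetical helper, not a QIIME 2 command. It appends an age_group column labelled by the upper bound of each bin (10 for ages 1-10, 20 for ages 11-20) and gives it the same declared type as the age column.

```shell
# Hypothetical helper, not a QIIME 2 command.
# $1 = tab-separated metadata file, $2 = bin width in weeks (default 10).
# Assumes the age column is named "age" and holds positive integers.
add_age_group() {
  awk -F'\t' -v OFS='\t' -v w="${2:-10}" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "age") c = i
      if (!c) { print "no age column found" > "/dev/stderr"; exit 1 }
      print $0, "age_group"; next
    }
    # Give the new column the same declared type as the age column.
    /^#q2:types/ { print $0, $c; next }
    # Label each sample with the upper bound of its bin: 8 -> 10, 15 -> 20.
    { print $0, int(($c + w - 1) / w) * w }
  ' "$1"
}
```

Something like add_age_group metadata.tsv > metadata-binned.tsv would write the extended file. The new column could then be tried as the state column, for example --p-state-column age_group --p-state-1 10 --p-state-2 20. Whether a width of 10 makes biological sense is for you to decide.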

Hi again! Thank you for your response!

I tried using the subject column, but the .qzv shows a blank boxplot and empty pairwise difference tests, like the following.

Group        W (wilcoxon signed-rank test)   P-value   FDR P-value
Transgenic   NaN                             NaN       NaN
Wildtype     NaN                             NaN       NaN

Above the table, there are several warnings like these:

No values for subject W05 at age 8
No values for subject W04 at age 8
No values for subject W06 at age 8
etc.

I think the warnings are relatively okay, right? The .qzv from the q2-longitudinal ECAM tutorial also had them.

Thank you.

It is OK for some samples to trigger that warning, as long as the overall result still includes a sufficient number of samples.

In theory, your metadata should include the following columns: SampleID, SubjectID, TimePoint, Value, Group, where Value is the metric being tested (Shannon or any other numerical data; it can be in the metadata file or in the artifact):

SampleID   SubjectID   TimePoint   Value   Group
Sample1     Subject1   1           0.5     group1
Sample2     Subject1   2           0.3     group1
Sample3     Subject2   1           0.4     group2
Sample4     Subject2   2           0.6     group2
Sample5     Subject3   1           0.5     group1
Sample6     Subject3   2           0.4     group1
Sample7     Subject4   1           0.4     group2
Sample8     Subject4   2           0.6     group2

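So, for each subject (individual ID) you should have samples at each time point (state), i.e. the same subject sampled repeatedly. Then you can compare the differences between the groups. If a subject is present at only one of the two states, it is excluded from the analysis (hence "No values for subject W05 at age 8"). In your metadata, W01-W03 and T01-T03 were sampled at ages 8 and 16, while W04-W06 and T04-T06 were sampled at ages 15 and 23, so no subject has a sample at both 8 and 15 and nothing is left to test.
That is why I suggested grouping ages, so that each age group contains samples of similar age.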
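
Before running the plugin, the pairing can be checked directly on the metadata file. The following is only a sketch: paired_subjects is a hypothetical helper, not part of QIIME 2, and it assumes a tab-separated file with columns named subject and age, as in the metadata posted above. It prints the subjects that have a sample at both states.

```shell
# Hypothetical helper, not part of QIIME 2.
# $1 = tab-separated metadata file, $2 = baseline state, $3 = second state.
# Column names "subject" and "age" are assumed from the posted metadata.
paired_subjects() {
  awk -F'\t' -v s1="$2" -v s2="$3" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "subject") sc = i
        if ($i == "age") ac = i
      }
      if (!sc || !ac) {
        print "missing subject or age column" > "/dev/stderr"
        bad = 1; exit 1
      }
      next
    }
    # Skip the column-type row.
    /^#q2:types/ { next }
    # Remember which subjects were seen at each of the two states.
    $ac == s1 { at1[$sc] = 1 }
    $ac == s2 { at2[$sc] = 1 }
    END { if (!bad) for (s in at1) if (s in at2) print s }
  ' "$1" | sort
}
```

For example, paired_subjects metadata.tsv 8 15 lists the subjects usable for --p-state-1 8 and --p-state-2 15. An empty list means no subject can be paired for that choice of states, which would be consistent with the NaN results reported above.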
