q2-longitudinal: How to make a proper metadata file?

Diki · September 29, 2021, 7:35am

Hi everyone.

I would like to run a longitudinal analysis but cannot seem to make it work. I am starting my journey with the plugin with the pairwise difference comparisons.

I want to know how my samples differ across ages (in weeks; the subjects are mice).

What modifications should I make to my metadata file? Also, in the near future, I want to try all the available q2-longitudinal methods, so I would like a metadata file that is robust enough for every analysis.

This is the command that failed:

qiime longitudinal pairwise-differences \
  --m-metadata-file metadata.tsv \
  --m-metadata-file core-metrics-results/shannon_vector.qza \
  --p-metric shannon_entropy \
  --p-group-column genotype \
  --p-state-column age \
  --p-state-1 8 \
  --p-state-2 15 \
  --p-individual-id-column batch \
  --p-replicate-handling random \
  --o-visualization pairwise-differences.qzv

Part of my metadata:

SampleID	subject	group	genotype	age	batch	subjectwhen
#q2:types	categorical	categorical	categorical	categorical	categorical	categorical
W01E08	W01	Wt08	Wildtype	8	1	W01Age8
W02E08	W02	Wt08	Wildtype	8	1	W02Age8
W03E08	W03	Wt08	Wildtype	8	1	W03Age8
W04E15	W04	Wt15	Wildtype	15	1	W04Age15
W05E15	W05	Wt15	Wildtype	15	1	W05Age15
W06E15	W06	Wt15	Wildtype	15	1	W06Age15
W01E16	W01	Wt16	Wildtype	16	2	W01Age16
W02E16	W02	Wt16	Wildtype	16	2	W02Age16
W03E16	W03	Wt16	Wildtype	16	2	W03Age16
W04E23	W04	Wt23	Wildtype	23	2	W04Age23
W05E23	W05	Wt23	Wildtype	23	2	W05Age23
W06E23	W06	Wt23	Wildtype	23	2	W06Age23
T01E08	T01	Tg08	Transgenic	8	4	T01Age8
T02E08	T02	Tg08	Transgenic	8	4	T02Age8
T03E08	T03	Tg08	Transgenic	8	4	T03Age8
T04E15	T04	Tg15	Transgenic	15	1	T04Age15
T05E15	T05	Tg15	Transgenic	15	1	T05Age15
T06E15	T06	Tg15	Transgenic	15	1	T06Age15
T01E16	T01	Tg16	Transgenic	16	2	T01Age16
T02E16	T02	Tg16	Transgenic	16	2	T02Age16
T03E16	T03	Tg16	Transgenic	16	2	T03Age16
T04E23	T04	Tg23	Transgenic	23	2	T04Age23
T05E23	T05	Tg23	Transgenic	23	2	T05Age23
T06E23	T06	Tg23	Transgenic	23	2	T06Age23
T07E70	T07	Tg70	Transgenic	70	2	T07Age70
T08E70	T08	Tg70	Transgenic	70	2	T08Age70
T09E70	T09	Tg70	Transgenic	70	2	T09Age70
T10E70	T10	Tg70	Transgenic	70	2	T10Age70
T11E70	T11	Tg70	Transgenic	70	2	T11Age70
T12E70	T12	Tg70	Transgenic	70	2	T12Age70

Thank you everyone

timanix · September 29, 2021, 8:02am

Hello again!
Did you try running the same command with subject as the individual ID column?
Since you are testing longitudinal data, the plugin expects the IDs of individual subjects that were sampled repeatedly across the time points.
If you do not have enough samples, you can also broaden the ages a little, for example by grouping them into 1-10, 11-20, and so on (just an example; only you know how to group them, or whether you need to do it at all).
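
One way to try the grouping idea is to add a derived column to the metadata before running the plugin. The sketch below is only an illustration: the helper names `age_bin` and `add_age_group`, the column name `age_group`, and the 10-week bin width are all assumptions, not anything the plugin requires.

```python
def age_bin(age, width=10):
    """Return a label such as '1-10' or '11-20' for a positive integer age."""
    age = int(age)
    low = ((age - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def add_age_group(rows, width=10):
    """Return copies of the rows with a hypothetical 'age_group' column added."""
    return [dict(row, age_group=age_bin(row["age"], width)) for row in rows]


# A few rows in the shape of the metadata posted above
rows = [
    {"SampleID": "W01E08", "subject": "W01", "age": "8"},
    {"SampleID": "W01E16", "subject": "W01", "age": "16"},
    {"SampleID": "W04E15", "subject": "W04", "age": "15"},
    {"SampleID": "W04E23", "subject": "W04", "age": "23"},
]

for row in add_age_group(rows):
    print(row["SampleID"], row["subject"], row["age_group"])
```

With these particular bins, W01 lands in 1-10 and 11-20 while W04 lands in 11-20 and 21-30, so the bin edges still have to be chosen so that each subject contributes a sample to both of the states being compared.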

Diki · September 30, 2021, 6:22am

Hi again! Thank you for your response!

I tried using the subject column, but the .qzv shows a blank boxplot and empty pairwise difference tests, like the following.

Group	W (wilcoxon signed-rank test)	P-value	FDR P-value
Transgenic	NaN	NaN	NaN
Wildtype	NaN	NaN	NaN

Above them, there are several warnings like this:

No values for subject W05 at age 8
No values for subject W04 at age 8
No values for subject W06 at age 8
etc.

I think the warnings are relatively okay, right? The .qzv from the q2-longitudinal ECAM tutorial also had them.

Thank you.

timanix · September 30, 2021, 7:25am

That is OK for some of the samples, as long as enough samples remain overall to produce results.

In theory, your metadata should include the following columns: SampleID, SubjectID, TimePoint, Value, Group, where Value is the metric being tested (Shannon or any other numeric data; it can be in the metadata file or in an artifact):

SampleID   SubjectID   TimePoint   Value   Group
Sample1     Subject1   1           0.5     group1
Sample2     Subject1   2           0.3     group1
Sample3     Subject2   1           0.4     group2
Sample4     Subject2   2           0.6     group2
Sample5     Subject3   1           0.5     group1
Sample6     Subject3   2           0.4     group1
Sample7     Subject4   1           0.4     group2
Sample8     Subject4   2           0.6     group2
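
To make the idea of a paired comparison concrete, here is a rough sketch of what a per-subject difference looks like for the example table above. It is not the plugin's implementation; the function name `paired_differences` and the choice to compute the second state minus the first are assumptions made for illustration.

```python
from collections import defaultdict

# Rows copied from the example table: (SampleID, SubjectID, TimePoint, Value, Group)
rows = [
    ("Sample1", "Subject1", "1", 0.5, "group1"),
    ("Sample2", "Subject1", "2", 0.3, "group1"),
    ("Sample3", "Subject2", "1", 0.4, "group2"),
    ("Sample4", "Subject2", "2", 0.6, "group2"),
    ("Sample5", "Subject3", "1", 0.5, "group1"),
    ("Sample6", "Subject3", "2", 0.4, "group1"),
    ("Sample7", "Subject4", "1", 0.4, "group2"),
    ("Sample8", "Subject4", "2", 0.6, "group2"),
]


def paired_differences(rows, state_1, state_2):
    """Map each group to {subject: value at state_2 minus value at state_1}."""
    values = defaultdict(dict)  # subject -> {timepoint: value}
    group_of = {}               # subject -> group
    for _sample, subject, timepoint, value, group in rows:
        values[subject][timepoint] = value
        group_of[subject] = group

    diffs = defaultdict(dict)
    for subject, by_time in values.items():
        # A subject missing either state cannot form a pair, so it is skipped
        if state_1 in by_time and state_2 in by_time:
            diff = round(by_time[state_2] - by_time[state_1], 6)
            diffs[group_of[subject]][subject] = diff
    return dict(diffs)


print(paired_differences(rows, "1", "2"))
```

Every subject in the example table has a value at both time points, so each one yields a difference that can then be tested within its group.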

So, for each subject (individual ID) you should have samples at each time point (state), that is, the subject is sampled repeatedly. Then you can compare the differences between the groups. If a subject is present at only one time point (state), it is excluded from the analysis (hence "No values for subject W05 at age 8").
That is why I suggested grouping ages, so that each age group contains samples of similar age.
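
A quick way to see this before running the plugin is to list which subjects have a sample at both states. The sketch below uses only the subject and age values of the wildtype rows from the metadata posted earlier; the function name `subjects_with_both_states` is made up for this example.

```python
from collections import defaultdict


def subjects_with_both_states(rows, state_1, state_2):
    """Return the sorted subject IDs that have a sample at both states."""
    states = defaultdict(set)  # subject -> set of ages seen
    for subject, age in rows:
        states[subject].add(age)
    return sorted(s for s, seen in states.items() if {state_1, state_2} <= seen)


# (subject, age) pairs taken from the wildtype rows of the posted metadata
rows = [
    ("W01", "8"), ("W02", "8"), ("W03", "8"),
    ("W04", "15"), ("W05", "15"), ("W06", "15"),
    ("W01", "16"), ("W02", "16"), ("W03", "16"),
    ("W04", "23"), ("W05", "23"), ("W06", "23"),
]

print(subjects_with_both_states(rows, "8", "15"))  # prints []
print(subjects_with_both_states(rows, "8", "16"))  # prints ['W01', 'W02', 'W03']
```

For these rows no subject was sampled at both age 8 and age 15, which is consistent with the NaN results, whereas ages 8 and 16 share W01 to W03 and would at least give the test some pairs to work with.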
