q2-longitudinal pairwise distances for a dataset with unevenly spaced time points

Hello there,

I have a longitudinal dataset with samples collected from multiple sites and unevenly spaced time points. For example, some of the subjects have only one sample per site, some have replicates for a single time point per site, and some have unique time points.

When I ran the following commands:

qiime longitudinal first-distances \
  --i-distance-matrix "$inputDir/$inputTable/$inputTable-Distance.qza" \
  --m-metadata-file "$metadata" \
  --p-state-column timepoint_allstays \
  --p-individual-id-column host_subject_id \
  --p-replicate-handling random \
  --o-first-distances "$outDir/$inputTable-first-distances.qza"

qiime longitudinal linear-mixed-effects \
  --m-metadata-file "$outDir/$inputTable-first-distances.qza" \
  --m-metadata-file "$metadata" \
  --p-metric Distance \
  --p-state-column timepoint_allstays \
  --p-individual-id-column host_subject_id \
  --p-group-columns sample_sarscov2_screening_result \
  --o-visualization "$outDir/$inputTable-first-distances-LME.qzv"

The output file, FirstDifferences.tsv, contained distances for only 4 subjects, and the LME command failed entirely because there were too few input rows.

I am aware that the command cannot handle duplicates, so I tried adding ranks as a new time point column. However, I am stuck on how to handle subjects like H5 and H6, as shown in the image above. Could you suggest how to calculate longitudinal distances per site for an irregular time series like this?
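One way to build the rank column described above is a dense rank of each subject's time points, run separately on each site's metadata. This is a minimal Python sketch, assuming the metadata has been loaded as a list of dicts; the `day` column, the sample and subject IDs, and the helper name `dense_rank_timepoints` are hypothetical. Replicates from the same time point share a rank, so they still need `--p-replicate-handling`.

```python
from collections import defaultdict


def dense_rank_timepoints(rows, subject_key="host_subject_id", time_key="day"):
    """Return copies of `rows` with a new 'timepoint_rank' column.

    Each subject's distinct time points are ranked 1, 2, 3, ... in
    chronological order; replicate samples from the same time point
    share a rank (dense ranking).
    """
    # Collect the distinct time points observed for each subject.
    times_by_subject = defaultdict(set)
    for row in rows:
        times_by_subject[row[subject_key]].add(row[time_key])

    # Map each subject's time points to consecutive integer ranks.
    rank_lookup = {
        subject: {t: rank for rank, t in enumerate(sorted(times), start=1)}
        for subject, times in times_by_subject.items()
    }

    ranked = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        new_row["timepoint_rank"] = rank_lookup[row[subject_key]][row[time_key]]
        ranked.append(new_row)
    return ranked


# Hypothetical metadata for one body site.
rows = [
    {"sample_id": "s1", "host_subject_id": "A", "day": 0},
    {"sample_id": "s2", "host_subject_id": "A", "day": 6},
    {"sample_id": "s3", "host_subject_id": "A", "day": 6},  # replicate
    {"sample_id": "s4", "host_subject_id": "A", "day": 13},
    {"sample_id": "s5", "host_subject_id": "B", "day": 4},  # single sample
]
ranked = dense_rank_timepoints(rows)
print([r["timepoint_rank"] for r in ranked])  # [1, 2, 2, 3, 1]
```

Note that a rank discards the actual spacing between samples (rank 2 may be day 6 for one subject and day 30 for another), and a subject with a single sample, like B here, still has only one state.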

Please let me know if I can provide more information.


Hi @prdas,

I moved this to user support, since it is more of a user-support question than a technical issue.

This is complex data! :slight_smile:

My first suggestion would be to separate your body sites; I think it will make subsequent modeling easier.

I’m not sure whether QIIME can do this, but I would specifically filter to your duplicated timepoints and check whether the within-duplicate distances are smaller than the between-timepoint distances for those individuals. If they are small, I would drop the duplicates.
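The replicate check above could be sketched in Python as follows, assuming the distance matrix has been exported from the `.qza` (e.g. with `qiime tools export`) to a tab-separated file with sample IDs in its first row and first column, and that the metadata is a dict keyed by sample ID. The helper names, the `day` column, and the example values are hypothetical.

```python
import csv
from collections import defaultdict
from itertools import combinations
from statistics import mean


def read_distance_matrix(path):
    """Read a square, tab-separated distance matrix into a nested dict.

    Assumes sample IDs in the first row (after an empty corner cell)
    and in the first column.
    """
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        sample_ids = next(reader)[1:]
        return {
            row[0]: dict(zip(sample_ids, map(float, row[1:])))
            for row in reader
            if row  # skip blank lines
        }


def replicate_vs_between(distances, metadata, subject_key="host_subject_id", time_key="day"):
    """For each subject that has both replicates and several time points, return
    (mean within-replicate distance, mean between-time-point distance)."""
    samples_by_subject = defaultdict(list)
    for sample_id, row in metadata.items():
        samples_by_subject[row[subject_key]].append(sample_id)

    summary = {}
    for subject, samples in samples_by_subject.items():
        within, between = [], []
        for a, b in combinations(samples, 2):
            if metadata[a][time_key] == metadata[b][time_key]:
                within.append(distances[a][b])
            else:
                between.append(distances[a][b])
        if within and between:  # need both kinds of pairs to compare
            summary[subject] = (mean(within), mean(between))
    return summary


# Hypothetical example: s2 and s3 are replicates collected on day 6.
distances = {
    "s1": {"s1": 0.0, "s2": 0.6, "s3": 0.65},
    "s2": {"s1": 0.6, "s2": 0.0, "s3": 0.1},
    "s3": {"s1": 0.65, "s2": 0.1, "s3": 0.0},
}
metadata = {
    "s1": {"host_subject_id": "A", "day": 0},
    "s2": {"host_subject_id": "A", "day": 6},
    "s3": {"host_subject_id": "A", "day": 6},
}
print(replicate_vs_between(distances, metadata))  # {'A': (0.1, 0.625)}
```

How much smaller counts as "small" is a judgment call; comparing the two means per subject is only a first look, not a formal test.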

I think the ranking approach is okay; if there are timepoints you can fudge or ranges you can define, that might help. For example, days 5-9 might be classified as day 7 for convenience in modeling. That also accommodates samples with missing timepoints. Depending on your sample size, I would also consider how you’re defining your baseline.
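The binning idea (days 5-9 recorded as day 7) might look like this minimal Python sketch. The window boundaries and labels in `BINS` are hypothetical placeholders to be chosen from the actual sampling schedule; days outside every window map to `None`, i.e. a missing time point.

```python
# Hypothetical sampling windows: (first day, last day, label used for modeling).
BINS = [
    (0, 2, 0),
    (5, 9, 7),
    (12, 16, 14),
]


def bin_day(day, bins=BINS):
    """Return the label of the window containing `day`, or None if no window matches."""
    for first, last, label in bins:
        if first <= day <= last:
            return label
    return None


days = [0, 6, 9, 10, 15]
print([bin_day(d) for d in days])  # [0, 7, 7, None, 14]
```

Two samples from the same subject that fall in the same window (here days 6 and 9) become replicates of that state, so replicate handling still applies after binning.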

You could also try testing timepoints cross-sectionally.


Hey Justine,

Thanks for your suggestions :slight_smile:

Yes, I did separate the feature tables by body site.
For the duplicate samples, I will follow your suggestion.

Could you also confirm whether subjects like H5, which have only a single time point, are ignored during the calculation?

Regarding the baseline: I have not specified one in the command. I assume that, in that case, the pairwise distances are calculated between all available time points? If not, could you explain how they are calculated?

I am also wondering: when should one define a baseline, and when not?

Hi @prdas

Yes, they will be skipped, because a distance between timepoints requires two timepoints.
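To see in advance which subjects cannot contribute, you can count the distinct time points per subject (after any ranking or binning). A minimal Python sketch; the column names and subject IDs are hypothetical:

```python
from collections import defaultdict


def subjects_without_pairs(rows, subject_key="host_subject_id", state_key="timepoint_rank"):
    """Return subjects with fewer than two distinct states.

    Such subjects cannot contribute a between-time-point distance,
    whether they have a single sample or only replicates of one state.
    """
    states_by_subject = defaultdict(set)
    for row in rows:
        states_by_subject[row[subject_key]].add(row[state_key])
    return sorted(s for s, states in states_by_subject.items() if len(states) < 2)


rows = [
    {"host_subject_id": "A", "timepoint_rank": 1},
    {"host_subject_id": "A", "timepoint_rank": 2},
    {"host_subject_id": "B", "timepoint_rank": 1},  # one sample only
    {"host_subject_id": "C", "timepoint_rank": 1},  # replicates of one state
    {"host_subject_id": "C", "timepoint_rank": 1},
]
print(subjects_without_pairs(rows))  # ['B', 'C']
```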

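“First distances” measures the distance between consecutive timepoints for each individual; if you set a baseline, it instead measures each timepoint’s distance from that baseline timepoint, so thinking about how you define the baseline is important. The pairwise-distances command would instead compare two specific timepoints.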

It depends on your hypothesis: do you think you have a change from baseline, or a change in step length?
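The two hypotheses pair each individual's samples differently. As a conceptual sketch in plain Python (not QIIME 2's implementation; the sample IDs and distances are made up), "change in step length" compares each time point with the previous one, while "change from baseline" compares every later time point with the first:

```python
def step_distances(ordered_samples, dist):
    """Distance between each pair of consecutive time points."""
    return [dist[a][b] for a, b in zip(ordered_samples, ordered_samples[1:])]


def baseline_distances(ordered_samples, dist):
    """Distance from the first (baseline) time point to each later one."""
    baseline = ordered_samples[0]
    return [dist[baseline][s] for s in ordered_samples[1:]]


# One hypothetical subject sampled at three time points, in time order.
dist = {
    "t1": {"t1": 0.0, "t2": 0.2, "t3": 0.4},
    "t2": {"t1": 0.2, "t2": 0.0, "t3": 0.3},
    "t3": {"t1": 0.4, "t2": 0.3, "t3": 0.0},
}
samples = ["t1", "t2", "t3"]
print(step_distances(samples, dist))      # [0.2, 0.3]
print(baseline_distances(samples, dist))  # [0.2, 0.4]
```

A community that drifts steadily in one direction can show growing baseline distances even when each step is similar in size, which is why the two framings can answer different questions.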



Hey Justine,

Thanks again for answering clearly.

Re: It depends on your hypothesis: do you think you have a change from baseline or a change in step length?
I guess I will have to try both to see whether there are any differences worth interpreting.

Hi @prdas,

Good luck! Let us know if you run into more snags.

