I have a longitudinal dataset with samples collected from multiple sites and unevenly spaced time points. For example, some of the subjects have only one sample per site, some have replicates for a single time point per site, and some have unique time points.
When I ran the following code:
qiime longitudinal first-distances
qiime longitudinal linear-mixed-effects
The output file (FirstDifferences.tsv, 184 bytes) had distances calculated for only 4 subjects.
The LME command did not run at all because there were too few input rows.
I am aware that the command cannot handle duplicates. Hence, I was trying to add ranks as a new time point column. But I am stuck on how to deal with subjects like H5 and H6 as shown in the image above. Could you suggest how to calculate longitudinal distances per site with this kind of irregular time series dataset?
I moved this to User Support, since it fits that category better than Technical Support.
This is complex data!
My first suggestion would be to separate your body sites; I think it will make subsequent modeling easier.
I’m not sure if QIIME can do this, but I might filter specifically to your duplicated timepoints and check whether the within-duplicate distances are smaller than the between-timepoint distances for those individuals. If they’re small, I would drop the duplicates.
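One way to run that duplicate check outside QIIME is a few lines of pandas. This is only a rough sketch, assuming you have exported the distance matrix to a square DataFrame and have a metadata table indexed by sample ID. The column names `subject` and `day`, the sample IDs, and the function name `duplicate_vs_between` are all made up for illustration.

```python
import pandas as pd

def duplicate_vs_between(dm, meta, subject_col="subject", time_col="day"):
    """Label every within-subject sample pair as a duplicate pair
    (same timepoint) or a between-timepoint pair."""
    rows = []
    ids = [s for s in dm.index if s in meta.index]
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Only compare samples from the same subject.
            if meta.at[a, subject_col] != meta.at[b, subject_col]:
                continue
            same_time = meta.at[a, time_col] == meta.at[b, time_col]
            rows.append({
                "subject": meta.at[a, subject_col],
                "sample_a": a,
                "sample_b": b,
                "kind": "within_duplicate" if same_time else "between_timepoint",
                "distance": dm.at[a, b],
            })
    return pd.DataFrame(
        rows, columns=["subject", "sample_a", "sample_b", "kind", "distance"]
    )

# Toy data (hypothetical): H1 has duplicates at day 0 plus a day-7 sample,
# H5 has a single sample and therefore contributes no pairs.
ids = ["s1", "s2", "s3", "s4"]
dm = pd.DataFrame(
    [[0.0, 0.1, 0.5, 0.9],
     [0.1, 0.0, 0.6, 0.9],
     [0.5, 0.6, 0.0, 0.9],
     [0.9, 0.9, 0.9, 0.0]],
    index=ids, columns=ids,
)
meta = pd.DataFrame(
    {"subject": ["H1", "H1", "H1", "H5"], "day": [0, 0, 7, 0]},
    index=ids,
)

pairs = duplicate_vs_between(dm, meta)
summary = pairs.groupby("kind")["distance"].mean()
print(summary)
```

If the `within_duplicate` mean is clearly below the `between_timepoint` mean for your real data, dropping one sample from each duplicate set is probably defensible.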
I think the ranking approach is okay; if there are timepoints you can fudge or ranges you can define, that might help. Like, 5-9 days might be classified as 7 for the sake of convenience and modeling. That also lets you have samples with missing timepoints. Depending on your sample size, I would also consider how you’re defining your baseline.
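As a sketch of what the binning and ranking could look like in the metadata, assuming a numeric `day` column and a `subject` column (both names, the window edges, and the helper `add_time_columns` are hypothetical; pick windows that make sense for your study):

```python
import pandas as pd

def add_time_columns(meta, bins, labels):
    """Add a binned nominal timepoint and a per-subject rank column."""
    out = meta.copy()
    # Map each observed day into a window, e.g. days 5-9 -> nominal day 7.
    out["nominal_day"] = pd.cut(out["day"], bins=bins, labels=labels).astype(int)
    # Rank timepoints within each subject; duplicates share the same rank.
    out["time_rank"] = (
        out.groupby("subject")["day"].rank(method="dense").astype(int)
    )
    return out

# Toy metadata (hypothetical values).
meta = pd.DataFrame(
    {
        "subject": ["H1", "H1", "H1", "H2", "H2", "H5"],
        "day": [0, 6, 15, 0, 8, 3],
    },
    index=["a", "b", "c", "d", "e", "f"],
)

# Windows are (-1, 4], (4, 9], (9, 16], labeled as nominal days 0, 7, 14.
binned = add_time_columns(meta, bins=[-1, 4, 9, 16], labels=[0, 7, 14])
print(binned)
```

The nominal column keeps timepoints comparable across subjects even when some windows are missing, while the rank column only preserves order within a subject, so the two answer slightly different questions.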
You could also try testing individual timepoints cross-sectionally.
Yes, I did separate the feature tables by body site.
For the duplicate samples, I will follow your suggestion.
But could you also confirm whether subjects like H5, which have only a single time point, would be ignored during the calculation?
Regarding the baseline: I have not set a baseline in the command. I assume that, in that case, the pairwise distances are calculated between all available time points? If not, could you tell me how they are calculated?
I am also wondering: when should one define a baseline, and when not?
Yes, they will be skipped because a distance between timepoints requires two timepoints.
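“First distances” measures change from a baseline timepoint when you set the baseline parameter; if you leave it unset, distances are calculated between successive timepoints rather than between all pairs. So thinking about how you define this is important. The pairwise-differences command would look at the change between a chosen pair of timepoints.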
It depends on your hypothesis: do you think you have a change from baseline, or a change in step length?
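To make the two hypotheses concrete, here is a rough sketch in plain pandas (not QIIME code) that computes both kinds of distance on a toy matrix: distance from a fixed baseline state, and distance between successive states. The function names, column names, and data are all hypothetical.

```python
import pandas as pd

def successive_distances(dm, meta, subject_col, state_col):
    """Distance from each sample to the same subject's previous state."""
    out = {}
    for _, grp in meta.groupby(subject_col):
        ordered = grp.sort_values(state_col).index.tolist()
        # Subjects with a single sample yield no pairs and are skipped.
        for prev, cur in zip(ordered, ordered[1:]):
            out[cur] = dm.at[prev, cur]
    return pd.Series(out, dtype=float)

def baseline_distances(dm, meta, subject_col, state_col, baseline):
    """Distance from each sample to the same subject's baseline sample."""
    out = {}
    for _, grp in meta.groupby(subject_col):
        base = grp.index[grp[state_col] == baseline]
        if len(base) != 1:
            continue  # no baseline sample, or an ambiguous (duplicated) one
        for sid in grp.index:
            if sid != base[0]:
                out[sid] = dm.at[base[0], sid]
    return pd.Series(out, dtype=float)

# Toy data: H1 sampled at states 0, 1, 2; H5 sampled only once.
ids = ["a", "b", "c", "d"]
dm = pd.DataFrame(
    [[0.0, 0.2, 0.6, 0.9],
     [0.2, 0.0, 0.3, 0.9],
     [0.6, 0.3, 0.0, 0.9],
     [0.9, 0.9, 0.9, 0.0]],
    index=ids, columns=ids,
)
meta = pd.DataFrame(
    {"subject": ["H1", "H1", "H1", "H5"], "state": [0, 1, 2, 0]},
    index=ids,
)

succ = successive_distances(dm, meta, "subject", "state")
base = baseline_distances(dm, meta, "subject", "state", baseline=0)
print(succ)  # step-to-step change
print(base)  # cumulative change from state 0
```

In the toy data, sample `c` is 0.3 from the previous state but 0.6 from baseline, which is the difference between "change in step length" and "change from baseline". H5 drops out of both results because it has only one sample.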