ctf - linear mixed model

I have another question for the forum :slight_smile:

I was trying to follow this ctf tutorial

I wanted to do this step: "There is not a strong change over time in this example. However, we could explore the distance_matrix to test the differences by IBD by looking at pairwise distances with a Mixed Effects Model. How to use and evaluate the q2-longitudinal commands is covered in depth in their tutorial here."

However, I'm struggling with what to put in the mixed effects model command. The command from the LME tutorial is below, but I don't know how to change it so that it makes sense with a distance matrix.

qiime longitudinal linear-mixed-effects \
  --m-metadata-file ecam-sample-metadata.tsv \  #metadata
  --m-metadata-file shannon.qza \ #Distance matrix? 
  --p-metric shannon \ #? 
  --p-group-columns delivery,diet,sex \ # ok
  --p-state-column month \ # ok
  --p-individual-id-column studyid \ # ok
  --o-visualization linear-mixed-effects.qzv

I put question marks next to the parameters that I do not know what to put for.
Do I need to manipulate the distance matrix in any way prior to putting it in here?
Thanks so much for any help you can offer!



Hi @clairewill22 ,
It looks like the CTF tutorial gives some examples for using the outputs with q2-longitudinal further down in the tutorial, and using a distance matrix is not one of these.

Specifically, it looks like the sample trajectory artifact can be used as a metadata input (for volatility) so the same could probably be done with linear-mixed-effects. The PCA coordinates could also be used in the same way... and it looks like there is a specific example of using some downstream results in linear-mixed-effects near the end of that tutorial.

For the second --m-metadata-file input, this would be any of those outputs mentioned above (see the CTF tutorial for more details).

For --p-metric, this depends on what you use as input. Any of these inputs can be viewed with qiime metadata tabulate, so you can look at the actual column names to use as input.

It looks like the CTF tutorial does not describe how to use the distance matrix... the recommendation seems to be the sample trajectory. If you do want to use a distance matrix, though, you could use the first-distances action in q2-longitudinal to look at the interval distances within each subject.

I hope that gets you on the right track! I can't really answer specifics about CTF, but if you have additional questions just let us know!


Great! Thanks, @Nicholas_Bokulich, this is helpful.
So I decided that I do want to use the "SampleData[SampleTrajectory]" file for the LME, but I tried to visualize this using the metadata tabulate command to find the name of the column and I get the following error:

There was an issue with loading the file state_subject_ordination.qza as metadata:

  Metadata file must be encoded as UTF-8 or ASCII. The following error occurred when decoding the file:

  'utf-8' codec can't decode byte 0xfe in position 14: invalid start byte

  There may be more errors present in the metadata file. To get a full report, sample/feature metadata files can be validated with Keemei: https://keemei.qiime2.org

  Find details on QIIME 2 metadata requirements here: https://docs.qiime2.org/2021.4/tutorials/metadata/

I also took a shot in the dark and plugged it into the linear-mixed-effects command specifying the metric as trajectory and also got an error related to encoding saying that there were non-ASCII characters.

here's the command I ran:

qiime longitudinal linear-mixed-effects \
–m-metadata-file ../ctf-metadata-all.tsv \
–m-metadata-file state_subject_ordination.qza \
–p-metric trajectory \
–p-state-column sequence \
–p-individual-id-column Animal \
–p-group-columns species \
–o-visualization state_subject_lme.qzv

and the error:

Error: Detected invalid character in: –m-metadata-file, –m-metadata-file, –p-metric, –p-state-column, –p-individual-id-column, –p-group-columns, –o-visualization

Verify the correct quotes or dashes (ASCII) are being used.

Any thoughts on this would be appreciated! Is it an issue with the way the file is being output by the underlying plugin, or something I'm doing wrong? I guess I can extract the data from the .qza file if it comes to that and run the LME model in R... but QIIME is always a nice way to do it!

Thanks so much, as always! :slight_smile:


Hi @clairewill22 ,

Looks like these are two (probably) unrelated issues:

It sounds like an invalid character is in the output file. This might be a quirk of the data you are feeding in, or it might be an issue with the CTF plugin, I am not sure. Maybe the developer of CTF @cmartino can advise? He will probably need to inspect your QZA if you do not mind sharing it here.

The LME error sounds different:

This sounds like it has nothing to do with the inputs, merely the command... my guess is that you copied/pasted from the forum and an invisible or non-ASCII character got carried along with it (this can happen when copying from a website), or there is just a typo. Try typing out the command directly.
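A quick way to check a pasted command for this kind of character is sketched below. It assumes a POSIX shell with grep and sed available; the variable names and the shortened command are only for illustration and are not part of QIIME 2.

```shell
# Illustrative (shortened) command containing en dashes (U+2013)
# where ASCII double hyphens "--" are expected.
cmd='qiime longitudinal linear-mixed-effects –p-metric trajectory –p-state-column sequence'

# In the C locale grep works byte by byte, so any byte outside the
# printable ASCII range (space through tilde) is flagged.
if printf '%s\n' "$cmd" | LC_ALL=C grep -q '[^ -~]'; then
  echo "non-ASCII characters found"
fi

# Replace each en dash with two ASCII hyphens and show the result.
fixed=$(printf '%s\n' "$cmd" | sed 's/–/--/g')
echo "$fixed"
```

If the check flags anything, retyping the dashes by hand (or using the substitution above) should clear the "invalid character" error, independent of whatever is going on with the input file.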

However, since you are using the same file that could not validate above, I suspect that the same metadata validation error might occur here if there really is an issue with that file.

Extracting the data and running the LME in R might not work either if the invalid character gets carried over too... so I'd advise sticking with debugging instead of moving on.

But you could also try LME on one of the other outputs, like the PCA coordinates, if only to feel like something is working! :joy:

Hi @clairewill22 & @Nicholas_Bokulich,

First, thanks for using CTF and posting the question. I can definitely understand the confusion. The sentence in the tutorial is not clear. I am working on updating the tutorials but have not merged them yet. What I meant by that sentence is that running q2-longitudinal's linear-mixed-effects on the first-distances computed from the CTF distance_matrix might help explore global change over time. This would look like the following in the tutorial:

# Step 1: Generate first-distances (example distance from final)
qiime longitudinal first-distances \
  --i-distance-matrix IBD-2538/ctf-results/distance_matrix.qza \
  --m-metadata-file IBD-2538/data/metadata.tsv \
  --p-state-column timepoint \
  --p-individual-id-column host_subject_id \
  --p-replicate-handling drop \
  --p-baseline 25 \
  --o-first-distances IBD-2538/ctf-results/first-distances.qza

# Step 2: Run LME on first-distance output
qiime longitudinal linear-mixed-effects \
  --m-metadata-file IBD-2538/ctf-results/first-distances.qza \
  --m-metadata-file IBD-2538/data/metadata.tsv \
  --p-metric Distance \
  --p-state-column timepoint \
  --p-individual-id-column host_subject_id \
  --p-group-columns ibd \
  --p-formula "Distance ~ timepoint * ibd" \
  --o-visualization IBD-2538/ctf-results/first-distances-LME.qzv

# Step 3: Optional volatility plot of the first distances
qiime longitudinal volatility \
  --m-metadata-file IBD-2538/data/metadata.tsv \
  --m-metadata-file IBD-2538/ctf-results/first-distances.qza \
  --p-default-metric Distance \
  --p-default-group-column ibd \
  --p-state-column timepoint \
  --p-individual-id-column host_subject_id \
  --o-visualization IBD-2538/ctf-results/volatility.qzv

Input : distance_matrix.qza (2.4 MB) metadata.tsv (1.1 MB)

Output : first-distances.qza (588.3 KB) first-distances-LME.qzv (898.9 KB) volatility.qzv (1.0 MB)
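For reference, the formula "Distance ~ timepoint * ibd" in Step 2 uses the usual formula shorthand in which `a * b` expands to `a + b + a:b`, i.e. both main effects plus their interaction. A sketch of the corresponding model follows. It assumes that ibd has two levels coded 0/1, that timepoint is treated as numeric, and that the only random effect is an intercept per host_subject_id (which, as far as I know, is what linear-mixed-effects fits when no --p-random-effects are given); the symbols are introduced here only for illustration.

```latex
\mathrm{Distance}_{ij}
  = \beta_0
  + \beta_1\,\mathrm{timepoint}_{ij}
  + \beta_2\,\mathrm{ibd}_i
  + \beta_3\,\bigl(\mathrm{timepoint}_{ij}\times\mathrm{ibd}_i\bigr)
  + u_i + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0,\sigma_u^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2)
```

Here i indexes subjects and j indexes timepoints within a subject. Under these assumptions, the interaction coefficient is the term that tests whether the change in distance over time differs by IBD status.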

But maybe another, possibly better, way to do this would be with the pairwise-distances command, like so:

qiime longitudinal pairwise-distances \
  --i-distance-matrix IBD-2538/ctf-results/distance_matrix.qza \
  --m-metadata-file IBD-2538/data/metadata.tsv \
  --p-group-column ibd \
  --p-state-column timepoint \
  --p-state-1 1 \
  --p-state-2 12 \
  --p-individual-id-column host_subject_id \
  --p-replicate-handling drop \
  --o-visualization IBD-2538/ctf-results/pairwise-distances.qzv

Output : pairwise-distances.qzv (784.3 KB)

The SampleData[SampleTrajectory] could also be input into the LME (see command below). The column used for --p-individual-id-column will always be re-named to subject_id in the output state_subject_ordination.qza. This is because CTF will sum all the sample x feature counts in the subject x timepoint pairs and we wanted to avoid conflicts with the original metadata. I think your particular error might be because you also included the metadata in the command, which might cause an error since the state_subject_ordination.qza already contains all the metadata columns. So you only need to provide that one file like so:

qiime longitudinal linear-mixed-effects \
  --m-metadata-file IBD-2538/ctf-results/state_subject_ordination.qza \
  --p-metric "PC1" \
  --p-state-column "timepoint" \
  --p-individual-id-column "subject_id" \
  --p-group-columns "ibd" \
  --p-formula "PC1 ~ timepoint * ibd" \
  --o-visualization IBD-2538/ctf-results/state-subject-ordination-LME.qzv

Output : state-subject-ordination-LME.qzv (905.4 KB)

I did not suggest this or include it in the paper/tutorials because I am unsure of the validity of LME on PC values. If you have some reasoning or literature on the validity, I would be very interested in it. In the end, I primarily stuck with the log-ratios of aggregate features for the LME models and statistics.

I hope this helps and if you have more questions let me know.