pairwise differences compare to single baseline?

gondry2 · June 12, 2020, 7:30pm

Hi there,

I am using QIIME2/2020.2. I have a dataset that is from a randomized controlled cross-over trial in which each subject completed all three interventions (low-iron diet, high-iron diet, and Mediterranean diet) . I have sequencing data from before and after each intervention. I was able to use longitudinal pairwise differences to compare alpha and beta diversity between the three interventions based on differences pre- and post- each intervention (ie. difference pre- and post- low-iron diet). However, my PI would like me to compare diversity changes between our interventions when compared to each subjects original baseline. Thus I removed the pre-intervention data from the feature frequency table and am now trying to run pairwise differences between state 0 and state 1 (post each intervention). I can't get this to run, and have tried longitudinal first-differences with no luck. Not sure if I need to reformat my mapping file in a different way or just run each intervention by itself, and then try downstream analysis from there? Any suggestions would be most appreciated.

Code:

#!/bin/bash
# ----------------SLURM Parameters----------------
#SBATCH -p normal
#SBATCH -n 1
#SBATCH --mem=15g
#SBATCH -N 1
# ----------------Load Modules--------------------
module load QIIME2
# ----------------Commands------------------------
qiime longitudinal pairwise-differences \
  --m-metadata-file Lisa_mapping_file_061020.tsv  \
  --m-metadata-file iron-core-metrics-results_bydiet/shannon_vector.qza \
  --p-metric shannon \
  --p-group-column Diet \
  --p-state-column State \
  --p-state-1 0 \
  --p-state-2 1 \
  --p-individual-id-column SubjectID \
  --p-replicate-handling random \
  --o-visualization pairwise-differences_shannon.qzv

Error: 
Plugin error from longitudinal:

  State 0.0 is not represented by any members of HighIron group in metadata. Consider using a different group_column or state value.

Debug info has been saved to /tmp/qiime2-q2cli-err-kl5nvhke.log

Portion of metadata file:

Nicholas_Bokulich · June 12, 2020, 8:21pm

Welcome back, @gondry2

I think the issue is the cross-over design, but you can fix that. Let me explain.

pairwise-differences works by comparing the state 1 to the state 0 values for each individual, and then assigning those individuals to a group. When there are replicates, you need to determine how to handle these... using --p-replicate-handling random as you have, one is chosen at random from the replicate groups.

This creates a problem in a crossover experiment, though, since you are trying to compare multiple timepoints, effectively, against multiple baselines for a single subject... so all subjects have "replicates" for those "states" based on how you are labeling these. To word it more simply: pairwise-differences was designed to compare differences between subjects, not multiple timepoints within a single subject.

But you want to compare multiple timepoints within a single subject as well as between subjects. So what you could do (but you should think carefully about this to make sure that it is doing the comparison that you want to see) is jostle the metadata a little bit: make a new metadata column that is a concatenation of SubjectID and DietDef, and use that as your --p-individual-id-column. That way you will be looking at pre- vs. post-intervention for each intervention separately, across all individuals. The tests will tell you (a) did the intervention have an effect (for each intervention separately) and (b) was the magnitude of effect the same or different (comparing between interventions).

Give that a try and let us know what you see!

gondry2 · June 12, 2020, 8:35pm

Amazing explanation! Thanks very much. I will play around with it this weekend and let you know how it turns out.

gondry2 · June 12, 2020, 9:22pm

Ok, so I thought I would update my core diversity metrics using the new metadata file. I am otherwise using the same files and the same code, but I am receiving this error:

(1/2) Invalid value for "--i-phylogeny": 'rooted-tree.qza' is not a QIIME 2
Artifact (.qza)
(2/2) Invalid value for "--i-table": 'iron_feature-frequency-filtered-
table_B1_Int.qza' is not a QIIME 2 Artifact (.qza)

Nicholas_Bokulich · June 12, 2020, 9:32pm

Hi @gondry2,
I am not sure what that error is indicating and if it persists you should open a separate topic (since I think it's veering into a new issue from this topic)

But you should not need to re-run the core diversity metrics. Nothing about your data has changed, you are just adding a new metadata column to rearrange how pairwise-differences is handling your data in its tests. Theoretically, re-running core metrics should give you more or less the same result (slight variance is due to random subsampling at even depth, not due to the change in metadata).

gondry2 · June 12, 2020, 9:36pm

Ok thank you! I am checking in with our biocluster. I have been receiving odd errors all day. If it persists I will open a new topic.

gondry2 · June 16, 2020, 6:18pm

Hi there,

Sorry for the delay. We had some technical issues that are now solved locally. I tried creating a concatenated column as suggested and changed the --p-individual-id-column to reflect that. Unfortunately I end up with the same error code as listed earlier:

State 0.0 is not represented by any members of HighIron group in metadata. Consider using a different group_column or state value.

Any suggestions?

Nicholas_Bokulich · June 17, 2020, 5:06pm

Hi @gondry2,
Hm... I wonder if some formatting quirk in your metadata file is causing this issue. You could send me your metadata file and shannon_vector.qza via DM if you would like me to take a look.

Nicholas_Bokulich · June 18, 2020, 11:28pm

Thanks for sending me your data @gondry2!

I have found the issue. It looks like your alpha diversity data were calculated on a filtered feature table. More specifically, they were filtered so that B1_Int=Yes, and all samples in the metadata file that are B1_Int=Yes and Diet=HighIron also are State=1, so by filtering those samples you removed your baseline HighIron samples.

If using the filtered data is intentional here, one option to proceed is to create a new metadata file that has no HighIron samples in it, and use that for this analysis — I believe that will cause other samples in your alpha diversity results to be ignored, since the data are merged automatically by q2cli before inputting to q2-longitudinal...

I hope that helps!

gondry2 · June 18, 2020, 11:48pm

Thanks for the quick response! I'll give it a try.

gondry2 · June 19, 2020, 9:14pm

Just an update. I was able to run pairwise differences by creating a new metadata file with the shannon, evenness, and faith results from feature tables that only had the information from baseline 1 and one diet intervention (ie. High Iron). Unfortunately, I can't use this fix for the pairwise_distances, but I should be able to use a similar method and run the stats in R. Thanks again for all of your help!

Nicholas_Bokulich · June 20, 2020, 1:27am

The fix for that case will be to remove those samples from the feature table and then re-run core-metrics (or whatever you used to create the distance matrix). Or you could filter those samples from the distance matrix with qiime diversity filter-distance-matrix and re-run pairwise-distances with that.

gondry2 · June 20, 2020, 6:16pm

So I reran core metrics for each diet arm (baseline + low iron, baseline + high iron, baseline + meddiet), and have three distance matrices from this. Am I able to merge them into one matrix to run pairwise-distances? Is that what you meant for me to do? The problem with all of this is the one baseline. I tried running first distances setting the baseline and received and error for that as well. I really appreciate your help!

Nicholas_Bokulich · June 20, 2020, 10:31pm

Hi @gondry2,
I forgot the part about these samples sharing the same baseline, or maybe did not realize that you are trying to compare each to the original baseline, not the pre-intervention periods that are also labeled as "baseline". Comparing to baseline 1 at each time point will not be impossible to handle with pairwise-distances, but it does complicate matters. It probably would be easier to customize this analysis in R at this point, since it will take some contorting to do this test in pairwise-distances (which was designed with a simpler intervention design in mind!).

To pass into pairwise-distances you will need to (I think this will work but it sure is long):

filter your feature table to include only the baseline samples
use qiime feature-table group to rename the baseline samples for diet 2... add these to your metadata file with the appropriate metadata, but with DietDef labeled as diet 2
use qiime feature-table group to rename the baseline samples for diet 3... add these to your metadata file with the appropriate metadata, but with DietDef labeled as diet 3
merge these new tables with the original table (that also contains baseline samples)... modify the metadata so that these original samples have DietDef as diet 1.
run core-metrics
run pairwise-distances

Another (better) option, if you are handy with python, would be to make a new method in the q2-longitudinal plugin that is designed for cross-over experiments like this, and submit a pull-request to get it added to q2-longitudinal! Just saying

What error did you receive? First-distances might be a good way to handle this, if we can fix the error, since you could then pass that result to R...

gondry2 · June 21, 2020, 5:58pm

Yeah, I was thinking I may need to switch over to R at this point. I guess I have been hesitant because I have been feeling cozy QIIME2. R is a whole different beast.

I have been trying to write the script to relabel the baseline samples. I assume the --p-axis should be "sample". I am not entirely sure which metadata column I should be using for this script though. Would it be the sample id?

The first distances script produces a qza file, but then when I try to run a linear mixed effects I receive this error: Plugin error from longitudinal:

Linear model will not compute due to singular matrix error. This may occur if input variables correlate closely or exhibit zero variance. Please check your input variables. Removing potential covariates may resolve this issue.

Here is my code:

#!/bin/bash

----------------SLURM Parameters----------------

#SBATCH -p normal
#SBATCH -n 1
#SBATCH --mem=15g
#SBATCH -N 1

----------------Load Modules--------------------

module load QIIME2/2020.2

----------------Commands------------------------

qiime longitudinal first-distances
--i-distance-matrix unweighted_unifrac_distance_matrix.qza
--m-metadata-file Lisa_mapping_file_061620.tsv
--p-state-column DietNum
--p-individual-id-column SubjectID
--p-baseline 0
--p-replicate-handling random
--o-first-distances first-dist_unweighted_unifrac.qza

first-dist_unweighted_unifrac.qza (132.5 KB)

Totally understand if this is becoming more trouble than it is worth. I am still a python novice (though all of this is making me feel more acquainted), so developing a plugin is out for the time being.

gondry2 · June 22, 2020, 4:11pm

Just a quick update: It looks like your solution is working! I was able to run first-distances with the new merged sheet. Thanks so much for all of your help!

system · July 23, 2020, 10:14pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.