Pairwise-differences test returns weird value

AllieNguyen · August 9, 2018, 4:51pm

Dear Qiime2 developers,

My name is Quynh Anh, you can call me Allie. I am a newbie to QIIME2 and I am running through the FMT tutorial, trying to do the longitudinal test. I have filtered the donor samples from the feature table before running diversity analyses (sampling depth = 152).

To answer question 1d: Do richness, evenness, composition, and UniFrac distance change in individuals between baseline and the end of the study? Does this differ between individuals receiving FMT and control subjects?

I removed the donor samples from the original sample metadata file, then performed the pairwise-differences test with the following command:

qiime longitudinal pairwise-differences \
  --m-metadata-file sample-metadata-213.tsv \
  --m-metadata-file core-metrics-results/faith_pd_vector.qza \
  --p-metric faith_pd \
  --p-group-column treatment-group \
  --p-state-column week \
  --p-state-1 0 \
  --p-state-2 18 \
  --p-individual-id-column subject-id \
  --p-replicate-handling random \
  --o-visualization pairwise-differences.qzv

The test returns this file: pairwise-differences-faith-pd.qzv (284.1 KB)
with W = 0 for the control group. Also, the number of samples is very small, so I am confused.

I also ran the pairwise-distances test on Bray-Curtis and unweighted UniFrac distances. That test also returns a Mann-Whitney U value of zero, and the number of samples tested is even smaller.
Bray-Curtis: pairwise-distances-bray-curtis.qzv (276.5 KB)
Unweighted UniFrac: pairwise-distances-unweighted-unifrac.qzv (277.9 KB)

I would be grateful if you could help me with this!
Thank you,
Allie.

Nicholas_Bokulich · August 9, 2018, 8:10pm

Hello @AllieNguyen,

Welcome! You came to the right place.

The small sample size explains the low W and confusing results. Are you using the 1% or 10% subsample? Using the 10% should retain a few more samples and the results should be at least a bit more sensible.
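
Why a handful of pairs leads to W = 0 can be sketched in a few lines of Python. This is a minimal illustration, assuming the reported W is the Wilcoxon signed-rank statistic taken as the smaller of the two signed rank sums, and assuming no tied or zero differences. The function names and the example differences are hypothetical and are not taken from the FMT data.

```python
from itertools import product


def signed_rank_w(differences):
    """W = min(W+, W-) for nonzero, untied paired differences."""
    ranked = sorted(differences, key=abs)  # rank 1 = smallest |difference|
    w_plus = sum(rank for rank, d in enumerate(ranked, start=1) if d > 0)
    w_minus = sum(rank for rank, d in enumerate(ranked, start=1) if d < 0)
    return min(w_plus, w_minus)


def exact_two_sided_p(differences):
    """Exact p-value: share of the 2**n sign patterns with W <= observed W."""
    n = len(differences)
    w_obs = signed_rank_w(differences)
    rank_total = n * (n + 1) // 2
    hits = 0
    for signs in product((1, -1), repeat=n):
        w_plus = sum(rank for rank, s in zip(range(1, n + 1), signs) if s > 0)
        if min(w_plus, rank_total - w_plus) <= w_obs:
            hits += 1
    return hits / 2 ** n


# Hypothetical differences (state 2 minus state 1) that all point the same way.
four_pairs = [-1.2, -0.4, -2.1, -0.9]
six_pairs = [-1.2, -0.4, -2.1, -0.9, -0.7, -1.5]

print(signed_rank_w(four_pairs), exact_two_sided_p(four_pairs))
print(signed_rank_w(six_pairs), exact_two_sided_p(six_pairs))
```

W is 0 whenever every difference has the same sign, which happens easily by chance with only a few pairs. Under these assumptions the smallest possible two-sided exact p-value is 2 / 2^n, so fewer than six pairs cannot reach p < 0.05 no matter how consistent the change is.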

A few things to consider:

The questions at the bottom of the tutorial are really just suggestions to get folks interested in learning about other methods available in QIIME 2 (which may be more useful on their own full-size datasets), so there is no "right" answer.
I must admit I did not test those suggestions on the subsampled datasets, only on the full-size dataset, which has more samples.

The small datasets provided in the tutorials are usually there to get answers fast, rather than to get "good" results. E.g., the subsampled FMT data and moving pictures tutorial dataset are very small; smaller than would be appropriate/useful for many of the tests presented in some of the tutorials! But small enough that they can demonstrate a typical workflow in minutes rather than hours or even days (as a large dataset may take).

In light of all of this, I think there are two conclusions:

your results are not actually abnormal — the input data just aren't large enough!
it could be that baseline and end of study are not good comparisons on the subsampled data — are there other time points that have more samples that are interesting to compare?
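
One way to look for better time points is to count, for each group, how many subjects have a sample at both states being compared. The sketch below is only an illustration: the column names are borrowed from the command earlier in this thread, the rows and subject IDs are made up, and the helper `paired_subject_counts` is hypothetical rather than part of QIIME 2. State values are compared as strings, exactly as they would appear in the metadata file.

```python
from collections import defaultdict


def paired_subject_counts(rows, state_1, state_2,
                          state_column="week",
                          id_column="subject-id",
                          group_column="treatment-group"):
    """Count, per group, the subjects that have a sample at both states."""
    states_seen = defaultdict(set)  # (group, subject) -> states with a sample
    for row in rows:
        key = (row[group_column], row[id_column])
        states_seen[key].add(str(row[state_column]))

    wanted = {str(state_1), str(state_2)}
    counts = defaultdict(int)
    for (group, _subject), states in states_seen.items():
        if wanted <= states:  # this subject was sampled at both states
            counts[group] += 1
    return dict(counts)


# Made-up rows; real rows could come from csv.DictReader(..., delimiter="\t").
example_rows = [
    {"subject-id": "A", "week": "0", "treatment-group": "control"},
    {"subject-id": "A", "week": "10", "treatment-group": "control"},
    {"subject-id": "B", "week": "0", "treatment-group": "control"},
    {"subject-id": "C", "week": "0", "treatment-group": "treatment"},
    {"subject-id": "C", "week": "10", "treatment-group": "treatment"},
    {"subject-id": "C", "week": "18", "treatment-group": "treatment"},
]

print(paired_subject_counts(example_rows, 0, 10))
print(paired_subject_counts(example_rows, 0, 18))
```

A group with no complete pairs is simply absent from the returned dictionary, which is a quick signal that the comparison will not be informative for that group.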

We are always very open to suggestions and contributions. If you want to explore the dataset a little more and suggest other timepoints or analyses that are more interesting in the context of this tutorial, we would love to have you contribute these to improve the documentation!

Please let me know if that makes sense! And please let me know if you are interested in helping to improve this.

Thanks!

AllieNguyen · August 10, 2018, 12:10pm

Hi Nick!

Thanks for the explanation. I was working on the 1% subsample; I will have a go with the 10% to see how it goes.
In terms of other time points with more samples to compare, I checked table.qzv and week 10 seems to be a potential option. Week 10 can be considered the middle of the experimental period, so we can compare week 0 (baseline) vs. week 10 and week 10 vs. week 18 (end of the study). I'm not sure if these comparisons are meaningful in practice, but the values returned by these tests (W and Mann-Whitney U) are not zero, so I think they would not confuse new learners. What do you think?

AllieNguyen · August 10, 2018, 12:10pm

Hi Nick,
I have some more questions regarding the FMT tutorial.

A. To answer the question 'Is the microbial composition of stool and swab samples significantly different based on either unweighted UniFrac or Bray-Curtis distances between samples?', I think there are two approaches. Please correct me if I have this wrong:

If we only care that a sample is 'stool' or 'swab' then a beta-group-significance test could be done and the results show that the composition is significantly different: bray-curtis-sample-type-significance.qzv (372.9 KB)
However, if we take into account the fact that these samples come from different individuals receiving different treatments, a pairwise-distances test should be done instead, and the results show that the composition is not significantly different: paiwise-distances-bray-curtis-sample-collection.qzv (277.2 KB)

So which one would be the right approach here?

B. To answer this question 'd. Does community richness differ between stool samples and swab samples?'
It's quite the same thing as part A. Here we can do alpha-group-significance (then view it with column sample-type) or pairwise-differences test with (state-column sample-type state-1 stool state-2 swab). Both results show that the richness is not significantly different, but which one would be more trustworthy?
pairwise-difference-faith-sample-collection-method.qzv (284.3 KB)
faith-pd-group-significance.qzv (344.8 KB)

Thanks

Nicholas_Bokulich · August 10, 2018, 5:57pm

Hi @AllieNguyen,
Thanks for digging into this more! These are all excellent points and excellent questions!

That sounds great. I agree, these comparisons may or may not make biological sense but are probably better suggestions than baseline vs. end of study.

I have raised an issue on our issue tracker to modify these questions to make them less confusing. If you are familiar with working with GitHub (or would like to give it a try), it would be great if you wanted to contribute by tackling this issue (i.e., so that you would get credit for it!) and I would be very happy to point you to the relevant files and review your changes. If not, I can make these changes. Just let me know how you'd like to proceed.

You are absolutely correct! Great inference on this. (In general there is always more than one answer of course, and these questions are sort of designed to get people thinking about different analysis options. We use these in workshops and classes using QIIME.)

Neither is the "right" answer — they are both valid approaches and effectively ask two different questions.

1. Are stool vs. swab so different that they can be distinguished irrespective of which subject they came from? (If so, swabs aren't too useful! That would probably indicate major sampling bias or contaminants.) Good thing there is not a difference.
2. The paired distances approach is more sensitive and, combined with approach 1, determines whether intra-individual distances reveal differences between sampling methods. (Intra-individual differences are very useful in human studies, where inter-individual differences are sometimes larger than treatment effects! This is one place where the paired tests in q2-longitudinal can be very useful, whether pairing samples across time or sampling protocols.) A significant difference here would indicate that there is a sampling bias/contaminant associated with swabs, but one that is much more subtle (since test #1 is not significant)... fortunately this test is not significant either!

Yes! Indeed... and it's effectively the same interpretation as in A. Both tests can be used and complement each other — as in A, together they assess the magnitude of the impact that a treatment/time/sampling method has (in this case on alpha diversity). Here again both are insignificant so swabs seem like a valid sampling method.
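
The difference between the unpaired and paired views can be illustrated with a toy example. This is only a sketch: the richness values are invented (they are not from the FMT data), and the two exact permutation tests below are simple stand-ins chosen for illustration, not the tests that QIIME 2 itself runs. All function and variable names are hypothetical.

```python
from itertools import combinations, product
from statistics import mean


def unpaired_permutation_p(a, b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    total = sum(pooled)
    hits = n_splits = 0
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / len(a) - (total - sum_a) / len(b)
        n_splits += 1
        if abs(diff) >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    return hits / n_splits


def paired_sign_flip_p(a, b):
    """Exact two-sided sign-flip p-value for the mean within-subject difference."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    hits = 0
    for signs in product((1, -1), repeat=len(diffs)):
        if abs(mean(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / 2 ** len(diffs)


# Invented richness values for six subjects, listed in the same subject order.
# Subjects differ a lot from each other; swabs read slightly higher each time.
stool = [10.0, 14.0, 18.0, 22.0, 26.0, 30.0]
swab = [11.0, 15.5, 19.0, 23.5, 27.0, 31.5]

print("unpaired p:", unpaired_permutation_p(stool, swab))
print("paired p:  ", paired_sign_flip_p(stool, swab))
```

In this toy data the shift between methods is small compared with the spread between subjects, so the unpaired comparison gives a much larger p-value than the paired one, which only looks at within-subject differences. That is the sense in which the paired approach is more sensitive.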

If you want a better tutorial dataset for digging deeper and trying out other analyses, check out the ECAM dataset used in the q2-longitudinal tutorial. That's a full-size dataset that will give you more room to experiment with different methods in QIIME 2.

Great questions! Keep on QIIME 2-ing.
