Question about qiime longitudinal pairwise-differences

Jia · February 28, 2019, 1:07am

Dear User Support,

I am trying to use a Mann-Whitney U test through qiime longitudinal pairwise-differences to compare two sample types within each individual (Individual_ID), across 18 individuals. I set the Individual_ID column as the group column, since the help text says:

--p-group-column: Metadata column on which to separate groups for comparison [required]

Here is the command I ran:

qiime longitudinal pairwise-differences \
  --m-metadata-file MetaData.tsv \
  --m-metadata-file core-metrics-results/shannon_vector.qza \
  --p-metric shannon \
  --p-group-column Individual_ID \
  --p-state-column SampleTypes \
  --p-state-1 cloacal \
  --p-state-2 large_intestine \
  --p-individual-id-column SUL \
  --p-no-parametric \
  --o-visualization pairwise-differences_Mann_whitney.qzv

But I got the following error message:

Plugin error from longitudinal:

State cloacal is not represented by any members of nan group in metadata. Consider using a different group_column or state value.

Do you have any idea what is causing this error?

Thanks,

Jia

jwdebelius · February 28, 2019, 8:40am

Hi @Jia,

It looks like there's a problem with the way your metadata is encoded. Are you, by chance, missing information for some of your samples? The error message says that you've got a "nan" group, which usually means missing data. Since I'm guessing your SampleTypes column should be defined for every sample you want to pair, I'd try two things.

First, filter out any blanks you might still have in your dataset and make a samples-only table. You may also want to remove any individuals who don't have paired samples.
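A minimal sketch of that filtering step in Python, assuming a tab-separated metadata file whose first column holds the sample IDs and which has the Individual_ID and SampleTypes columns used in the command above. The helper names (`paired_samples_only`, `write_tsv`) and the demo sample IDs are hypothetical, not part of QIIME 2. Rows with an empty Individual_ID or a sample type other than cloacal/large_intestine (for example, blanks) are dropped, and so is any individual missing either sample type.

```python
import csv
import io


def paired_samples_only(handle, id_col="Individual_ID", type_col="SampleTypes",
                        states=("cloacal", "large_intestine")):
    """Return (fieldnames, rows) for samples whose individual has every state."""
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames
    sample_col = fieldnames[0]  # first column holds the sample IDs
    kept = []
    for row in reader:
        if row[sample_col].startswith("#q2:"):
            continue  # skip a '#q2:types' directive row, if present
        ind = (row.get(id_col) or "").strip()
        state = (row.get(type_col) or "").strip()
        if ind and state in states:  # drops blanks and rows missing these values
            kept.append(row)
    # Collect which sample types each individual actually has.
    states_by_ind = {}
    for row in kept:
        states_by_ind.setdefault(row[id_col].strip(), set()).add(row[type_col].strip())
    complete = {ind for ind, seen in states_by_ind.items() if seen >= set(states)}
    return fieldnames, [row for row in kept if row[id_col].strip() in complete]


def write_tsv(handle, fieldnames, rows):
    """Write the filtered rows back out as tab-separated metadata."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    demo = ("sample-id\tIndividual_ID\tSampleTypes\n"
            "S1\tA\tcloacal\nS2\tA\tlarge_intestine\n"
            "S3\tB\tcloacal\n"   # B has no large_intestine sample
            "NEG1\t\tblank\n")   # a blank with no Individual_ID
    fields, rows = paired_samples_only(io.StringIO(demo))
    print([r["sample-id"] for r in rows])  # ['S1', 'S2']
```

To filter the real file, open MetaData.tsv with `newline=""`, pass the handle to `paired_samples_only`, and write the result to a new file with `write_tsv`.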

If that doesn't work, import your data into Excel as text via the import mechanism, and have everything come in as strings. Then, delete several lines at the end of your mapping file. Stupid, but somehow it's worked in the past.

I hope this helps!

Best,
Justine

Jia · February 28, 2019, 9:24am

Hi, Justine

Thanks for the reply. It worked after I removed the blanks.

But my results look very weird. The values of W (Wilcoxon signed-rank test) are all 0.0, and the p values/FDR p values are all 0.317311; the values of the Mann-Whitney U are all 0.0 or 1.0, and the p values/FDR p values are all 1.0. Could this be an issue with the input data?

Thanks,

Jia

jwdebelius · February 28, 2019, 3:30pm

Hi @Jia,

I'm not sure whether the issue you have is with the data or is conceptual. Can you outline the hypothesis you're trying to test with this?

Best,
Justine

Jia · March 1, 2019, 7:13am

Hi, Justine

There is one large intestine sample and one cloacal sample from each individual, and 18 individuals in total. I want to see whether the microbial composition of the cloacal samples is very similar to that of the large intestine samples. This is what I was trying to test with this command.

Thanks,

Jia

jwdebelius · March 1, 2019, 11:17am

Hi @Jia,

So, then, I think this comparison isn't what you want. You're getting the same p-values everywhere because your group sizes are too small (with Individual_ID as the group column, each group holds a single pair), or because you don't have a firm point of comparison. Your question sounds more like one for beta diversity, where you might look at whether within-individual pairs are more similar to each other than the between-individual variation at each site. I'm not sure you can set those reference groups in QIIME; there might be something in R, but you might also need some custom manipulation of your data. (I think I'd do a permutational ANOVA, but picking out the specific groups. You might be able to manage something like that in R or Python; I'm not sure about other statistical packages.)
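The identical statistics Jia reported are what these tests return for a single pair. Below is a stdlib-only sketch of both tests using the normal approximation, assuming (not confirmed in this thread) that the plugin runs each test separately within each Individual_ID group, which in this design holds one cloacal/large-intestine pair. The Shannon values are made up. With one non-zero difference, W is always 0.0 with p ≈ 0.317311, and U is 0.0 or 1.0 (0.5 on an exact tie) with p = 1.0, regardless of the data.

```python
import math


def wilcoxon_signed_rank(diffs):
    """Two-sided Wilcoxon signed-rank test, normal approximation without
    continuity or tie correction. Returns (W, p)."""
    d = [x for x in diffs if x != 0]  # zero differences are discarded
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    # Rank |d| from 1..n (ties are not averaged; fine for this illustration).
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    w = float(min(w_plus, w_minus))
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    return w, math.erfc(abs(z) / math.sqrt(2))  # two-sided p from N(0, 1)


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation with continuity
    correction. Returns (U for x, p)."""
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(abs(u - mean) - 0.5, 0.0) / sd
    return u, min(1.0, math.erfc(z / math.sqrt(2)))


# One Individual_ID group = one pair of (made-up) Shannon values.
cloacal, large_intestine = 3.2, 4.1
print(wilcoxon_signed_rank([large_intestine - cloacal]))  # W = 0.0, p ≈ 0.317311
print(mann_whitney_u([cloacal], [large_intestine]))       # (0.0, 1.0)
```

With one pair, W = 0 sits exactly one standard deviation (0.5) below its null mean (0.5), so the two-sided p-value is always that of |z| = 1. No amount of data cleaning changes this; only a design with more than one pair per tested group does.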

I don't think the alpha diversity comparison you proposed will answer that question specifically. With alpha diversity, you're asking whether there's a relationship between the within-sample diversity (for example, the number of taxa) of the two samples in each pair. From that perspective, you might want to look at the correlation between the pairs, since you don't have a comparison group.
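Following the correlation suggestion, here is a minimal sketch: Spearman's rank correlation between the cloacal and large-intestine alpha diversity values across individuals, with a permutation p-value. Using Spearman (rank-based) rather than Pearson is an assumption made here to stay nonparametric. The inputs are two lists with one value per individual, in the same individual order, and the function names are hypothetical.

```python
import math
import random


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # fails if either input is constant


def spearman_permutation_test(x, y, n_perm=9999, seed=0):
    """Spearman's rho between paired values plus a two-sided permutation p."""
    rx, ry = average_ranks(x), average_ranks(y)
    observed = pearson(rx, ry)  # Spearman = Pearson on the ranks
    rng = random.Random(seed)
    shuffled = list(ry)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the cloacal/large-intestine pairing
        rho = pearson(rx, shuffled)
        if abs(rho) > abs(observed) or math.isclose(abs(rho), abs(observed)):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With 18 individuals, `x` would hold the 18 cloacal Shannon values and `y` the 18 large-intestine values; a strong positive rho would say that individuals with more diverse cloacal communities also tend to have more diverse large-intestine communities, which is a weaker claim than compositional similarity.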

I'm not sure whether you can (or would want to) do this at a feature-based level, or how you'd structure that test, unfortunately.

My best bet for you here would probably be to pull your data out and work in a framework that gives you a bit more control, like R or Python.

Best,
Justine

Jia · March 3, 2019, 7:09am

Hi, Justine

Thanks for the replies. They are very helpful. Yes, I am still trying to figure out the right statistics for the question I want to answer. I originally intended to compare beta diversity, but preparing the input files needs more work, so I used the alpha diversity output as a quick test, because I was not sure whether pairwise-differences would actually answer my question. I will give R packages such as phyloseq or vegan a try.

Thanks very much,

Jia

Nicholas_Bokulich · March 3, 2019, 2:01pm

Two options:

1. Use pairwise-distances on a PCoA coordinates artifact. This will test whether sample type significantly impacts sample ordination, aka whether composition is affected (depending on which beta diversity metric you use).
2. Just run a PERMANOVA/adonis test to see if your distance matrix partitions based on sample type (with adonis you could also do something like SampleTypes+Individual_ID).
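To make option 2 concrete, here is a small sketch of the one-factor PERMANOVA pseudo-F statistic computed from a square distance matrix, with a permutation p-value. The optional `strata` argument (a name chosen for this sketch) restricts label shuffling to samples from the same individual, which respects the paired design. It is only meant to show what the test computes; for real analyses, a tested implementation such as vegan's adonis in R is preferable.

```python
import math
import random


def pseudo_f(dist, labels):
    """One-factor PERMANOVA pseudo-F from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, strata=None, n_perm=999, seed=0):
    """Pseudo-F and permutation p-value. With `strata`, labels are shuffled
    only among samples that share a stratum (e.g. the same individual)."""
    blocks = {}
    for i in range(len(labels)):
        blocks.setdefault(strata[i] if strata is not None else 0, []).append(i)
    observed = pseudo_f(dist, labels)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = list(labels)
        for idx in blocks.values():
            block_labels = [labels[i] for i in idx]
            rng.shuffle(block_labels)  # shuffle within this stratum only
            for i, lab in zip(idx, block_labels):
                perm[i] = lab
        f = pseudo_f(dist, perm)
        if f > observed or math.isclose(f, observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For the 36 samples here, `labels` would be the SampleTypes values and `strata` the Individual_ID values, in the same order as the rows of the distance matrix. With two samples per individual, restricted shuffling can only swap the two labels within a pair, so there are 2^18 possible labelings.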

QIIME 2 does not have such a function for a reason: it is difficult to work out a valid test. Otherwise, qiime longitudinal pairwise-distances would have it implemented! QIIME 1 simply compared within- vs. between-group distances, I believe using a t-test or Mann-Whitney U test, but this comparison actually breaks the assumptions of various tests (including permutational tests), mainly the assumption that samples are independent, which here they are NOT. So yes, you could do something externally in R or Python, but you should consult a statistician first to find a method that is actually appropriate for testing this hypothesis with non-independent distance data.
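One possible way to ask whether each cloacal sample resembles its own large-intestine sample more than other individuals' samples is sketched below as an assumption, not a QIIME feature or a vetted method. It keeps the cross-site block of the distance matrix, uses the mean distance between true pairs as the statistic, and builds the null distribution by shuffling which large-intestine sample each cloacal sample is paired with. Because whole samples are re-paired, individual distances are not treated as independent observations, but the caveat above still applies: check the approach with a statistician. The function names are hypothetical.

```python
import math
import random


def mean_matched_distance(cross, pairing):
    """Mean distance from cloacal sample i to large-intestine sample pairing[i]."""
    return sum(cross[i][pairing[i]] for i in range(len(pairing))) / len(pairing)


def pairing_permutation_test(cross, n_perm=9999, seed=0):
    """cross[i][j]: distance between individual i's cloacal sample and
    individual j's large-intestine sample (same individual order on both axes).
    One-sided test: are true pairs closer than random re-pairings?"""
    n = len(cross)
    observed = mean_matched_distance(cross, list(range(n)))
    rng = random.Random(seed)
    pairing = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pairing)  # re-pair whole samples, not individual distances
        stat = mean_matched_distance(cross, pairing)
        if stat < observed or math.isclose(stat, observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With 18 individuals, `cross` is the 18 x 18 block of the beta diversity distance matrix whose rows are cloacal samples and whose columns are large-intestine samples. A small p-value would suggest that true pairs are closer than random re-pairings; it says nothing about how close they are in absolute terms.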

jwdebelius · March 4, 2019, 9:24am

Distance tests break the assumption of independence in general; it's one of the reasons we use permutation testing. But this is a case where a pairwise t-test with set reference groups that don't overlap would make sense. I don't think it's universally the answer, but in this case there's a clear motivation and hypothesis. The comparison is within individuals vs. within sites; the actual distances used don't overlap, so you're not reusing the same sample in multiple places. Therefore, the independence assumption isn't broken here any more than in PERMANOVA, adonis, or Mantel. And, given the group and sample sizes, it would be a better direct test of the hypothesis, without the drawbacks that come with analyzing categorical data with small group sizes.

Nicholas_Bokulich · March 4, 2019, 1:51pm

I agree, this is fine for comparisons of within-group distances (and so is performed in the pairwise-distances action). It is comparing to between-group distances that concerns me (though admittedly I have used the same test to compare those distances many times in the past as well).

jwdebelius · March 5, 2019, 9:46am

@Nicholas_Bokulich,

I guess it's all about perspective! And I think most of us have done slightly questionable things in the past. I will admit to a dark history of Kruskal-Wallis testing.
