At the beginning of my QIIME 2 analysis, I split my data into two groups and then ran the analysis (DADA2, rarefaction, ...) on each group separately.
Now I'd like to compare the alpha and beta diversity of these two groups. But it seems that I can't just merge the diversity data into one table and then run the analysis, because I used different settings for DADA2, rarefaction, and so on in each group.
So how can I compare the diversity of these two groups?
First of all, I think it is probably better to run the tools (DADA2, diversity) on all the samples inside the same QZA, not individually for each group. Why? Because, for example, diversity metrics really depend on the depth you rarefy to. If you rarefy each group to a different depth, I'm afraid those metrics are not comparable. So I would run all QIIME 2 commands on all samples from the beginning.
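As a rough sketch of what I mean, a single core-metrics run on the combined table rarefies every sample to the same depth. The file names, the output directory, and the depth of 1000 are placeholders I made up, not values from this thread; choose the depth from your own feature table. The guard only skips the command when QIIME 2 or the input table is not available.

```shell
# Placeholder inputs (assumptions): replace with your own artifacts.
TABLE=table.qza
METADATA=metadata.tsv
DEPTH=1000   # illustrative value only, not a recommendation
OUTDIR=core-metrics-results

if command -v qiime >/dev/null 2>&1 && [ -f "$TABLE" ]; then
  # One run on all samples, so every sample is rarefied to the same depth.
  qiime diversity core-metrics \
    --i-table "$TABLE" \
    --p-sampling-depth "$DEPTH" \
    --m-metadata-file "$METADATA" \
    --output-dir "$OUTDIR"
else
  echo "Skipping: qiime or $TABLE not available"
fi
```

The output directory should then hold alpha diversity vectors and distance matrices computed at that one depth, which is what makes the groups comparable.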
Now, your question:
As you want to compare diversity, you first need to compute diversity metrics with the diversity plugin (using core-metrics, core-metrics-phylogenetic, alpha, beta, etc.). Then, here is how I would compare alpha and beta diversity measures using the diversity plugin:
Alpha diversity: Let's suppose you want to compare e.g. Shannon indices. You can use alpha-group-significance. This applies a Kruskal-Wallis test to compare groups of alpha diversity values. It takes the alpha diversity vector and your sample metadata as inputs and produces a visualization.
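A minimal sketch of the alpha-group-significance call, assuming a Shannon vector named shannon_vector.qza (for example, taken from a core-metrics output directory). All three file names are placeholders of mine. The guard only skips the command when QIIME 2 or the input vector is not available.

```shell
# Placeholder file names (assumptions): adjust to your own artifacts.
ALPHA=shannon_vector.qza
METADATA=metadata.tsv
OUT=shannon_significance.qzv

if command -v qiime >/dev/null 2>&1 && [ -f "$ALPHA" ]; then
  # Kruskal-Wallis tests of the alpha diversity values across metadata groups.
  qiime diversity alpha-group-significance \
    --i-alpha-diversity "$ALPHA" \
    --m-metadata-file "$METADATA" \
    --o-visualization "$OUT"
else
  echo "Skipping: qiime or $ALPHA not available"
fi
```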
Beta diversity: When you compute beta diversity metrics, you don't obtain a single value per sample. Instead, you obtain a matrix of distances between each sample and every other sample. For this kind of data we can use permutation-based statistical tests. One example is PERMANOVA. QIIME 2 lets you run those tests with beta-group-significance. In this case, you need to specify the metadata column you want to test. Imagine you computed a beta metric (e.g. Bray-Curtis), and now you want to test whether beta diversity differs among your samples according to the metadata column "bodysite". The command would be something like:
# --p-method can also be anosim or permdisp
qiime diversity beta-group-significance \
--i-distance-matrix bray_curtis_distance_matrix.qza \
--m-metadata-file metadata.tsv \
--m-metadata-column bodysite \
--p-method permanova \
--p-pairwise \
--o-visualization bray_curtis_significance.qzv
Please note this is just one way to do it, not the only one, let alone the best.
Cheers,
Sergio
--
Disclaimer: I'm only another forum user, just like you. Please don't take my answer as a ground truth. A Forum Moderator would probably provide you with a more accurate answer.
I have run DADA2 on all the samples inside the same QZA, but the frequency of each feature seems really different between L1 and L2, so I tried separating them into two groups and then running DADA2 on each.
But now I want to compare the diversity of L1 and L2, so I am confused about how to do it.
I don't see a difference in feature frequencies as a reason for running DADA2 separately for each group. Different feature frequencies among samples are what you would expect in microbiome analyses. What are L1 and L2? I mean, which condition do they represent?
You can always manually merge the data from the QZAs, but I still think performing all the analyses on all samples at once is the best way here.
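For reference, merging could be sketched with the feature-table plugin as below. The file names are hypothetical, and I am assuming the two groups contain different samples. Note that, as far as I know, ASVs denoised with different truncation lengths will generally not be recognized as the same feature after merging, which is one more reason to prefer a single DADA2 run.

```shell
# Hypothetical per-group artifacts (assumptions): adjust to your own files.
L1_TABLE=l1-table.qza
L2_TABLE=l2-table.qza
L1_SEQS=l1-rep-seqs.qza
L2_SEQS=l2-rep-seqs.qza

if command -v qiime >/dev/null 2>&1 && [ -f "$L1_TABLE" ] && [ -f "$L2_TABLE" ]; then
  # Merge the two feature tables into one.
  qiime feature-table merge \
    --i-tables "$L1_TABLE" \
    --i-tables "$L2_TABLE" \
    --o-merged-table merged-table.qza

  # Merge the matching representative sequences.
  qiime feature-table merge-seqs \
    --i-data "$L1_SEQS" \
    --i-data "$L2_SEQS" \
    --o-merged-data merged-rep-seqs.qza
else
  echo "Skipping: qiime or the per-group tables are not available"
fi
```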
L1 and L2 are the values of one of my sample conditions.
From the feature table, I think the feature frequency is higher in L2.
Now I want to change my analysis approach.
First of all, I would like to analyze all the samples at once and see whether diversity differs between the sample groups with different depths (L1, L2).
Then I'd like to separate the samples into L1 and L2, like I did before.
With this method, I can use the same rarefaction depth within each analysis. What do you think? The analysis parameters will differ, though, such as the truncation length in DADA2 (e.g. in the combined analysis (step 1) the length is set to 225, while in the separate analyses it is set to 240 for L1 and 250 for L2).
I think maybe that's not a problem, because it is like analyzing the data from different angles, and within each analysis the samples have the same rarefaction depth, so I can compare the diversity.
Do you understand what I mean? What do you think about it?
I know, but there should be a biological reason for wanting to analyze samples separately based on that condition. That's why I ask.
I understand the reasons, and I understand that as long as you rarefy to the same depth you are good to go. But still, I don't think running DADA2 separately with different truncation parameters, for example, would help here. I don't see what you are trying to correct by doing that. I agree with you, it's not a problem to do it that way. But I don't see why we would prefer to do it that way.
I think we might be going a little bit off-topic (the original post was about diversity). Perhaps a senior user could lend a hand, either in deciding whether to split the post or with the question itself?
At the beginning, I wanted to separate them because the frequencies of the L1 samples and the L2 samples are different. If I rarefy by setting the sampling depth to the minimum frequency, I would lose a lot of data. I found that the frequency of the L2 samples is higher, and since the data is about archaea, I think that makes sense, so I wanted to separate the samples according to their depth.
I will try to seek help from senior users. Thank you so much!
This is a great discussion. You are on the right track!
If and when to split your data are common questions! I'm glad you are thinking about this.
For upstream analysis, meaning raw fastq files into ASV tables, DADA2 recommends processing one batch of samples for each amplicon region on each Illumina run. So if all your samples are on the same run and are all 16S V4, they should be processed together with DADA2. You can split them up later.
Two Illumina runs? Run DADA2 separately for each run.
Two amplicon targets (16S and 18S)? Run DADA2 separately for each region.
If you do need to split your samples, it's probably best to do so during downstream analysis, meaning statistical testing and graphing.
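One way to split later is to filter the combined feature table by a metadata column. This is only a sketch: the column name "condition" and the file names are assumptions, so substitute whatever your metadata actually uses for L1 and L2. The guard only skips the commands when QIIME 2 or the input table is not available.

```shell
# Assumed names: a metadata column called "condition" with values L1 and L2.
TABLE=table.qza
METADATA=metadata.tsv
COLUMN=condition

if command -v qiime >/dev/null 2>&1 && [ -f "$TABLE" ]; then
  for GROUP in L1 L2; do
    # Keep only the samples whose metadata value matches this group.
    qiime feature-table filter-samples \
      --i-table "$TABLE" \
      --m-metadata-file "$METADATA" \
      --p-where "[$COLUMN]='$GROUP'" \
      --o-filtered-table "${GROUP}-table.qza"
  done
else
  echo "Skipping: qiime or $TABLE not available"
fi
```

Because both subsets come from the same DADA2 run, their features stay directly comparable.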
If there is a biological reason for L1 and L2 to have different sampling depths, you can describe that in the paper and it should be okay.
If you want to compare L1 and L2 samples, normalization can help address differences in sampling depth. Normalization is a contentious topic, which means there are many options and you get to decide what's right for you!