alpha and beta diversity comparison in subgroups

ranxx005 · February 4, 2020, 6:53pm

My metadata has multiple categories, such as moisture level (dry, wet) and location (MN, IL). The statistics part of the tutorial only shows alpha and beta diversity comparisons over one category at a time: for example, all dry (MN + IL) vs. all wet (MN + IL), or all MN (dry + wet) vs. all IL (dry + wet). What if I would like to compare dry-MN vs. dry-IL?

Option 1) Do I need to filter rep_seqs down to the samples I want to compare, generate a new tree for the phylogenetic diversity analyses, and rerun the core metrics?
Option 2) Could I extract the relevant results from the existing core metrics output (computed on all samples) using "qiime diversity filter-distance-matrix"?
Option 3) Could I create another metadata file that combines moisture level and location into one category (dry-MN, dry-IL, wet-MN, wet-IL) and then do pairwise comparisons?

Thank you!

Mehrbod_Estaki · February 4, 2020, 11:18pm

Hi @ranxx005,
There are many ways to accomplish what you are trying to do, several of which you have already guessed. You can create a new feature table that contains only the samples you want to compare. You can filter your distance matrix to retain only the samples you are interested in (this is only useful for the beta diversity tests). You can generate a new metadata file that contains only the samples you want. All of those options would work. You wouldn't need to create a new rep-seqs file or phylogenetic tree, as the extra sequences and branches would simply be ignored. One thing I've been doing in these situations is simply adding a new column to my existing metadata file, filling it in only for the samples I am interested in comparing and leaving the rest blank. In most tests this works well enough and the blank samples are ignored appropriately. I say 'most' because there are some tests (I can't remember exactly which) that can't handle blanks properly.
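As a rough illustration of the "extra column with blanks" approach, here is a minimal Python sketch that appends a subgroup column to a tab-separated metadata table. The sample IDs, the column names (`moisture`, `location`, `dry-location`), and the helper functions are all made up for this example; it only shows the shape of the idea, not a QIIME 2 API.

```python
import csv
import io

# Hypothetical sample metadata; IDs and column names are invented.
METADATA_TSV = (
    "sample-id\tmoisture\tlocation\n"
    "S1\tdry\tMN\n"
    "S2\tdry\tIL\n"
    "S3\twet\tMN\n"
    "S4\twet\tIL\n"
)


def add_subgroup_column(tsv_text, new_column, label_for):
    """Return TSV text with `new_column` appended to every row.

    `label_for` takes a row dict and returns the label for that sample,
    or "" to leave the sample blank so it is left out of the comparison.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames) + [new_column]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[new_column] = label_for(row)
        writer.writerow(row)
    return out.getvalue()


def dry_by_location(row):
    # Label only the dry samples, by location; wet samples stay blank.
    return f"dry-{row['location']}" if row["moisture"] == "dry" else ""


updated = add_subgroup_column(METADATA_TSV, "dry-location", dry_by_location)
print(updated)
```

Changing `dry_by_location` to return `f"{row['moisture']}-{row['location']}"` for every row would instead give the single combined category (dry-MN, dry-IL, wet-MN, wet-IL) described in option 3 of the question.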

ranxx005 · February 5, 2020, 4:36pm

Thank you for the answer. UniFrac distances and Faith's PD both incorporate phylogenetic distances between the observed organisms in their computation. If I compare these metrics among subgroups, do I really not need to generate a new tree? Is a tree built from only the samples in the subgroups I want to compare different from a tree built from all samples? Would the resulting distances differ?

Mehrbod_Estaki · February 5, 2020, 10:42pm

Hi @ranxx005,

TL;DR: you don't need to rebuild your tree. Extra branches are fine to have.

Let's say you have a feature table of 10 samples and a rep-seqs file of 100 features (ASVs). You use the rep-seqs file to create a phylogenetic tree that incorporates those 100 ASVs.
Now you find out that you need to remove 2 of those samples because they were from the wrong experiment, and about 20 features in your table were unique to those 2 samples.
Option 1: You remove those 2 samples from your table, then use that table to filter your rep-seqs file. Now your rep-seqs file has 80 ASVs. You build a new tree that only has 80 branches in it.

Option 2: You remove those 2 samples from your table, but don't change your rep-seqs file and tree. Now your tree still has 20 branches that are not represented in your table.

When you run downstream analyses in QIIME 2 that require a phylogeny, like UniFrac or Faith's PD, branches are looked up only as needed. So it makes no difference if there are some extra branches in your tree, since they will never be used. Note that this only works when you are removing samples; if you were to add samples to your analysis, you would indeed want to rebuild your tree, since the new samples may contain features that are missing from it.
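To make the "extra branches are never used" point concrete, here is a small, self-contained Python sketch. It is a toy, not how QIIME 2 computes anything internally: the tree, node names, and branch lengths are invented, and Faith's PD is taken in its rooted form (the total length of all branches on the paths from a sample's observed tips to the root).

```python
# Toy rooted tree: each node maps to (parent, length of the branch to the parent).
# Node names and branch lengths are invented for illustration.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "C": ("n2", 1.5),
    "D": ("n2", 0.5),  # D occurs only in the samples that were removed
    "n1": ("root", 1.0),
    "n2": ("root", 2.0),
}


def faith_pd(tree, observed_tips):
    """Sum the lengths of all branches on paths from observed tips to the root."""
    used = set()
    for tip in observed_tips:
        node = tip
        while node in tree:      # the root has no parent entry, so the walk stops there
            used.add(node)       # a branch is identified by its child node
            node = tree[node][0]
    return sum(tree[node][1] for node in used)


def prune(tree, keep_tips):
    """Return a copy of the tree with only the branches leading to keep_tips."""
    kept = {}
    for tip in keep_tips:
        node = tip
        while node in tree:
            kept[node] = tree[node]
            node = tree[node][0]
    return kept


sample = {"A", "C"}  # features observed in one of the retained samples
full_tree_pd = faith_pd(TREE, sample)
pruned_tree_pd = faith_pd(prune(TREE, {"A", "B", "C"}), sample)
print(full_tree_pd, pruned_tree_pd)
```

Both values come out the same, because the branch leading to `D` is never visited when only `A` and `C` are observed. The same reasoning applies to UniFrac, which also only considers branches that lead to features present in the samples being compared.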
Hope this makes sense.
