Mistake in metadata file and re-running core metrics analysis.

jwdebelius · May 30, 2019, 2:19pm

I'm just going to preface this with "maybe you want to get a cup of coffee," because it seems like a long answer as I write it. I'm also going to apologise for my spelling mistakes in advance, because I know I won't see all of them until the post gets locked in like a month.

So, first, if I've made a typo related to treatment or timepoint, I apologize. This depends on your hypothesis, so I'll leave that judgement to you.

I also think it might help if you understood a little bit more about the core metrics analysis and what's happening under the hood. Here's what it's doing:

Performs rarefaction (qiime feature-table rarefy).
This outputs core-metrics-results/rarefied_table.qza
For each alpha diversity metric, calculates alpha diversity using the rarefied table.
- Applying qiime diversity alpha outputs
  - core-metrics-results/shannon_vector.qza
  - core-metrics-results/observed_otus_vector.qza
  - core-metrics-results/evenness_vector.qza
- And then you use qiime diversity alpha-phylogenetic to get
  - core-metrics-results/faith_pd_vector.qza
For each metric in beta diversity
1. calculate distance (beta diversity) using the rarefied table
  - qiime diversity beta gives you
    - core-metrics-results/bray_curtis_distance_matrix.qza
    - core-metrics-results/jaccard_distance_matrix.qza
  - qiime diversity beta-phylogenetic
    - core-metrics-results/unweighted_unifrac_distance_matrix.qza
    - core-metrics-results/weighted_unifrac_distance_matrix.qza
2. Perform a PCoA (qiime diversity pcoa), which, via the magic of ordination, compresses the data into a semi-human-viewable 3D space (the math under the hood is still kinda magic to me)
  - core-metrics-results/bray_curtis_pcoa_results.qza
  - core-metrics-results/jaccard_pcoa_results.qza
  - core-metrics-results/weighted_unifrac_pcoa_results.qza
  - core-metrics-results/unweighted_unifrac_pcoa_results.qza
3. Use Emperor (qiime emperor plot) to generate a PCoA visualization that you can view.
  - core-metrics-results/bray_curtis_emperor.qzv
  - core-metrics-results/jaccard_emperor.qzv
  - core-metrics-results/weighted_unifrac_emperor.qzv
  - core-metrics-results/unweighted_unifrac_emperor.qzv
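To make the first two steps a bit more concrete, here's a small pure-Python sketch of what rarefying a sample and computing two alpha metrics (observed features and Shannon) boil down to. It's an illustration under simplifying assumptions: the table is a plain dict of sample ID -> {feature ID: count}, the function names are made up, and this is not how QIIME 2 implements it (it uses its own table format and libraries such as scikit-bio). Faith's PD needs a phylogenetic tree, so it's left out.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Randomly subsample one sample's counts to exactly `depth` reads,
    without replacement. Returns None if the sample has fewer reads than
    `depth` (those samples get dropped)."""
    if sum(counts.values()) < depth:
        return None
    # Expand the counts into one entry per read (fine for a toy example,
    # far too memory-hungry for real data).
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))


def observed_features(counts):
    """Number of features with a non-zero count."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts, base=2):
    """Shannon entropy of the feature proportions (log base 2 by default)."""
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(n / total, base) for n in counts.values() if n > 0
    )


# Hypothetical toy table: sample ID -> {feature ID: count}.
table = {
    "sample-A": {"ASV1": 50, "ASV2": 30, "ASV3": 20},
    "sample-B": {"ASV1": 5, "ASV4": 5},
    "sample-C": {"ASV2": 3},
}

depth = 10
rarefied = {}
for sample_id, counts in table.items():
    subsampled = rarefy(counts, depth)
    if subsampled is not None:  # sample-C (only 3 reads) is dropped here
        rarefied[sample_id] = subsampled

for sample_id, counts in rarefied.items():
    print(sample_id, observed_features(counts), round(shannon(counts), 3))
```

Because every alpha metric is calculated from the same rarefied table, you can recompute any one of them later without redoing the rarefaction.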

What that means is that you can stop at any point in the process and work with that step's output on its own. You don't need to re-run core metrics from scratch each time.

If you check the help documentation for each command, you might find that the only command in the core-metrics workflow that requires metadata is qiime emperor plot. So, if you've re-done your metadata, you can just run qiime emperor plot on each of your four PCoA results and get new PCoA visualizations. You don't have to re-run core metrics each time, as long as you're working with the same PCoA results.
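If you'd rather not type that command four times, here's a hedged Python sketch that builds (and optionally runs) one qiime emperor plot call per PCoA result. It assumes the default core-metrics-results/ file names; the metadata path sample-metadata.tsv and the output folder new-emperor-plots are placeholders. By default it only prints the commands so you can check them before anything runs.

```python
import os
import shlex
import subprocess

# The four PCoA results core-metrics writes (default name prefixes).
METRICS = ["bray_curtis", "jaccard", "unweighted_unifrac", "weighted_unifrac"]


def emperor_commands(results_dir, metadata_file, out_dir):
    """Build one `qiime emperor plot` command (as an argument list) per metric."""
    return [
        [
            "qiime", "emperor", "plot",
            "--i-pcoa", os.path.join(results_dir, f"{metric}_pcoa_results.qza"),
            "--m-metadata-file", metadata_file,
            "--o-visualization", os.path.join(out_dir, f"{metric}_emperor.qzv"),
        ]
        for metric in METRICS
    ]


def run_all(commands, out_dir, dry_run=True):
    """Print each command; only execute them when dry_run is False."""
    if not dry_run:
        os.makedirs(out_dir, exist_ok=True)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = emperor_commands(
        "core-metrics-results", "sample-metadata.tsv", "new-emperor-plots"
    )
    run_all(cmds, "new-emperor-plots", dry_run=True)
```

Writing to a new folder keeps your old visualizations around, so you can compare them against the ones made with the corrected metadata.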

So, beta diversity is a between-sample comparison. For each pair of samples, we can measure similarity in a number of ways, except that, for math-y reasons, we usually work in dissimilarity, which is (1 - similarity). Even more mathematically cool, most of the ways we measure dissimilarity actually satisfy the properties of a distance (the Wikipedia page is a good primer, but it's not entirely relevant here, so just trust me on this?). So, the distance matrix is just the collection of all the pairwise distances between samples.
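Here's a small pure-Python sketch of a distance matrix, assuming the same kind of toy dict-of-counts table (my own layout, not a QIIME 2 format) and hand-written Bray-Curtis and Jaccard functions. In QIIME 2 the real work is done by libraries such as scikit-bio, so treat this as an illustration only. Note how Jaccard is written explicitly as 1 - similarity.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two {feature: count} dicts
    (assumes the two samples aren't both empty)."""
    features = set(u) | set(v)
    num = sum(abs(u.get(f, 0) - v.get(f, 0)) for f in features)
    den = sum(u.get(f, 0) + v.get(f, 0) for f in features)
    return num / den


def jaccard(u, v):
    """Jaccard distance on presence/absence: 1 - (shared / total features)."""
    a = {f for f, n in u.items() if n > 0}
    b = {f for f, n in v.items() if n > 0}
    return 1 - len(a & b) / len(a | b)


def distance_matrix(table, metric):
    """All pairwise distances, stored as a symmetric dict of dicts."""
    ids = sorted(table)
    dm = {i: {i: 0.0} for i in ids}  # a sample is 0 away from itself
    for i, j in combinations(ids, 2):
        dm[i][j] = dm[j][i] = metric(table[i], table[j])
    return dm


# Hypothetical toy table: sample ID -> {feature ID: count}.
table = {
    "A": {"x": 4, "y": 0, "z": 6},
    "B": {"x": 2, "y": 2, "z": 6},
    "C": {"x": 0, "y": 9, "z": 1},
}
dm = distance_matrix(table, bray_curtis)
print(dm["A"]["B"])  # 0.2
```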

We can use this representation for a bunch of things. We can do statistical tests directly on the distances (PERMANOVA, PERMDISP, adonis, etc.). But it can also be hard to visualize distances. Heat maps and clustering can be helpful, but what's sometimes really nice is to take all that complex data and just... project it into 3D space to look for separations and gradients. That's where our PCoA comes in.

One good way I like to think about this is that the distances are the little notations in my atlas that tell me how many miles there are between cities, and the PCoA is like a map. The problem is that, unlike a map of physical locations, the projection into PCoA space doesn't have fixed reference points (like North, South, East and West), so what we see is some sort of transformation of the data, and that view can change based on what we choose to show.
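One concrete consequence of "no fixed North": the sign of each PCoA axis is arbitrary (an axis and its mirror image describe the data equally well), so two plots of the same data can look flipped relative to each other. This sketch uses made-up 3D coordinates, not real PCoA output, to show that mirroring an axis leaves every between-sample distance unchanged, which is why only the relative positions of the points mean anything.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Euclidean distance between every pair of points."""
    return {
        (a, b): math.dist(coords[a], coords[b])
        for a, b in combinations(sorted(coords), 2)
    }


def flip_axis(coords, axis):
    """Mirror every point along one axis (like swapping East and West)."""
    flipped = {}
    for sample_id, point in coords.items():
        p = list(point)
        p[axis] = -p[axis]
        flipped[sample_id] = tuple(p)
    return flipped


# Made-up 3D "PCoA" coordinates for three samples.
coords = {"A": (0.3, -0.1, 0.05), "B": (-0.2, 0.4, 0.0), "C": (0.1, 0.1, -0.3)}
mirrored = flip_axis(coords, axis=0)

before, after = pairwise_distances(coords), pairwise_distances(mirrored)
print(all(math.isclose(before[k], after[k]) for k in before))  # True
```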

With regard to filtering, cause that can also be confusing...

If you filter samples, you're not changing your features. Some features may be 0 in the new table because they weren't present in the samples you kept, but that doesn't change the counts within each remaining sample.
If I'm comparing my book collection with my siblings' and somehow, magically, none of us manage to buy more before our comparison ends, the books I have that are different from my sister's aren't affected by whether or not I'm also comparing books with my brother. And if he has a book neither of us has, it doesn't add any difference between my sister and me. The features in your table aren't changing if you remove samples, just like the books in my library don't change if only my sister and I compare books.
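A tiny sketch of the book analogy, assuming made-up counts and Bray-Curtis as the example metric: the distance between me and my sister comes out the same whether it's computed from the full table or from the table with my brother (and the genre only he owns) removed.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two {feature: count} dicts."""
    features = set(u) | set(v)
    num = sum(abs(u.get(f, 0) - v.get(f, 0)) for f in features)
    den = sum(u.get(f, 0) + v.get(f, 0) for f in features)
    return num / den


# Hypothetical "library" table: person -> {book genre: count}.
libraries = {
    "me": {"dragons": 3, "mysteries": 1, "poetry": 0},
    "sister": {"dragons": 1, "mysteries": 2, "poetry": 0},
    "brother": {"dragons": 0, "mysteries": 0, "poetry": 5},
}

# Kick the brother out, then drop genres that are now zero for everyone.
kept = {p: books for p, books in libraries.items() if p != "brother"}
present = {g for books in kept.values() for g, n in books.items() if n > 0}
kept = {
    p: {g: n for g, n in books.items() if g in present} for p, books in kept.items()
}

before = bray_curtis(libraries["me"], libraries["sister"])
after = bray_curtis(kept["me"], kept["sister"])
print(before == after)  # True
```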

I can also filter my distance matrix (my between-sample comparisons) to get a subset of distances. It's the same as only looking at a distance table for California. It doesn't change the distance between a city in California and one in Arizona; it just means we don't see Arizona.
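Conceptually, filtering a distance matrix just picks out the rows and columns you want to keep. Here's a minimal sketch using hypothetical sample IDs and made-up distances (not QIIME 2's implementation or file format): the California-to-California distance is untouched, and Arizona simply disappears from view.

```python
def filter_distance_matrix(dm, keep_ids):
    """Keep only the rows and columns for IDs in keep_ids."""
    keep = [i for i in dm if i in keep_ids]
    return {i: {j: dm[i][j] for j in keep} for i in keep}


# Hypothetical sample IDs and made-up distances.
dm = {
    "CA-1": {"CA-1": 0.0, "CA-2": 0.25, "AZ-1": 0.6},
    "CA-2": {"CA-1": 0.25, "CA-2": 0.0, "AZ-1": 0.55},
    "AZ-1": {"CA-1": 0.6, "CA-2": 0.55, "AZ-1": 0.0},
}
california = filter_distance_matrix(dm, {"CA-1", "CA-2"})
print(california)
```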

Okay, so I'm guessing you meant qiime feature-table filter-features, but I'm not totally sure, so I just pulled the help documentation (god, I love help documentation, I can't do an analysis without it) for both commands.

Usage: qiime feature-table filter-samples [OPTIONS]

  Filter samples from table based on frequency and/or metadata. Any features
  with a frequency of zero after sample filtering will also be removed. See
  the filtering tutorial on https://docs.qiime2.org for additional details.

and

Usage: qiime feature-table filter-features [OPTIONS]

  Filter features from table based on frequency and/or metadata. Any samples
  with a frequency of zero after feature filtering will also be removed. See
  the filtering tutorial on https://docs.qiime2.org for additional details.

So, qiime feature-table filter-samples removes samples. And then, if a feature is all 0s after we kick out those samples, it removes that feature. But that doesn't matter for our diversity calculations, because it was all 0s anyway, and a feature that's 0 in both samples of a pair adds nothing to the distance between them. This is my sister and me kicking our brother out of a discussion about our libraries.

And then, qiime feature-table filter-features removes features (ASVs, OTUs, KEGGs, books from my sister's library, etc.) from the table. Maybe you want to remove low-frequency features (say, ones present in less than 10% of your samples), or maybe you want to focus on only one taxonomic group (my sister just texted and said we could only compare books about dragons). However, if a sample has no features left after filtering, that sample doesn't get retained in the table.
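The two help descriptions above can be mimicked in a few lines. This is a hedged sketch of the documented behavior only, not QIIME 2's code: the dict-of-counts table, the made-up book counts, and the idea of passing a keep(...) predicate are all my own illustration (the real commands filter by frequency and/or metadata).

```python
def filter_samples(table, keep):
    """Keep samples where keep(sample_id) is True, then drop features whose
    total frequency across the remaining samples is zero."""
    kept = {s: counts for s, counts in table.items() if keep(s)}
    totals = {}
    for counts in kept.values():
        for f, n in counts.items():
            totals[f] = totals.get(f, 0) + n
    return {
        s: {f: n for f, n in counts.items() if totals[f] > 0}
        for s, counts in kept.items()
    }


def filter_features(table, keep):
    """Keep features where keep(feature_id) is True, then drop samples whose
    total frequency is zero."""
    filtered = {
        s: {f: n for f, n in counts.items() if keep(f)} for s, counts in table.items()
    }
    return {s: c for s, c in filtered.items() if sum(c.values()) > 0}


# Hypothetical table: person -> {book: count}.
table = {
    "me": {"dragon-book": 2, "mystery-book": 1, "poetry-book": 0},
    "sister": {"dragon-book": 1, "mystery-book": 0, "poetry-book": 0},
    "brother": {"dragon-book": 0, "mystery-book": 0, "poetry-book": 3},
}

no_brother = filter_samples(table, lambda s: s != "brother")  # poetry-book goes too
dragons_only = filter_features(table, lambda f: f.startswith("dragon"))  # brother goes
print(no_brother)
print(dragons_only)
```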

Try checking the usage for qiime diversity filter-distance-matrix and see if you can use that to figure out how it works.

I, erm, may have a bad history of, umm, burning through laptops by running computationally intensive things on them. Re-running everything also becomes more of a problem when you work on large datasets. It may not matter right now if you have 100-200 samples. But if you've got 1,000 or 10,000, the number of pairwise distance calculations grows roughly with the square of the number of samples (because they're pairwise comparisons), so suddenly it takes a really long time to re-run. So, reusing your intermediate outputs is a good habit to get into, I think.
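To put numbers on "grows roughly with the square": with n samples there are n(n - 1)/2 unique pairs to compute. This quick sketch just does that arithmetic; it doesn't model how long each individual comparison takes, which also depends on the metric and the number of features.

```python
def n_pairs(n_samples):
    """Number of unique sample pairs, i.e. n choose 2."""
    return n_samples * (n_samples - 1) // 2


for n in (100, 200, 1_000, 10_000):
    print(f"{n:>6} samples -> {n_pairs(n):>11,} pairwise distances")
```

Going from 1,000 to 10,000 samples takes you from 499,500 to 49,995,000 pairs, roughly a 100-fold increase for 10 times the samples.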

Hopefully this helps some, but please, let me know if you've got more questions!

Best,
Justine