Mistake in metadata file and re-running core metrics analysis.

Hi @xchromosome,

It sounds like a lot, but let's work through it!

Let me start by giving some general feedback that hopefully will set your mind (and computation) at ease. Taxonomic assignments, trees, and tables don't need to be re-built after filtering features. You can pass a tree that's a superset of the features in your table, and you should be able to have taxonomic labels for sequences that aren't in your table. (And if I'm wrong and this is screwy, you can just filter rather than having to rebuild.)
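If it ever turns out a tool does complain about extra tips, a tree can be trimmed to match a table instead of being rebuilt. A minimal sketch (the artifact names here are placeholders for your own files):

```shell
# Trim a rooted tree down to only the features present in a table.
# rooted-tree.qza and table.qza are placeholder names.
qiime phylogeny filter-tree \
  --i-tree rooted-tree.qza \
  --i-table table.qza \
  --o-filtered-tree filtered-tree.qza
```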

Ouch! This happens. It's always hard, but it's good to catch it now, if you can. So, yay for finding it. However, it doesn't invalidate your analysis.

Your tables, tree, and taxonomy should be metadata agnostic. You can (although I personally discourage it) generate feature tables, trees, taxonomic assignments, and even diversity results without ever knowing anything about a sample except its name and barcode. Of course, once you get to interpretation, you're kind of SOL without metadata, but you can get there.

Good that you re-ran your diversity analyses; these definitely need to be re-done. You do not need to re-run your feature classification, though. You already filtered out the taxa you don't want; at worst, you'll need to filter your feature data. If you'd filtered your sequences and then gone in and done de novo OTU clustering, then you would need to re-do your classification, but since you filtered the table, there's no need. (You also don't need to re-build your tree; the algorithms will just prune it for you.)
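For reference, table-level taxonomy filtering (which doesn't require re-classifying afterward) looks roughly like this; the file names and excluded taxa are placeholders for whatever you actually filtered:

```shell
# Drop unwanted taxa from the table; the taxonomy artifact stays valid.
qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-exclude mitochondria,chloroplast \
  --o-filtered-table filtered-table.qza

# Optionally, bring the representative sequences in line with the table.
qiime feature-table filter-seqs \
  --i-data rep-seqs.qza \
  --i-table filtered-table.qza \
  --o-filtered-data filtered-seqs.qza
```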

So, it looks like this step is good.

This is a place where I think breaking out of core diversity may help you. I would try both adding the extra column and separating by treatment. You don't have to re-calculate the distance matrix if you're not changing the underlying feature table (features included, rarefaction depth); you can just use qiime diversity filter-distance-matrix. Because calculating distance matrices takes a long time and tends to be computationally intense, in my own analyses I work really hard to only calculate a distance matrix once and then just filter it for whatever I need.
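A minimal sketch of subsetting an existing distance matrix by metadata instead of recomputing it; the column name (`treatment`), its value, and the file names are placeholders for your own study:

```shell
# Keep only the samples matching a metadata query; no distances are recomputed.
qiime diversity filter-distance-matrix \
  --i-distance-matrix unweighted-unifrac.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where "[treatment]='control'" \
  --o-filtered-distance-matrix control-distance.qza
```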

Once you've got your filtered distance matrix, you will need to calculate a new PCoA and run new statistics (PERMANOVA, etc.). PCoA is a projection based on the distances in your dataset, and adding or removing a point can shift it, so it needs to be re-done every time. Luckily, it's not terribly computationally intense and it's pretty quick. (Check out qiime diversity pcoa.) And if you discover another issue with your metadata, you can actually just update the Emperor plot for the same PCoA using the new metadata.
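Roughly, the downstream re-runs look like this (file and column names are placeholders); note that if only the metadata changed and the sample set didn't, just the Emperor step needs re-running:

```shell
# Re-project the filtered distances.
qiime diversity pcoa \
  --i-distance-matrix control-distance.qza \
  --o-pcoa control-pcoa.qza

# Re-run the group statistics (PERMANOVA by default).
qiime diversity beta-group-significance \
  --i-distance-matrix control-distance.qza \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-column treatment \
  --o-visualization permanova.qzv

# Re-draw the ordination with the corrected metadata.
qiime emperor plot \
  --i-pcoa control-pcoa.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization control-emperor.qzv
```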

I would also suggest looking at q2-longitudinal, because if you've got paired samples, you should use them! Paired samples may help decrease some of the noise and give you all sorts of shiny statistical benefits (like getting around some of the obnoxious properties of distance matrices, for instance).
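As one example from q2-longitudinal, a paired within-subject distance comparison looks something like this; the column names (`subject-id`, `timepoint`) and states (`pre`, `post`) are placeholders for your own design:

```shell
# Compare within-subject distances between two timepoints across groups.
qiime longitudinal pairwise-distances \
  --i-distance-matrix unweighted-unifrac.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-group-column treatment \
  --p-state-column timepoint \
  --p-state-1 pre \
  --p-state-2 post \
  --p-individual-id-column subject-id \
  --o-visualization pairwise-distances.qzv
```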

So, as a wrap up:

  1. Mistakes happen, you found yours and fixed them, yay!
  2. You don't need to re-do taxonomic classification or build a new tree unless you're filtering your sequences before building your feature table.
  3. You don't need to calculate a new distance matrix unless you're changing your feature table (like filtering features) or rarefaction depth.
  4. You do need to do a new PCoA if you change your sample set. You also need to do new statistical tests.
  5. You pick up power with paired samples, so use them!

Best,
Justine
