To remove a sample from alpha/beta diversity files, remove from matrix or from metadata and count table?

Hi,
When running the alpha and beta diversity analyses, I realised one of my samples should not be there (it belongs to a group that I do not want to consider in my study, and the statistical tests are treating it as a group to compare against).

To remove it and exclude it from my diversity metrics:
A) Should I remove it from the distance matrices (i.e., unweighted/Jaccard/Bray-Curtis…?) using filter-distance-matrix?

B) Should I remove it from the initial metadata and OTU count tables, and recompute the distance matrices?

I am unsure whether removing the sample from the distance matrix leaves the matrix exactly as if I had computed it without that sample.

Or do the distances need to be recalculated without that sample?

Thanks

Hi!
I think the best approach would be to remove this sample from your feature table and rerun the diversity metrics without it, since beta diversity metrics (not sure about alpha) may be affected by this sample.


Thanks, it makes sense to redo the whole analysis and the distance matrices.

Hi @ecg and @timanix,

The observed alpha and beta diversity is a function of the features in a sample. So, if you filter features or re-normalize the data, you must recalculate diversity. If you only want to change the samples, you can filter right before the QIIME visualization and testing steps (alpha/beta group significance, Adonis, PCoA, longitudinal analysis). In general, I suggest calculating your diversity once, because it tends to be computationally expensive and potentially slow. Then I filter and do the visualization/testing (which is quick).

Alpha diversity is independent of other samples (it is within-sample diversity), so you can filter the alpha diversity results directly. I don't think there's a specific method for this, so I would try passing your filtered metadata into an analysis; hopefully 🤞 it will just be filtered down to the intersection of samples.
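A minimal sketch of why subsetting is safe, assuming Shannon entropy (natural log) as the alpha metric and a toy, hypothetical feature table (the sample IDs and counts are made up). Each value depends only on its own sample's counts, so keeping only the samples that appear in the filtered metadata gives exactly what a recomputation on the filtered table would.

```python
import math


def shannon(counts):
    """Shannon entropy (natural log) of one sample's feature counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical feature table: sample ID -> feature counts
table = {
    "S1": [10, 5, 0, 3],
    "S2": [2, 2, 2, 2],
    "S3": [0, 8, 1, 1],  # the sample we want to drop
}

# Alpha diversity computed once, sample by sample, on the full table
alpha = {sid: shannon(counts) for sid, counts in table.items()}

# Hypothetical filtered metadata: S3 is no longer listed
metadata_ids = ["S1", "S2"]

# Keep only samples present in both the results and the metadata (the intersection)
alpha_filtered = {sid: alpha[sid] for sid in metadata_ids if sid in alpha}

# Recomputing on the filtered table gives identical values
recomputed = {sid: shannon(table[sid]) for sid in metadata_ids}
print(alpha_filtered == recomputed)  # True
```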

Beta diversity is a little more complicated.
Beta diversity depends on pairs of samples, but if I calculate the distance between A and B, that won't affect the distance between A and C (although A, B, and C are all constrained). So you can just filter the sample out of the beta diversity distance matrix (try filter-distance-matrix).
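A toy check of the pairwise argument, assuming Bray-Curtis dissimilarity computed on a fixed table (not re-rarefied or re-normalized after dropping the sample); the samples and counts are hypothetical. Dropping sample C from the precomputed matrix gives the same entries as recomputing the matrix without C.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


def distance_matrix(table):
    """Full pairwise matrix as a dict keyed by (id_i, id_j)."""
    ids = list(table)
    return {(i, j): bray_curtis(table[i], table[j]) for i in ids for j in ids}


# Hypothetical feature table
table = {
    "A": [10, 5, 0, 3],
    "B": [2, 2, 2, 2],
    "C": [0, 8, 1, 1],
}

full = distance_matrix(table)

# Option A: filter sample C out of the precomputed matrix
filtered = {(i, j): d for (i, j), d in full.items() if "C" not in (i, j)}

# Option B: drop C from the table, then recompute
recomputed = distance_matrix({k: v for k, v in table.items() if k != "C"})

print(filtered == recomputed)  # True
```

This only holds because each entry uses just the two samples involved; if dropping a sample changes the table itself (for example, a different rarefaction depth), the distances must be recomputed.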

You will need to recalculate your PCoA, since the PCoA depends on the set of samples. You can do that with the `pcoa` method and then pass the result into Emperor's `plot` method (these are the methods that sit underneath core-metrics).
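A small pure-Python sketch of why, assuming classical (metric) PCoA and a made-up 3-sample distance matrix. Classical PCoA eigendecomposes the double-centred matrix B = -1/2 J D² J, and the centring uses row, column and grand means over every sample, so the B entries for the samples you keep change once a sample is removed; the ordination has to be rebuilt rather than filtered.

```python
def gower_centered(d):
    """Double-centre the squared distances: B = -1/2 * J D^2 J (what classical PCoA decomposes)."""
    n = len(d)
    a = [[-0.5 * d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in a]
    col_means = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[a[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


# Hypothetical distances between samples X, Y, Z
d_full = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.8],
    [0.9, 0.8, 0.0],
]
# Same matrix with Z filtered out
d_sub = [row[:2] for row in d_full[:2]]

b_full = gower_centered(d_full)
b_sub = gower_centered(d_sub)

# The X/Y block of b_full is NOT the same as b_sub, so PCoA coordinates differ
print(abs(b_full[0][0] - b_sub[0][0]) > 1e-9)  # True
```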

The one exception to this rule is if you're running DEICODE or Gemelli: you need to recalculate, since those are ordinations that depend on both the samples and the features.

Best,
Justine


This is very useful. Thanks for the clarification.


Hi, @jwdebelius!
Thank you for the wonderful clarification!
Now I have another question. I am applying PERMANOVA to a distance matrix outside of QIIME 2. Is it valid to filter samples from a matrix stored in a data frame just by removing the corresponding rows and columns?


Hi @timanix,

That's what I tend to do if I'm running batches of analyses in R with vegan: I filter the rows and columns to match my metadata order.
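A minimal sketch of that filtering step in plain Python, with hypothetical sample IDs and distances. The key point is selecting the same positions for rows and columns, in metadata order, so the result stays square and symmetric; a missing ID raises an error instead of silently misaligning the matrix and the metadata.

```python
def subset_distance_matrix(ids, matrix, keep):
    """Return the square sub-matrix for the IDs in `keep`, in `keep` order.

    ids    -- sample IDs labelling the rows/columns of `matrix`
    matrix -- list of lists (symmetric, zero diagonal)
    keep   -- sample IDs to retain, e.g. in metadata order
    """
    index = {sid: pos for pos, sid in enumerate(ids)}
    missing = [sid for sid in keep if sid not in index]
    if missing:
        raise KeyError(f"IDs not in distance matrix: {missing}")
    pos = [index[sid] for sid in keep]
    # Same positions for rows AND columns keeps the result square and symmetric
    return [[matrix[i][j] for j in pos] for i in pos]


# Hypothetical distance matrix
ids = ["S1", "S2", "S3", "S4"]
dm = [
    [0.0, 0.3, 0.6, 0.4],
    [0.3, 0.0, 0.5, 0.7],
    [0.6, 0.5, 0.0, 0.2],
    [0.4, 0.7, 0.2, 0.0],
]
metadata_order = ["S4", "S1", "S2"]  # S3 dropped; order follows the metadata
sub = subset_distance_matrix(ids, dm, metadata_order)
print(sub)  # [[0.0, 0.4, 0.7], [0.4, 0.0, 0.3], [0.7, 0.3, 0.0]]
```

With a pandas DataFrame whose index and columns are both the sample IDs, the equivalent selection is `df.loc[keep, keep]`.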

Best,
Justine


Thank you for your reply!
I just replicated it in Python, and the results are consistent with the QIIME 2 outputs. But better safe than sorry!
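A minimal pure-Python sketch of that kind of check. It assumes the distance matrix has already been filtered (rows and columns removed together) and uses a made-up 4-sample matrix with hypothetical group labels. It computes the PERMANOVA pseudo-F from the total and within-group sums of squared distances, plus a label-permutation p-value. This is an illustration, not the scikit-bio or vegan implementation; with only four samples there are very few distinct permutations, so the p-value here is illustrative only.

```python
import itertools
import random


def pseudo_f(dm, groups):
    """PERMANOVA pseudo-F from a square distance matrix and per-sample group labels."""
    n = len(groups)
    labels = set(groups)
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by n
    ss_total = sum(dm[i][j] ** 2 for i, j in itertools.combinations(range(n), 2)) / n
    # Within-group sum of squares: same idea, per group, divided by group size
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        pairs = itertools.combinations(members, 2)
        ss_within += sum(dm[i][j] ** 2 for i, j in pairs) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dm, groups, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) by shuffling the group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dm, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dm, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical, already-filtered distance matrix and group labels
dm = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
groups = ["ctrl", "ctrl", "trt", "trt"]
f_stat, p_value = permanova(dm, groups)
print(round(f_stat, 6))  # 50.6
```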
