core-metrics-phylogenetic multiple times on different subsets of data but with overlapping samples

Dot · February 18, 2023, 4:01am

Hello!

I have a question about running core-metrics-phylogenetic multiple times on different subsets of my full dataset. In my full data I have two sample types, cultures and inoculum samples. I want to first create one PCoA plot that includes all of those samples together, so that I can visualize and test how different the two sample types are from one another. However, my primary interest is then to create another PCoA plot with just the culture samples, because I am interested in doing a further in-depth analysis of just those samples. So I want to remove the inoculum samples from the distance matrix.

Is it appropriate to run core-metrics-phylogenetic twice, just switching out the table file and metadata file (keeping the sampling depth the same)? E.g. can I run both of the following commands?

qiime diversity core-metrics-phylogenetic --i-table table-cultures-and-inoculum.qza --i-phylogeny tree.qza --m-metadata-file cultures_and_inoculum.tsv --p-sampling-depth 17286 --output-dir core-metrics-results1

and

qiime diversity core-metrics-phylogenetic --i-table table-inoculum-only.qza --i-phylogeny tree.qza --m-metadata-file inoculum_only.tsv --p-sampling-depth 17286 --output-dir core-metrics-results2

I was concerned about doing this since I know rarefaction occurs by random subsampling, and therefore the results for both alpha and beta metrics may change slightly each time. However, I also know it's not appropriate to simply remove the unwanted samples from the first distance matrix and PCoA plot, because all the distances are calculated based on those samples being there.

Any insights on how to go about this would be greatly appreciated, I just want to make sure I'm using this tool appropriately. Thanks so much!

crusher083 · February 18, 2023, 2:22pm

Hello,

I would rather merge both tables and plot a PCoA of all data points. Thanks to Emperor's interface, you can turn off visibility for a certain group and isolate the samples of interest.
You can also filter the distance matrix for downstream analysis.
Using the whole dataset is imo the better solution: there is no need to rarefy a second time, and all samples from the whole dataset are already included during dimensionality reduction.
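
As one example of a downstream analysis on a distance matrix (full or filtered), here is a minimal sketch of a PERMANOVA test between groups using qiime diversity beta-group-significance. The function name test_groups, the file names, and the sample_type metadata column are assumptions for illustration, not taken from the original dataset.

```shell
# Sketch: PERMANOVA on a distance matrix, comparing the groups defined by
# one categorical metadata column.
#   $1 distance matrix (.qza)
#   $2 sample metadata (.tsv)
#   $3 metadata column to test (hypothetical name supplied by the caller)
#   $4 output visualization (.qzv)
test_groups() (
  qiime diversity beta-group-significance \
    --i-distance-matrix "$1" \
    --m-metadata-file "$2" \
    --m-metadata-column "$3" \
    --p-method permanova \
    --o-visualization "$4"
)

# Example call (assumed column name "sample_type"):
# test_groups core-metrics-results1/unweighted_unifrac_distance_matrix.qza \
#   cultures_and_inoculum.tsv sample_type sample-type-significance.qzv
```

The same call works on a filtered matrix, provided the chosen column still has at least two groups among the remaining samples.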

Dot · February 18, 2023, 5:58pm

Thanks for your response! I was wondering about that option in Emperor, because I tried it and received this warning:
[Image: Screenshot 2023-02-18 at 12.55.07 PM]

Is it still acceptable to display the samples like that for a publication?

Thanks!

crusher083 · February 18, 2023, 6:11pm

I think it depends more on the context of the figure than on the figure itself.
If hiding the points doesn't change your data interpretation (i.e. doesn't direct readers' attention to misleading conclusions), then it is an absolutely legitimate way to do this.
But it should be done thoughtfully, indeed.

Dot · February 18, 2023, 6:41pm

Makes sense, thank you!

jwdebelius · February 20, 2023, 3:04pm

I'm sorry to be that person.

I'm a big proponent of running diversity once, and then filtering it. It's a better practice for big datasets, and it gives you lots of options for analysis.

However, in terms of hiding PCoAs, I have to disagree with @crusher083 (sorry!).

I've seen this abused in too many circumstances. There are a couple of infamous cases where people used a PCoA of multiple body sites to drive a specific conclusion by hiding the second body site! (They also didn't communicate about the samples in the PCoA, which made it worse!) And, since you can filter a distance matrix (qiime diversity filter-distance-matrix), recalculate the PCoA (qiime diversity pcoa), and do the emperor plot (qiime emperor plot) quickly with very little memory, it seems disingenuous to hide the points. You have to work on a subset anyway to do your statistical tests, so if you're going to do that, you might as well plot the subset.
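
A minimal sketch of those three steps, wrapped in a shell function so the subset can be re-plotted in one call. The function name subset_pcoa, the file names, the output directory layout, and the sample_type metadata column with the value culture are assumptions for illustration; the input is assumed to be one of the distance matrices written by the full core-metrics-phylogenetic run.

```shell
# Sketch: filter an existing distance matrix, recompute the PCoA, and plot it.
# Arguments (all supplied by the caller; the values shown below are hypothetical):
#   $1 distance matrix (.qza) from the full core-metrics-phylogenetic run
#   $2 sample metadata (.tsv)
#   $3 SQLite-style WHERE clause selecting the samples to keep
#   $4 output directory for the subset results
subset_pcoa() (
  dm="$1"; metadata="$2"; where="$3"; outdir="$4"
  mkdir -p "$outdir" || exit 1

  # Keep only the samples matching the WHERE clause.
  qiime diversity filter-distance-matrix \
    --i-distance-matrix "$dm" \
    --m-metadata-file "$metadata" \
    --p-where "$where" \
    --o-filtered-distance-matrix "$outdir/distance_matrix.qza" || exit 1

  # Recompute the ordination on the subset only.
  qiime diversity pcoa \
    --i-distance-matrix "$outdir/distance_matrix.qza" \
    --o-pcoa "$outdir/pcoa_results.qza" || exit 1

  # Build the Emperor plot for the subset.
  qiime emperor plot \
    --i-pcoa "$outdir/pcoa_results.qza" \
    --m-metadata-file "$metadata" \
    --o-visualization "$outdir/emperor.qzv"
)

# Example call (assumed column name and value):
# subset_pcoa core-metrics-results1/unweighted_unifrac_distance_matrix.qza \
#   cultures_and_inoculum.tsv "[sample_type]='culture'" cultures-only
```

Because the matrix comes from the single rarefied run, the pairwise distances among the retained samples are unchanged; only the ordination is recomputed.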

Best,
Justine

Dot · February 20, 2023, 3:42pm

Hi Justine,

Thank you for the follow-up! Actually, I think your solution here would work perfectly for what I'm trying to do, as effectively all I wanted was to recalculate the PCoA without certain samples:

filter a distance matrix (qiime diversity filter-distance-matrix), recalculate the PCoA (qiime diversity pcoa), and do the emperor plot (qiime emperor plot)

...I just didn't realize this was possible. Thanks so much.

(Apologies, I also just saw that you posted a similar response to a sort of related issue here that I somehow missed in my earlier searching on this topic!)

jwdebelius · February 20, 2023, 8:13pm

Hi @Dot,

I'm glad that solution will work for you!

There's a filter function for almost everything in qiime2. (I often check the filtering tutorial if I'm not sure if it can be filtered.) I think alpha diversity is the only major thing you can't filter easily.

I'll also recommend checking out the plugins if you think there's something you should be able to do, but can't. There are a handful of pipeline commands (core-metrics, for example) that cover a bunch of steps; you can usually find the sub-functions in the same plugin. I use the q2-diversity docs a lot to check for commands I need; there's way more going on than can be covered in a tutorial, so it's nice to have extra backup.
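
One quick way to browse a plugin from the command line is its built-in help. A small sketch; the wrapper name qiime_help is hypothetical, while the underlying --help calls are standard for the QIIME 2 CLI.

```shell
# Sketch: list a plugin's actions, or show the options of a single action.
#   $1 plugin name (e.g. diversity)
#   $2 optional action name (e.g. filter-distance-matrix)
qiime_help() {
  if [ -n "${2:-}" ]; then
    qiime "$1" "$2" --help
  else
    qiime "$1" --help
  fi
}

# Example calls:
# qiime_help diversity                         # all actions in q2-diversity
# qiime_help diversity filter-distance-matrix  # options for one action
```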

Best,
Justine

Dot · February 21, 2023, 4:27pm

Super helpful to know, I've mostly been going off of the tutorials and supplementing with other documentation, but will definitely specifically check out the plugins. Thank you!
