Good analysis methods to address my research questions: everyone is welcome to join the discussion

moonlight · January 29, 2020, 10:10pm

Hello, I would like some general feedback on my research questions. I also want to find the right analyses in QIIME to answer them.

Background: I study host-associated microbiomes. The hosts are a group of phylogenetically related animals or plants. I have built a phylogenetic tree for the hosts and am waiting for my microbiome sequencing data (16S for bacteria, plus fungi). I also have some chemistry data from the hosts' habitats.

Can anyone give me feedback on which analyses/scripts I should use? I have some ideas, but I'd like a second opinion.

1> Do hosts that are closely related phylogenetically have similar microbiomes? Do native host species have very different microbiomes from invasive species? Should I use adonis?
2> I also want to look for patterns between fungi and bacteria, for example correlations: if fungal group A is absent in host X, does bacterial group B appear?
3> If phylogeny is not important, how can I test if environmental chemistry is important?
4> To answer these kinds of questions, what taxonomic level should I use? If I use ASVs, there will be 10,000+ of them. Should I use phylum level instead, which basically sums the ASVs within each phylum to give fewer categories with larger counts?

Besides QIIME analyses, you are welcome to recommend other software (R, etc.) or papers.

colinbrislawn · January 30, 2020, 6:11pm

Hello John,

These are all very good questions. I'm not sure I have all the answers, but I can tell you how I would approach these problems to get the conversation started.

Sure. Feature table -> distance matrix -> adonis(dist ~ host_phylogeny)
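To make that pipeline concrete, here is a minimal plain-Python sketch of what adonis (PERMANOVA) computes for one categorical factor such as host group: a pseudo-F statistic, R², and a permutation p-value. It is only an illustration; for real analyses use vegan's adonis in R or `qiime diversity adonis`. The function name `permanova` and the toy data are hypothetical.

```python
import random


def permanova(dist, groups, permutations=999, seed=0):
    """One-factor PERMANOVA: returns (pseudo-F, R^2, permutation p-value).

    dist: square, symmetric distance matrix as a list of lists.
    groups: one group label per sample, in the same order as dist.
    Assumes at least two groups and non-zero within-group distances.
    """
    n = len(dist)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared pairwise distances divided by n
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n

    def ss_within(g):
        # Same calculation inside each group, divided by that group's size
        ss = 0.0
        for lab in labels:
            idx = [i for i in range(n) if g[i] == lab]
            ss += sum(dist[i][j] ** 2
                      for k, i in enumerate(idx) for j in idx[k + 1:]) / len(idx)
        return ss

    def pseudo_f(g):
        ss_w = ss_within(g)
        return ((ss_total - ss_w) / (a - 1)) / (ss_w / (n - a))

    f_obs = pseudo_f(groups)
    ss_between = ss_total - ss_within(groups)
    # Permutation test: shuffle the labels and count pseudo-F values >= observed
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(shuffled) >= f_obs:
            hits += 1
    return f_obs, ss_between / ss_total, (hits + 1) / (permutations + 1)


# Toy example: two groups of three hosts on a 1-D gradient
xs = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(p - q) for q in xs] for p in xs]
print(permanova(toy_dist, ["A"] * 3 + ["B"] * 3, permutations=199))
```

The R² here is the between-group share of the total sum of squares, which is the effect size adonis reports next to the p-value.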

You could compare the full communities using Procrustes analysis. Using ordinations of the bacterial and fungal distance matrices, Procrustes attempts to project one onto the other, but a close fit is only possible if their fundamental variation is similar.
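A minimal numpy sketch of the Procrustes fit statistic (m²) between two ordinations, assuming the bacterial and fungal PCoA coordinates are already row-aligned by host sample. It shows only the goodness of fit; a significance test would permute rows (as vegan's `protest` does), which is not shown. `procrustes_m2` is a hypothetical name; `scipy.spatial.procrustes` should report the same quantity as its disparity.

```python
import numpy as np


def procrustes_m2(x, y):
    """Symmetric Procrustes statistic m^2 between two ordinations.

    x, y: (n_samples, n_axes) arrays whose rows are the same samples in the
    same order (e.g. bacterial and fungal PCoA axes for each host).
    Returns a value in [0, 1]; 0 means the configurations match perfectly
    after translation, scaling, rotation and reflection.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Centre each configuration and scale it to unit total sum of squares
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    # The best rotation comes from the SVD of x^T y; the residual sum of
    # squares after fitting depends only on the singular values.
    singular_values = np.linalg.svd(x.T @ y, compute_uv=False)
    return 1.0 - singular_values.sum() ** 2
```

Values near 0 mean the bacterial and fungal ordinations share the same structure; values near 1 mean they are essentially unrelated.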

This one is totally adonis. adonis(dist ~ ph + TOC + Iron3 + Iron2)

For stat tests, I always use ASVs, all of the time. I only use coarser levels when I'm making graphs. A graph can't show 10k ASVs legibly... but you can make a distance matrix from all 10k!
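Nothing in a distance calculation cares whether a sample has 10 or 10,000 features, which is why the full ASV table works fine here. A plain-Python sketch of a Bray-Curtis distance matrix, assuming every sample's counts are listed in the same ASV order; the sample data are made up, and in QIIME 2 `qiime diversity beta` computes this for you.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(table):
    """table: dict mapping sample ID -> list of ASV counts (same ASV order everywhere).
    Returns the sorted sample IDs and a square list-of-lists distance matrix."""
    ids = sorted(table)
    return ids, [[bray_curtis(table[a], table[b]) for b in ids] for a in ids]


ids, dm = distance_matrix({"s1": [10, 0, 5], "s2": [0, 10, 5], "s3": [10, 0, 5]})
print(ids, dm)
```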

For correlations, I suppose you could use a coarser level like family or genus, but for all the distance matrix / ordination / adonis stuff, I would strongly recommend ASVs.
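For graphs or correlations at a coarser level, "collapsing" just means summing the counts of ASVs that share a lineage down to the chosen rank. A minimal sketch, assuming semicolon-separated lineage strings and the same 1 = kingdom, 2 = phylum, ... numbering that `qiime taxa collapse --p-level` uses; `collapse` and the example lineages are hypothetical.

```python
from collections import defaultdict


def collapse(counts, taxonomy, level):
    """Sum one sample's ASV counts by taxon at a chosen rank.

    counts: dict ASV ID -> count.
    taxonomy: dict ASV ID -> lineage string, e.g. "k__Bacteria; p__Firmicutes".
    level: number of ranks to keep (1 = kingdom, 2 = phylum, ...).
    ASVs not classified that deep keep their deepest available rank.
    """
    collapsed = defaultdict(int)
    for asv, n in counts.items():
        ranks = [r.strip() for r in taxonomy[asv].split(";")]
        collapsed[";".join(ranks[:level])] += n
    return dict(collapsed)


taxonomy = {"a": "k__Bacteria; p__Firmicutes; c__Bacilli",
            "b": "k__Bacteria; p__Firmicutes; c__Clostridia",
            "c": "k__Bacteria"}
print(collapse({"a": 5, "b": 3, "c": 2}, taxonomy, level=2))
```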

Colin

P.S. There are lots of other ways to answer these questions. Let's see what other people suggest!

moonlight · January 31, 2020, 4:11am

Hi Colin,

Thanks for the feedback. It's really useful. Just a couple of follow-up questions.

In my case, I have more than 200 microbiome samples.

1> Do you normally present a taxonomy plot in your papers? I haven't worked with this many samples before. With < 20 samples, I would make a nice stacked bar plot, but with this many samples it won't look clear or straightforward. Any suggestions? P.S. When I present taxonomy plots, I normally do it at the phylum level. You can't do it at the ASV level, right?

2> "Feature table -> distance matrix -> adonis(dist ~ host_phylogeny)": what do you mean by host phylogeny here? If I have 20 animals, do you mean I have 20 groups (treating each animal as a level of a categorical variable)? Or do you mean I should use my host phylogenetic tree? I'm not sure adonis supports a phylogenetic tree.

3> Yes, I always use the lowest level (ASV or OTU) to calculate beta-diversity distance matrices (e.g., UniFrac, Bray-Curtis). That isn't a problem. My concern is that adonis and similar analyses give you lots of results, mostly p-values, which makes a paper kind of dry. I want to make some plots, but all I can think of are ordination plots such as NMDS or PCoA. Any other ideas? Would network analysis fit this type of analysis? To be honest, I don't know much about it. From the papers I've read, people seem to use network software for many kinds of data (even metagenomic function data), and the networks seem to give you some groups. However, the lines between groups make the relationships very hard to interpret.

Any suggestions on network analysis, or on building a separate network for each host?

4> "You could summarize the change of the full community using procrustes-analysis. Using the ordination of the distance matrix from both fungi and bacteria, procrustes attempts to project them onto each other, but a close fit is only possible if their fundamental variation is similar."

This is a good suggestion. I know Procrustes analysis, and I'm actually planning to do it. Is there any other way to easily identify whether, when bacterial group A is present, fungal group B disappears or co-occurs?

Everyone is welcome to join the discussion

colinbrislawn · January 31, 2020, 5:31pm

I don't find stacked bar plots useful, but they appear to be obligatory. My most-cited paper and my lead-author paper both show them at the phylum level, just like you suggested.
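With 200+ samples, one common way to keep a phylum-level stacked bar plot readable is to show only the most abundant taxa and pool the rest into "Other". A minimal sketch of that data preparation (the plotting itself is left to whatever tool you prefer); the function name, the default of 10 taxa, and ranking taxa by mean relative abundance are all assumptions.

```python
def top_n_with_other(table, n=10):
    """Relative abundances per sample, keeping the n taxa with the highest
    mean relative abundance and pooling everything else into "Other".

    table: dict sample -> dict taxon -> count.
    """
    rel = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        rel[sample] = {taxon: c / total for taxon, c in counts.items()}
    taxa = {taxon for row in rel.values() for taxon in row}
    mean_ab = {t: sum(row.get(t, 0.0) for row in rel.values()) / len(rel) for t in taxa}
    # Sort by mean abundance (descending), then by name so ties are deterministic
    keep = set(sorted(taxa, key=lambda t: (-mean_ab[t], t))[:n])
    out = {}
    for sample, row in rel.items():
        out[sample] = {t: row.get(t, 0.0) for t in keep}
        out[sample]["Other"] = sum(v for t, v in row.items() if t not in keep)
    return out


print(top_n_with_other({"s1": {"A": 6, "B": 3, "C": 1}, "s2": {"A": 2, "B": 2, "C": 6}}, n=2))
```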

I meant the phylum of the host (which does sound like phylogeny, sorry!). So, how does microbiome composition depend on the host's phylum, treated as a categorical variable? Your 20 animals might be divided into 3 ~~phyla~~ vertebrates, which you can test with adonis.

adonis( ~ vs vs )

Great!

Really? adonis(dist ~ variable) gives you the R-squared (effect size) and p-value, so two values per variable tested. You could put all these into a graph, if you wanted.

I think PCoA plots are a great choice as they look good and are easy to understand. If you see clustering inside of a PCoA, you can perform the adonis test and report the finding in the caption of the PCoA. This is a great way to present the data visually and quantitatively.
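For reference, a PCoA is an eigendecomposition of the double-centred squared distance matrix. A minimal numpy sketch, assuming a square, symmetric distance matrix; in practice `qiime diversity pcoa` or `skbio.stats.ordination.pcoa` does this and handles details such as negative eigenvalues more carefully. `pcoa` here is a hypothetical stand-in.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical (metric) PCoA of a square distance matrix.

    Returns sample coordinates on the first n_axes axes and the proportion
    of variance each axis explains (relative to the positive eigenvalues).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower double-centring: B = -1/2 * J D^2 J, with J the centring matrix
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale eigenvectors by sqrt(eigenvalue); clip tiny negative values to 0
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0, None))
    explained = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained
```

The "explained" values are what usually go in the axis labels of a PCoA plot, and the adonis result can go in its caption.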

Basic network analysis is really similar to ordination... so I don't use it much. Once you have functional data from metagenomics, networks become much more interesting! An all-vs-all network often looks like spaghetti.

The final question:

This sounds like a good fit for all-vs-all co-occurrence analysis, which you could totally put into a network. Let's see if @jwdebelius or @Mehrbod_Estaki have suggestions for finding microbial co-occurrence or partitioning.

If you are specifically interested in microbes that avoid each other, this might untangle the spaghetti. Graphing only strong negative co-occurrence sounds interesting!
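A minimal sketch of that idea: compute a Spearman correlation for every feature pair and keep only the strongly negative ones as network edges. The threshold of -0.6 and the function names are hypothetical, and plain correlation on relative abundances can be distorted by compositionality, so treat this as an illustration rather than a replacement for tools such as SparCC.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def negative_edges(abundance, threshold=-0.6):
    """abundance: dict feature -> abundances across the same samples.
    Returns (feature_a, feature_b, rho) for every pair with rho <= threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if rho <= threshold:
            edges.append((a, b, rho))
    return edges


print(negative_edges({"bacA": [0, 1, 2, 3, 4], "fungB": [9, 7, 5, 3, 1], "fungC": [1, 2, 3, 4, 5]}))
```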

Keep up the good work!

Colin

jwdebelius · January 31, 2020, 8:52pm

Hi @colinbrislawn and @moonlight,

Thanks for tagging me in on this one! It's a hard problem, and although I have opinions (what's new?), I think Colin has done an absolutely awesome job of answering everything, and I (mostly) agree.

I actually hate this and have moved away from it! I agree, though, that it doesn't work at a larger sample size. I like (and have done) area plots for this sort of thing, but that's usually custom python code which at some point I will actually turn into a gist so I can re-use it instead of re-writing every time.

Personally, I like a stacked bar chart as an "is this reasonable for my system?" check. It's an easy visual diagnostic to see if, for example, something went wrong and my skin samples look like poo. (Or the other way around.) But in a paper? Eh. If it's a novel ecosystem (maybe 0-5 papers) where you're establishing a benchmark of what the community "should" look like, I think it's one thing. It's another if that benchmark already exists.

I'm once again going to vote for a level below phylum. But yeah, I think grouping here might be interesting. (Sorry @colinbrislawn, this is one of the points where I get pedantic: , , , , and are all the same phylum!)

Another potentially cool option might be a Procrustes analysis where you compare the hosts by some measure of evolutionary distance and look at the relationship with the microbes based on one of your favorite microbial distance metrics. And then maybe use a Mantel test to compare? (Might not work, might be fun? Who knows???)
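A minimal plain-Python sketch of a Mantel test between a host distance matrix (for example, pairwise distances derived from the host tree, which is an assumption here) and a microbial distance matrix over the same hosts in the same order. It reports Pearson's r between the two matrices' upper triangles and a one-sided permutation p-value; `vegan::mantel` in R and `skbio.stats.distance.mantel` in Python are established implementations, and `mantel` below is a hypothetical stand-in.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def mantel(d1, d2, permutations=999, seed=0):
    """Pearson Mantel r between two square distance matrices (same sample
    order) and a one-sided permutation p-value (testing r > 0)."""
    n = len(d1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [d1[i][j] for i, j in pairs]
    r_obs = pearson(x, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    perm = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute samples: rows and columns of the second matrix together
        rng.shuffle(perm)
        y = [d2[perm[i]][perm[j]] for i, j in pairs]
        if pearson(x, y) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting whole samples, rather than individual matrix cells, keeps the dependence among distances that share a sample, which is why the Mantel test permutes this way.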

edit: one more thought is that if you see a relationship, it might be cool to use the host phylogenetic tree with Gneiss to cluster the features. I would only do this if you see something in diversity, but again, potentially interesting?

I loved that so much in this paper that I did something similar in a recent analysis of mine. Plus, adonis lets you adjust, so... if you find multiple things you can build a multivariate model! (Just make sure the confounders come first!)
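Adjusting works because adonis assigns variance sequentially: each term only gets the sums of squares left over by the terms before it, which is why confounders go first. A minimal numpy sketch of sequential R², assuming numeric covariates only (categorical terms would need dummy coding first); `sequential_r2` and the toy data are hypothetical.

```python
import numpy as np


def sequential_r2(dist, terms):
    """Sequential (type I) R^2 per term, in the order given.

    dist: (n, n) distance matrix.
    terms: list of (name, numeric values per sample); categorical terms
    would need dummy coding first (not shown).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    g = -0.5 * centring @ (d ** 2) @ centring   # Gower-centred matrix
    ss_total = np.trace(g)
    x = np.ones((n, 1))                         # intercept-only model
    explained_so_far = 0.0                      # the intercept explains nothing
    results = []
    for name, values in terms:
        x = np.column_stack([x, np.asarray(values, dtype=float)])
        hat = x @ np.linalg.pinv(x)             # projection onto the model's columns
        explained = np.trace(hat @ g)
        results.append((name, (explained - explained_so_far) / ss_total))
        explained_so_far = explained
    return results


# Same data, two term orders: the R^2 credited to each term changes
position = [0, 1, 2, 10, 11, 12]
group = [0, 0, 0, 1, 1, 1]
toy = [[abs(a - b) for b in position] for a in position]
print(sequential_r2(toy, [("group", group), ("position", position)]))
print(sequential_r2(toy, [("position", position), ("group", group)]))
```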

I think this kind of depends on luck and where you're looking! I wouldn't go for a network without strong evidence that you've got a pattern. To me, that evidence would again maybe come from a Procrustes analysis comparing the bacterial and fungal ordinations. Then I might move into network analysis. I think filtering will be key here! I like a joint abundance/prevalence approach I've talked about implementing here, where a feature has to reach a certain abundance in a certain number of samples to be kept, but you could also do separate abundance and prevalence filters, although they're related. (I tend to get good results keeping features present in about 10% of my samples, but I also start to get happy around 250+ samples, so YMMV with 10%.)
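A minimal sketch of a joint abundance/prevalence filter: a feature survives only if its relative abundance reaches a threshold in at least a given fraction of samples. The 10% default follows the rule of thumb above; the 0.1% abundance default and the function name are assumptions. QIIME 2's `qiime feature-table filter-features` (`--p-min-frequency`, `--p-min-samples`) covers the separate-filter version.

```python
def prevalence_filter(table, min_abundance=0.001, min_fraction=0.10):
    """Return the features whose relative abundance is >= min_abundance
    in at least min_fraction of the samples.

    table: dict sample -> dict feature -> count.
    """
    n_samples = len(table)
    passing = {}
    for counts in table.values():
        total = sum(counts.values())
        for feature, c in counts.items():
            # Count this sample only if the feature is abundant enough in it
            if total and c / total >= min_abundance:
                passing[feature] = passing.get(feature, 0) + 1
    return {f for f, k in passing.items() if k >= min_fraction * n_samples}
```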

I've had good luck with q2-SCNIC recently, more with its SparCC implementation than with the actual SCNIC algorithm, so maybe look at that? The checkerboard score is an oldie but maybe a goodie. It's semi-controversial and not implemented in QIIME 2, but it may help tell you about partitioning. I'd also check out the related literature on Google Scholar; it's been maybe a year since I really went through it, but that literature may also be good.
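If the checkerboard score meant here is Stone and Roberts' C-score (an assumption), it is easy to compute: for each pair of features, C = (r_i - S)(r_j - S), where r_i and r_j are the numbers of samples each feature occupies and S is the number they share; the community score is the mean over pairs. A minimal sketch with a hypothetical `c_score`; judging significance needs a null model (randomised presence/absence matrices), which is not shown.

```python
from itertools import combinations


def c_score(presence):
    """Mean checkerboard (C) score over all feature pairs.

    presence: dict feature -> list of 0/1 across the same samples.
    """
    scores = []
    for a, b in combinations(sorted(presence), 2):
        r_a, r_b = sum(presence[a]), sum(presence[b])
        shared = sum(1 for x, y in zip(presence[a], presence[b]) if x and y)
        # Pairs that never share a sample (a perfect checkerboard) score highest
        scores.append((r_a - shared) * (r_b - shared))
    if not scores:
        raise ValueError("need at least two features")
    return sum(scores) / len(scores)
```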
The place I'd caution you here, though, is to consider what "present" and "absent" mean in your samples: are the zeros measurement error (the microbe is there but wasn't detected), or is the microbe actually absent? And how do you want to handle that difference in your compositional data?
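One common (and debatable) way to handle zeros before a compositional transform is a pseudocount, which implicitly treats every zero as "below detection". A minimal sketch of a centred log-ratio (CLR) transform with a pseudocount of 1; that value is an assumption rather than a recommendation, and `qiime composition add-pseudocount` performs the pseudocount step in QIIME 2.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of one sample's counts.

    Zeros have no logarithm, so a pseudocount is added first; this treats
    every zero as "below detection", which is exactly the assumption to question.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


print(clr([0, 1, 3]))
```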

It's a hard problem.

Best,
Justine

Mehrbod_Estaki · January 31, 2020, 10:49pm

Thanks for the tag @colinbrislawn. You and @jwdebelius have said it all; I don't really have anything to add, except to caution against trying 'everything' until you find something in your data. Instead, try to come up with some decent a priori hypotheses based on the literature before you dive into your data, and focus on those. They are much easier to test, too.

moonlight · February 1, 2020, 12:39am

Hi Colin,

Thanks for feedback.

1> The figures in your ISME J and Nature papers are amazing. What software did you use to plot them? ggplot2, or something else? Are there any open-source scripts online that we could use?

2> Do you mean that the stacked bar plot is not straightforward when we have a lot of samples, but it is still the method most people choose?

3> "My concern is adonis or other similar analysis would give you a lots of results." "Really? adonis(dist ~ variable) gives you the R-squared (effect size) and p-value, so two values per variable tested. You could put all these into a graph, if you wanted." Sorry, I didn't explain that well. What I meant is that it only gives us a "dry" R-squared value and p-value. How can I plot that?

For example, I run adonis(microbial ~ phylogeny), and it only gives me a p-value and an R-squared value. That tells me whether the microbiome varies among different hosts, right?

With only one p-value and one R² value, how can you plot that?

colinbrislawn · February 2, 2020, 11:53pm

Thanks! I hardly helped with the Nature paper, but I made most of the ISME J figures using ggplot2 in R. I used cowplot to make multi-panel figures, and the tidy package to get stat results ready to graph.

All the code to make those graphs is here: analysis/FoundersSpeciesSup.Rmd in the pnnl/brislawn-2018-founders-species repository on GitHub.

Yep, they are both hard to read and yet remain popular. Both my ISME J paper and the much larger Nature paper are essentially about quantitative ecology (and explore nestedness at the mm and km scale). But I guess traditional microbiologists like seeing the taxonomy bar plot.

So just give them a bar chart.

Wait! Hold on!

If you are testing your fundamental question, these "dry" values would be the primary finding of your paper! If that's boring... then why are you doing the study?

If you are testing 20 different chemicals and seeing which one explains the most variance in microbial community structure, then 20 p-values could get dry. That's when I would make a graph.

You could summarize your tests like this:

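library(dplyr)    # for the %>% pipe
library(ggplot2)

# One row per adonis test: replace each c() with your own values
data.frame(
  chemical = c(),
  p.value = c(),
  r.squared = c()) %>%
ggplot(aes(x = r.squared, y = p.value, label = chemical)) +
  geom_point() +
  geom_label() +
  geom_hline(yintercept = 0.05, color = "blue")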

This makes an X, Y scatter plot of your adonis results.

If you want to know if two specific hosts differ, try this plugin:
https://docs.qiime2.org/2019.10/plugins/available/diversity/beta-group-significance/

Just report it directly. See this section of my ISME paper:

However, within the bacterial samples, more than 50% of the variance observed from nestedness could be attributed to sampling day (Fig. 3). In comparison, only 20% of turnover and 34% of total Jaccard could be attributed to differences observed between sampling days.

These are the R-squared values from 3 adonis tests:
adonis(dist.nestedness ~ sampling.day)
adonis(dist.turnover ~ sampling.day)
adonis(dist.Jaccard ~ sampling.day)
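The quoted section does not say exactly how Jaccard was split into turnover and nestedness; assuming Baselga's Jaccard-family partition (as in R's betapart package), the per-pair calculation is short, and computing it for every pair of samples gives the three distance matrices tested above. A minimal sketch; `jaccard_partition` is hypothetical.

```python
def jaccard_partition(first, second):
    """Split Jaccard dissimilarity between two samples into turnover and
    nestedness (Baselga's Jaccard-family partition).

    first, second: sets of features present in each sample.
    Returns (total, turnover, nestedness), where total = turnover + nestedness.
    """
    a = len(first & second)   # shared features
    b = len(first - second)   # only in the first sample
    c = len(second - first)   # only in the second sample
    if a + b + c == 0:
        return 0.0, 0.0, 0.0
    total = (b + c) / (a + b + c)
    denom = a + 2 * min(b, c)
    # If one sample is empty, all of the difference is nestedness
    turnover = 2 * min(b, c) / denom if denom else 0.0
    return total, turnover, total - turnover


print(jaccard_partition({1, 2, 3, 4}, {1, 2}))  # nested: pure nestedness
print(jaccard_partition({1, 2}, {3, 4}))        # disjoint: pure turnover
```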

Colin

moonlight · February 11, 2020, 1:32am

"I actually hate this and have moved away from it! I agree, though, that it doesn’t work at a larger sample size. I like (and have done) area plots for this sort of thing, but that’s usually custom python code which at some point I will actually turn into a gist so I can re-use it instead of re-writing every time."

Hi Justine, do you plot area charts with a QIIME 2 script, or with other software?

I'm wondering whether QIIME 2 still supports area charts; I know QIIME 1 did.

When I type "qiime taxa --help", the only plot option I see is barplot.

jwdebelius · February 11, 2020, 9:36am

Hi @moonlight,

I plot them with some custom Python code I wrote for my projects. They're usually a bit more involved (sample nesting, color labeling), and I like the control I get from doing them myself. But there's no reason you can't export the taxonomy tables, etc., and create them yourself using R, Python, Excel, Tableau... whatever your favorite plotting program is.
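As a starting point for plotting outside QIIME 2, here is a minimal sketch that reads a feature table exported with `qiime tools export` and converted with `biom convert --to-tsv`. It assumes the usual layout of that TSV (an optional "# Constructed from biom file" line, then a "#OTU ID" header row), so check your own file first; `read_biom_tsv` is a hypothetical helper.

```python
import csv
import io


def read_biom_tsv(handle):
    """Read an exported feature table into dict sample -> dict feature -> count.

    Assumes an optional "# Constructed from biom file" comment line, then a
    "#OTU ID" header with sample IDs, then one row of counts per feature.
    """
    rows = [r for r in csv.reader(handle, delimiter="\t") if r]
    header_idx = next(i for i, r in enumerate(rows) if r[0].startswith("#OTU ID"))
    samples = rows[header_idx][1:]
    table = {s: {} for s in samples}
    for row in rows[header_idx + 1:]:
        feature, values = row[0], row[1:]
        for sample, value in zip(samples, values):
            table[sample][feature] = float(value)
    return table


example = "# Constructed from biom file\n#OTU ID\tS1\tS2\nasv1\t10.0\t0.0\nasv2\t5.0\t3.0\n"
print(read_biom_tsv(io.StringIO(example)))
```

From a dictionary like this, per-sample relative abundances can be fed into matplotlib, ggplot2, or a spreadsheet to draw area charts or bar charts.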

Best,
Justine