How do I do alpha and beta diversity and microbiota composition analysis?

Yes, but if you know a better way to do it with other tools, you can use them. I am using seaborn and matplotlib simply because I know how to use them. You can plot in R or any other language or software, whichever is more convenient.

Thanks very much for your help.

Hello timanix, I have done the taxonomic analysis and got the graph as follows:

But I need a graph like this microbiota composition analysis. Could you help me get the following graph? I would really appreciate it.

Hi! The graph in your screenshot was plotted outside of the Qiime2 environment. You can use data from the Qiime2 output (the barplot.qzv file), but you will not be able to get plots like this using only Qiime2. The authors of the article used R, Python, or something else to draw this figure. I took a closer look at the figure and realized that it is more difficult to reconstruct than I first thought. To recreate something like this from Qiime2 outputs, you will need to do a lot of scripting.
So, if you can code in Python or another language, you can try. If not, I would advise either simplifying the figure for your research or finding someone who can draw this figure for you.
By the way, in your case you will have ASVs instead of OTUs unless you perform 97% clustering with vsearch on your tables.

Alternatively, you should read the Materials and Methods section of the article containing this figure and try to figure out which tools the authors used. They may provide the scripts or software they used, which would make it easier to reconstruct.


Thanks for your kind reply. I still have two questions:

  1. Which data can generate Figure 4 in the screenshot above? How do I find the data? Is it from the barplot.qzv produced by qiime2?
  2. I am new to microbiota analysis, and I am confused about ASVs and OTUs. What is the difference between them? Following the Moving Pictures tutorial, I got a graph like this:

But when I looked at published papers, the figure for microbiota composition analysis looked like the following:

How do I choose the style of figure for microbiota composition analysis? Thank you very much for your help.

Hi @terren,

As @timanix has so accurately explained,

I will add to Timanix’s excellent advice: I can tell you, based on the plotting style, that it’s a ggplot figure. Perhaps there’s someone at your institution who uses R who could help you make something similar. In any case, the figure was generated outside of qiime2. Some people like to publish with their qiime2 figures; others generate them outside the pipeline. It’s a personal, stylistic choice based on your coding ability and what you like aesthetically. If R isn’t your cup of tea and you’re not happy with the qiime graphs, you might consider other plotting software you’re more comfortable with.

You might want to check out some previous threads that have discussed this topic.



Can I just say that it warms my heart to read this? :medal_sports: Best thing I’ve seen this morning!


Thanks for explaining so patiently. I now understand a little about the difference between OTUs and ASVs. But I am still stuck on how to analyze the dominant species or families in different groups. How do I get the quantity of species or families? From which data?
Thanks again for your kind help.

First of all, you should decide which tables you want. If ASVs, you can proceed with your tables. If OTUs, cluster your tables with vsearch at 97% similarity.
Next, from your screenshot I can see that you have 2 groups of data and, if I understood correctly, you want a figure like Fig. 4 for the ‘yes’ and ‘no’ groups. But your barplot has data for each sample, not for each group. So you need to group your table so that it contains only two grouped samples, as in Figure 4.
Once you have a barplot for this grouped table, you will be able to see which taxa dominate in your groups at any level you choose.
Now you will need to obtain the relative abundances of ASVs or OTUs, so convert your grouped table to a relative abundance table.
From here you will need to do some scripting outside of Qiime2. You need to export the relative abundance table to a .biom file, convert it to a .tsv file, add taxonomy annotations, sort it by taxonomy and relative abundance, and select however many of the most abundant ASVs or OTUs you want for each taxon.
Basically, that is what you need to plot something like this. But I need to warn you that what I wrote is just my opinion; it is how I would try to get this picture, and I can be wrong.
As @jwdebelius explained, the plots were created in R using ggplot. But you can use something else if you know how to do it (matplotlib in Python, for example). It will not be easy to plot.
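
To make the scripting part a bit more concrete, here is a minimal sketch in plain Python of the "add taxonomy, convert to relative abundance, pick the most abundant" steps. The feature IDs, counts, and family names are made up for illustration, and the two helper functions are hypothetical names, not part of Qiime2. In practice the two dictionaries would be filled from your exported .tsv table and taxonomy file.

```python
from collections import defaultdict

# Hypothetical grouped feature table: feature ID -> counts per group.
counts = {
    "asv1": {"yes": 60, "no": 10},
    "asv2": {"yes": 30, "no": 50},
    "asv3": {"yes": 10, "no": 30},
}
# Hypothetical family-level taxonomy annotation for each feature.
taxonomy = {
    "asv1": "f__Lachnospiraceae",
    "asv2": "f__Bacteroidaceae",
    "asv3": "f__Lachnospiraceae",
}

def relative_abundance_by_taxon(counts, taxonomy):
    """Sum counts per taxon within each group, then divide by the group total."""
    per_group = defaultdict(lambda: defaultdict(float))
    for feature, by_group in counts.items():
        for group, n in by_group.items():
            per_group[group][taxonomy[feature]] += n
    result = {}
    for group, taxa in per_group.items():
        total = sum(taxa.values())
        result[group] = {taxon: n / total for taxon, n in taxa.items()}
    return result

def top_taxa(rel, n):
    """Return the n most abundant taxa for each group, largest first."""
    return {
        group: sorted(taxa.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for group, taxa in rel.items()
    }

rel = relative_abundance_by_taxon(counts, taxonomy)
top = top_taxa(rel, 1)
print(top)
```

The nested dictionary in `rel` (group, then taxon, then proportion) is the kind of structure you could then hand to matplotlib, seaborn, or ggplot for a stacked bar chart.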


Thanks for your kind help. I have learned more about ASVs and OTUs from your reply. I followed the Moving Pictures tutorial and used DADA2 for sequence quality control. Then I used the following command to generate the grouped table. Is it right?
qiime feature-table group \
  --i-table table-dada2.qza \
  --p-axis sample \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-column trea-line \
  --p-mode sum \
  --o-grouped-table grouped-table.qza

Then I used the following command to generate bar plots, but it failed. How do I use the grouped table for taxonomy analysis? I am still confused. Thanks again for your kind help. :slight_smile:
qiime taxa barplot \
  --i-table grouped-table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization taxa-bar-plots-grouped.qzv

Hi again!

You summed the frequencies from all samples in each group. Was that on purpose? I would recommend redoing it with one of the other --p-mode options:

mean-ceiling will take the ceiling of the mean of these frequencies;


median-ceiling will take the ceiling of the median of these frequencies.
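
To see why the mode matters, here is a small illustration in plain Python of what the three modes do to one feature within one group. The three counts are made-up numbers; the arithmetic simply follows the descriptions above (sum, ceiling of the mean, ceiling of the median).

```python
import math
from statistics import mean, median

# Hypothetical counts of one feature in the three samples of one group.
freqs = [3, 4, 10]

group_sum = sum(freqs)                           # mode "sum"
group_mean_ceiling = math.ceil(mean(freqs))      # mode "mean-ceiling"
group_median_ceiling = math.ceil(median(freqs))  # mode "median-ceiling"

print(group_sum, group_mean_ceiling, group_median_ceiling)  # prints: 17 6 4
```

The sum grows with the number of samples in a group, so groups of different sizes are hard to compare on raw counts, while the mean and median describe a typical sample in the group.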

The last time I ran this command for collapsed tables, I encountered an error as well. It was solved by creating a new metadata file that contained only groups instead of sample IDs.
So, if you have only two groups, your new metadata file should contain only two ‘samples’: your groups.
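
If editing the file by hand is error-prone, the new metadata can also be derived from the original one with a short script. This is only a sketch: the sample IDs are made up, the `sample-id` header is an assumption about your file, and `group_rows` is a hypothetical helper. The group column name `trea-line` is taken from the command earlier in the thread.

```python
import csv
import io

# Hypothetical sample metadata; IDs and the "sample-id" header are assumptions.
sample_metadata = io.StringIO(
    "sample-id\ttrea-line\n"
    "s1\tPL\n"
    "s2\tPL\n"
    "s3\tNL\n"
    "s4\tPF\n"
    "s5\tNF\n"
)

def group_rows(handle, id_column, group_column):
    """Return one metadata row per unique group, in order of first appearance."""
    seen = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row[group_column] not in seen:
            seen.append(row[group_column])
    # The group name becomes the ID and is also kept as an ordinary column.
    return [{id_column: g, group_column: g} for g in seen]

rows = group_rows(sample_metadata, "sample-id", "trea-line")

# Write the new table as TSV (to a string here; use open(...) for a real file).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["sample-id", "trea-line"],
                        delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

The IDs in the new file have to match the IDs in the grouped table, which are the values of the column you grouped by.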

Thanks again. I am even more confused about the grouped tables, and I do not know what to do next. My purpose is to find the dominant species or families in different groups. In fact, there are 4 groups in the sample metadata file.

So you need to create a table grouped by these 4 groups, and create a new metadata file for the barplots with those 4 groups instead of sample IDs.

This is my metadata, including 4 groups: PL, NL, PF, and NF. Do you mean I should delete the sample IDs?

Here is my metadata

I wanted to group by the Niche column, so I created a new metadata file (as a separate file) with the Niche values in the Sample ID column

Other columns are not very important at this step.
Earlier it was possible to just indicate a column on the command line, but with the latest updates I received an error, so I created a new metadata file as described, and it worked

I still do not understand the new metadata file. The “Root” group included many samples (rows) in the first file, but there was only one row in your second file. How do I convert it?

You already converted them in your grouped table. All you need from this new metadata file is the groups in the sample ID column. You don’t need it for anything else. Its only purpose is to get barplots with groups instead of samples.

Thanks for your explanation :slight_smile:. I created a new metadata file (with groups in the sample ID column) and used the following commands to generate the taxa plots.

qiime feature-table group \
  --i-table table-dada2.qza \
  --p-axis sample \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-column trea-line \
  --p-mode mean-ceiling \
  --o-grouped-table grouped-table.qza

qiime taxa barplot \
  --i-table grouped-table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file group-metadata.tsv \
  --o-visualization taxa-bar-plots-grouped.qzv

The generated plot is as follows:

Is it right?

The .csv file (level 2) is as follows. But I do not know the meaning of the numbers in the table. Is it the number of bacteria?

I think so

Looks fine
Maybe you should just filter your original tables to get rid of unassigned features and features assigned only to k__Bacteria, to make it nicer.
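
As an illustration of what "unassigned or assigned only to k__Bacteria" means in terms of the taxonomy strings, here is a minimal sketch in plain Python. `is_informative` is a hypothetical helper, and the label format (ranks separated by `;`, prefixes such as `k__` and `p__`) is assumed from the labels shown in this thread.

```python
def is_informative(taxon):
    """True if the label names at least one rank below kingdom."""
    ranks = [r.strip() for r in taxon.split(";")]
    # Drop empty placeholders such as "__" or "p__" that carry no name.
    named = [r for r in ranks if r.split("__")[-1]]
    if not named or named[0].lower() == "unassigned":
        return False
    return len(named) > 1

labels = [
    "Unassigned",
    "k__Bacteria",
    "k__Bacteria;__",
    "k__Bacteria; p__Firmicutes",
]
kept = [label for label in labels if is_informative(label)]
print(kept)
```

Only the last label survives here. The same test could be applied to the taxonomy column of an exported table before plotting.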

It is the mean frequency of the features at the chosen taxonomy level across all samples in the group.
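
These numbers are counts, not proportions. As far as I know the bar plot draws relative frequencies, so each count has to be divided by its row total before it can be compared with a bar length. Here is a minimal sketch in plain Python, assuming a .csv that has an `index` column followed only by taxon columns. The column name and the values are made up, `proportions` is a hypothetical helper, and any metadata columns in a real export would need to be dropped first.

```python
import csv
import io

# Hypothetical level-2 export; the "index" header and the counts are assumptions.
exported = io.StringIO(
    "index,k__Bacteria;p__Firmicutes,k__Bacteria;p__Bacteroidetes,k__Bacteria;__\n"
    "PL,600,300,100\n"
    "NL,200,700,100\n"
)

def proportions(handle, id_column="index"):
    """Divide each taxon count by its row total, so each row sums to 1."""
    table = {}
    for row in csv.DictReader(handle):
        group = row.pop(id_column)
        counts = {taxon: float(value) for taxon, value in row.items()}
        total = sum(counts.values())
        table[group] = {taxon: n / total for taxon, n in counts.items()}
    return table

props = proportions(exported)
print(props["PL"]["k__Bacteria;__"])
```

Values scaled this way are what a stacked bar chart in R or Python would need as input.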

So the bar length in the taxa plot is generated from the feature frequencies in the .csv file? Can I use these feature frequencies to make plots (like Figure 4 above) using R or Python? But I found that the number for "k__Bacteria;__" is not consistent with the bar length in the taxa plot.