How to do alpha and beta diversity and microbiota composition analysis?

jwdebelius · May 18, 2020, 3:27pm

As @timanix has so accurately explained,

I will add to Timanix's excellent advice: I can tell from the plotting style that it's a ggplot figure, so perhaps there's someone at your institution who uses R who could help you make something similar. In any case, the figure was generated outside of qiime2. Some people like to publish with their qiime2 figures; others generate them outside the pipeline. It's a personal, stylistic choice based on your coding ability and what you like aesthetically. If R isn't your cup of tea and you're not happy with the qiime graphs, you might consider other plotting software you're more comfortable with.

You might want to check out some previous threads that have discussed this topic.

Best,
Justine

jwdebelius · May 18, 2020, 3:28pm

Can I just say that it warms my heart to read this? Best thing I've seen this morning!

terren · May 19, 2020, 12:30am

Thanks for explaining so patiently. I now know a little about the difference between OTUs and ASVs. But I am still stuck on how to analyze the dominant species or families in different groups. How do I get the quantity of species or families? From which data?
Thanks again for your kind help.

timanix · May 19, 2020, 4:59am

First of all, you should decide which tables you want. If ASVs, you can proceed with your tables. If OTUs, cluster your tables with vsearch at 97% similarity:
https://docs.qiime2.org/2020.2/tutorials/otu-clustering/
Next, from your screenshot I can see that you have 2 groups of data, and if I understood correctly, you want a figure like Fig. 4 for the 'yes' and 'no' groups. But your barplot has data for each sample, not each group. So you need to group your table so that it contains only two grouped samples, as in Figure 4:
https://docs.qiime2.org/2020.2/plugins/available/feature-table/group/
Once you have a barplot for this grouped table, you will be able to see which taxa dominate in your groups at any level you choose.
Next you will need to obtain relative abundances of ASVs or OTUs, so convert your grouped table to a relative abundance table.
From here you will need to do some scripting outside of Qiime2. You need to export the relative abundance table to a .biom file, convert it to a .tsv file, add taxonomy annotations, sort it by taxonomy and relative abundance, and select any number of the most abundant ASVs or OTUs for each taxon you want.
Basically, that is what you need to plot something like this. But I need to warn you that this is just my opinion; it is how I would try to get this picture, and I could be wrong.
As @jwdebelius explained, the plots were created in R using ggplot. But you can use something else if you know how (matplotlib in Python, for example). It will not be easy to plot.
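
The scripting steps described above (annotate with taxonomy, sum per taxon, sort, keep the top N) could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the inline tables, the column names `feature-id`, `Feature ID` and `Taxon`, the feature IDs, and the helper names `read_tsv` and `top_taxa` are all hypothetical stand-ins, and a real exported table will have its own header layout.

```python
import csv
import io

# Hypothetical stand-in for a relative abundance table exported to TSV
# (rows = features, columns = groups). Real exports may have extra header lines.
TABLE_TSV = """feature-id\tPL\tNL
asv1\t0.50\t0.10
asv2\t0.30\t0.20
asv3\t0.20\t0.70
"""

# Hypothetical stand-in for an exported taxonomy table.
TAXONOMY_TSV = """Feature ID\tTaxon
asv1\tk__Bacteria; p__Firmicutes; f__Lachnospiraceae
asv2\tk__Bacteria; p__Firmicutes; f__Lachnospiraceae
asv3\tk__Bacteria; p__Proteobacteria; f__Enterobacteriaceae
"""


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def top_taxa(table_rows, taxonomy_rows, group, n):
    """Sum relative abundances per taxon for one group and return the top n."""
    taxon_of = {row["Feature ID"]: row["Taxon"] for row in taxonomy_rows}
    totals = {}
    for row in table_rows:
        # Features missing from the taxonomy table are pooled as "Unassigned".
        taxon = taxon_of.get(row["feature-id"], "Unassigned")
        totals[taxon] = totals.get(taxon, 0.0) + float(row[group])
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    table = read_tsv(TABLE_TSV)
    taxonomy = read_tsv(TAXONOMY_TSV)
    for group in ("PL", "NL"):
        print(group, top_taxa(table, taxonomy, group, n=2))
```

The ranked list for each group is the kind of input a stacked barplot in ggplot or matplotlib would need.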

terren · May 20, 2020, 3:56am

Thanks for your kind help. I learned more about ASVs and OTUs from your reply. I followed the Moving Pictures tutorial and used DADA2 for sequence quality control. Now I have used the following command to generate the grouped table. Is it right?
qiime feature-table group \
  --i-table table-dada2.qza \
  --p-axis sample \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-column trea-line \
  --p-mode sum \
  --o-grouped-table grouped-table.qza \
  --verbose

Then I used the following command to generate bar plots, but it failed. How do I use the grouped table for taxonomy analysis? I am still confused. Thanks again for your kind help.
qiime taxa barplot \
  --i-table grouped-table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization taxa-bar-plots-grouped.qzv

timanix · May 20, 2020, 4:25am

Hi again!

You summed the frequencies from all samples in each group. Was that on purpose? I would recommend redoing it with

mean-ceiling will take the ceiling of the mean of these frequencies;

or

median-ceiling will take the ceiling of the median of these frequencies.
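
To make the difference between these modes and `sum` concrete, here is a tiny Python sketch. The counts are made up for illustration, and the arithmetic only mirrors the descriptions quoted above; it is not the plugin's own code.

```python
import math
import statistics

# Hypothetical counts of one feature across the three samples of one group.
counts = [3, 4, 10]

# "sum" adds the frequencies, so larger groups get larger totals.
summed = sum(counts)

# "mean-ceiling" takes the ceiling of the mean of the frequencies.
mean_ceiling = math.ceil(statistics.mean(counts))

# "median-ceiling" takes the ceiling of the median of the frequencies.
median_ceiling = math.ceil(statistics.median(counts))

print(summed, mean_ceiling, median_ceiling)
```

With these numbers the sum is 17, the mean is about 5.67 so its ceiling is 6, and the median is 4.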

The last time I ran this command for collapsed tables, I encountered an error as well. It was solved by creating a new metadata file in which I had only groups instead of sample IDs.
So, if you have only two groups, your new metadata file should contain only two 'samples': your groups.

terren · May 20, 2020, 4:49am

Thanks again. I am more confused by the grouped tables, and I do not know what to do next. My purpose is to get the dominant species or families in different groups. In fact, there are 4 groups in the sample metadata file.

timanix · May 20, 2020, 4:53am

So you need to create a table grouped by these 4 groups, and create a new metadata file for barplots with those 4 groups instead of sample IDs.

terren · May 20, 2020, 5:00am

This is my metadata, including 4 groups: PL, NL, PF and NF. Do you mean I should delete the sample ID?

timanix · May 20, 2020, 5:11am

Here is my metadata

I wanted to group by the Niche column, so I created new metadata (as a separate file) with the Niche data in the Sample ID column

Other columns are not very important at this step.
Earlier it was possible to just indicate a column on the command line, but with the latest updates I received an error, so I did as I described with the new metadata and it worked
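
For anyone who prefers to script this step, a group-level metadata file can be derived from the sample metadata along these lines. This is a minimal sketch, not a tested recipe: the sample IDs and the helper name `group_metadata` are hypothetical, the column name `trea-line` and the groups PL, NL, PF and NF are taken from this thread, and the input is assumed to have a plain `sample-id` header with no `#q2:types` line.

```python
import csv
import io

# Hypothetical sample metadata: several samples per group.
SAMPLE_METADATA = """sample-id\ttrea-line
s1\tPL
s2\tPL
s3\tNL
s4\tPF
s5\tNF
"""


def group_metadata(sample_metadata_text, group_column):
    """Return metadata text with one row per group, using the group as the ID."""
    rows = csv.DictReader(io.StringIO(sample_metadata_text), delimiter="\t")
    groups = []
    for row in rows:
        group = row[group_column]
        if group not in groups:  # keep the first-seen order, drop duplicates
            groups.append(group)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # The group name doubles as the ID and is kept as a regular column too.
    writer.writerow(["sample-id", group_column])
    for group in groups:
        writer.writerow([group, group])
    return out.getvalue()


if __name__ == "__main__":
    print(group_metadata(SAMPLE_METADATA, "trea-line"))
```

The IDs in the new file have to match the sample IDs of the grouped table, which are the group names themselves.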

terren · May 20, 2020, 5:26am

I still do not understand the new metadata file. The "Root" group included many samples (rows) in the first file, but there was only one row in your second file. How do I convert it?

timanix · May 20, 2020, 5:29am

You already converted them in your grouped table. All you need from this new metadata is the groups in the sample ID column. You don't need this new metadata for anything else. Its only purpose is to get barplots with groups instead of samples.

terren · May 20, 2020, 5:24pm

Thanks for your explanation. I made a new metadata file (groups in the sample ID column), and I used the following commands to generate the taxa plots.

qiime feature-table group \
  --i-table table-dada2.qza \
  --p-axis sample \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-column trea-line \
  --p-mode mean-ceiling \
  --o-grouped-table grouped-table.qza \
  --verbose

qiime taxa barplot \
  --i-table grouped-table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file group-metadata.tsv \
  --o-visualization taxa-bar-plots-grouped.qzv

The generated plot is as follows:

Is it right?

The .csv file (level 2) is as follows. But I do not know the meaning of the numbers in the table. Is it the number of bacteria?

timanix · May 20, 2020, 6:11pm

I think so

Looks fine
Just maybe you should filter your original tables to get rid of unassigned features and features assigned only to k__Bacteria, to make it nicer

It is the mean frequency of a feature at the chosen taxonomy level among all samples in the group.

terren · May 20, 2020, 11:33pm

So the bar lengths in the taxa plot are generated from these feature frequencies in the .csv file? Can I use these feature frequencies to make plots (like Figure 4 above) using R or Python? But I found that the number for "k__Bacteria;__" is not consistent with the bar length in the taxa plot.

timanix · May 21, 2020, 12:37am

Yes, but note that they were automatically converted to relative abundances instead of frequencies

Yes, but you need to normalize the data within each group, so that frequencies are converted to relative abundances

I suppose that you are looking at frequencies in the table. On the plot, it looks inconsistent just because it is showing relative abundances, i.e. the portion of a feature relative to the frequencies of all features in the sample / group, not the frequency itself.
Now you have this data, but you still need to get the most abundant ASVs as I described in one of the comments above.

terren · May 21, 2020, 3:41am

Sorry, I did not ask my question clearly. My question is that the following two graphs are not consistent in frequency.

The second question is how to get the abundant ASVs. I used dada2, but I am still confused after reading your earlier comments and checking the results from dada2.

timanix · May 21, 2020, 5:57am

In the table, you have total frequencies. The graph shows relative abundances.
For example, suppose one group has 100 000 total frequencies and another has 10 000, and Pseudomonas was found 1000 times in the first and only 500 times in the second. In the graph with relative abundances (barplot), Pseudomonas will be plotted as 1000 * 100% / 100 000 = 1% in the first group, and as 500 * 100% / 10 000 = 5% in the second.
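
The same arithmetic as a short Python sketch, in case it helps when normalizing a whole table by hand. The function name `relative_abundance_percent` is made up for this illustration; the numbers are the ones from the example above.

```python
def relative_abundance_percent(feature_count, total_count):
    """Share of one feature among all counts in a sample or group, in percent."""
    return feature_count * 100 / total_count


# Pseudomonas: 1000 of 100 000 counts in the first group.
first_group = relative_abundance_percent(1000, 100_000)

# Pseudomonas: 500 of 10 000 counts in the second group.
second_group = relative_abundance_percent(500, 10_000)

print(first_group, second_group)
```

So the group with fewer raw Pseudomonas counts ends up with the taller bar, because its total is ten times smaller.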

There is no tutorial for this part; you will have to do it on your own. I just described the major steps which, in my opinion, you need to implement to get the graphs you provided. But I have never done it myself. All I can say is that it will require some scripting.

terren · May 21, 2020, 6:30pm

Thank you so much. I read your comment again: "You need to export relative table to .biom file, convert it to .tsv file, add taxonomy annotations, sort it by taxonomy and relative abundances and select any number of most abundant ASVs or OTUs for each taxonomy you want."
But where can I get the relative table? From table-dada2.qza?

timanix · May 21, 2020, 9:15pm

Hi!
Check out this plugin
https://docs.qiime2.org/2020.2/plugins/available/feature-table/relative-frequency/