microbiota analysis of insect species

Hello everyone, I'm new to the world of Qiime2 and bioinformatics and I often encounter difficulties in processing my data.
I'll explain exactly what I'm working on: I extracted gut microbial DNA from three different insect species raised on 2 different diets (control and treated). Illumina sequencing targeted the V3-V4 region of the 16S rRNA gene.
I'm currently processing the data and analysing alpha and beta diversity, but I'm having trouble figuring out how to set up the analysis.
I would like to compare the bacterial communities of the three insect species. Do you think I should first analyse the differences in the bacteria of the three insect species raised on a control diet and then on a treated diet? Would you then recommend that I analyse the differences in each species between the control and treated diets?
To do this, should I create another table and filter the data? In the first case, should I keep all three insect species and filter to only the control diet, and then to only the treated diet? In the second case, should I filter to a single insect species and keep both the control and treated diets?
Sorry for the question, it may seem trivial but I often get confused. Unfortunately, this is the first time I've tried bioinformatics.

I will take this opportunity to ask a second question. Regarding the taxonomic analysis, do you recommend training the classifier? I tried to do the analysis with Greengenes2 without training it for my V3-V4 region, but unfortunately some features are not identified beyond the domain level and a lot of data is lost.
Do you think that training the classifier will improve the situation?

Thanks in advance :pray: :smiling_face_with_three_hearts:

Hi @Linda_Abenaim,
Welcome to the :qiime2: forum!

Many of your questions are preference-based, meaning the answer mostly depends on how you prefer to analyze your data. I am more than happy to share how I would go about this analysis, but I do want to give a big disclaimer: this is just how I would do it, and that does not mean it is the most correct way. It's just one way to go about it.

Do you think that there will be a difference in composition between the insect species?

I would probably filter down to just the control samples and compare your 3 different insect microbiomes.

Similarly, you could test whether the microbiomes of the 3 insects differ on the treated diet by filtering down to the treated samples and comparing the 3 insect microbiomes.

Here is a tutorial for filtering data: Filtering data — QIIME 2 2023.9.2 documentation

I would probably investigate all the insects together and test whether the composition differs between diets. Then I would filter down to each insect species individually and test diet again!
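To keep track of which subsets these comparisons need, here is a minimal Python sketch that lists them. The metadata column names (`diet`, `species`) and their values are hypothetical, so swap in whatever your metadata file uses. The generated strings follow the `[column]='value'` style that the filtering tutorial shows for `--p-where`; the sketch only prints them and does not call QIIME 2.

```python
# Hypothetical metadata columns and values; replace with your own.
SPECIES = ["species_A", "species_B", "species_C"]
DIETS = ["control", "treated"]

def where_clause(**conditions):
    """Build an expression such as "[diet]='control' AND [species]='species_A'"."""
    return " AND ".join(
        f"[{column}]='{value}'" for column, value in conditions.items()
    )

# 1) Compare the three species within one diet at a time.
species_comparisons = {diet: where_clause(diet=diet) for diet in DIETS}

# 2) Compare the two diets within one species at a time.
diet_comparisons = {sp: where_clause(species=sp) for sp in SPECIES}

for diet, clause in species_comparisons.items():
    print(f"compare species within {diet}: {clause}")
for sp, clause in diet_comparisons.items():
    print(f"compare diets within {sp}: {clause}")
```

Testing diet across all insects together needs no filtering, so it does not appear in the list.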

I tend to look at more comparisons rather than fewer.
I hope this clarified things! Please let me know if you are still confused and I am more than happy to clarify.

This isn't trivial and we are here to help! :qiime2:

I believe that you will have to train your own classifier, because the region-specific pre-trained classifiers that QIIME 2 provides target V4 rather than V3-V4. This would explain your issue with taxonomic classification.

Here is the tutorial for that: Processing, filtering, and evaluating the SILVA database (and other reference sequence data) with RESCRIPt

I hope all this helps! Feel free to post again if you run into any errors!


Thank you very much Chloe for your precious help!
You cleared up many of my doubts.
I am proceeding with the alpha diversity analysis and I have filtered table.qza to the samples of interest. I only have a doubt about the sampling depth. I notice that with a lower sampling depth (so that I eliminate fewer samples) Faith's PD is significant (p-value < 0.05). What do you advise? I didn't understand very well whether choosing a sampling depth is mandatory.

Another question: if Faith's PD is significant but evenness is not, what does it mean?

For the rarefaction plot, do you suggest always making it from the filtered data, as for alpha and beta diversity?

Thank you very much again for your help!

Hi @Linda_Abenaim,
Glad I could help!

Sampling depth is not mandatory but some sort of rarefaction is required. Choosing a sampling depth can be tricky! We want to choose a sampling depth that lets us investigate as many sequences as possible without losing too many samples. Additionally, we need to make sure it's a reasonable sampling depth. A reasonable sampling depth is one where the alpha diversity metrics are stable. If adding more sequences to a sample changes the diversity metric, then that depth is probably not representative of your sample. We can test this by looking at the alpha-rarefaction plots.
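To make the trade-off concrete, here is a small Python sketch of what rarefying does. It is only an illustration with made-up counts, not QIIME 2's implementation: reads are subsampled without replacement to a fixed depth, and a sample with fewer reads than that depth is dropped.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a {feature: count} dict.

    Returns None when the sample has fewer reads than `depth`,
    i.e. the sample would be dropped at that sampling depth.
    """
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        return None
    picked = random.Random(seed).sample(reads, depth)
    rarefied = {}
    for feature in picked:
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied

def observed_features(counts):
    """Number of features with at least one read."""
    return sum(1 for n in counts.values() if n > 0)

# Hypothetical samples: one deep (1000 reads) and one shallow (100 reads).
deep = {"ASV1": 500, "ASV2": 300, "ASV3": 150, "ASV4": 40, "ASV5": 9, "ASV6": 1}
shallow = {"ASV1": 80, "ASV2": 20}

# A rarefaction curve for the deep sample: richness at increasing depths.
curve = {d: observed_features(rarefy(deep, d)) for d in (10, 100, 500, 1000)}
print(curve)

# At a depth of 500 the shallow sample is lost entirely.
print(rarefy(shallow, 500))
```

When the curve flattens, extra reads stop revealing new features, which is the "stable" behaviour to look for in the alpha-rarefaction plot.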

Have you seen any of the QIIME 2 youtube videos? There is a really good video about choosing a sampling depth: https://www.youtube.com/watch?v=q-S2qVMyCVs&t=1344s

In general, I think you might also find the videos walking through the Parkinson's mouse tutorial very helpful: https://www.youtube.com/watch?v=M2iXewkYHE0&list=PLbVDKwGpb3XmkQmoBy1wh3QfWlWdn_pTT

Faith's PD and evenness ask different questions. Evenness asks how evenly your diversity is distributed. For example, if you have 4 microbes in your microbiome and each of them makes up 25% of it, then your microbiome is very even. However, if you had the same 4 microbes but 1 of them accounted for 90% of the microbiome, then it would be very uneven. Faith's PD is a measure of phylogenetic diversity. So ASVs that are not the same but sit close together on a phylogenetic tree account for less diversity than 2 ASVs that are far apart on the tree.
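Here is a small Python sketch of both ideas, using the 25% versus 90% example above. The evenness function is Pielou's evenness (Shannon entropy divided by its maximum). The tree is a made-up toy, and the Faith's PD function is a simplified illustration that sums branch lengths from the observed tips up to the root.

```python
import math

def pielou_evenness(proportions):
    """Shannon entropy divided by its maximum, ln(number of taxa present)."""
    present = [p for p in proportions if p > 0]
    if len(present) < 2:
        return float("nan")  # evenness is undefined for a single taxon
    shannon = -sum(p * math.log(p) for p in present)
    return shannon / math.log(len(present))

def faith_pd(tree, observed_tips):
    """Sum the lengths of all branches on the paths from observed tips to the root."""
    used = set()
    for node in observed_tips:
        while node in tree:  # the root has no entry, so the walk stops there
            used.add(node)
            node = tree[node][0]
    return sum(tree[node][1] for node in used)

even = pielou_evenness([0.25, 0.25, 0.25, 0.25])
uneven = pielou_evenness([0.90, 0.10 / 3, 0.10 / 3, 0.10 / 3])
print(f"even community: {even:.2f}, uneven community: {uneven:.2f}")

# Hypothetical toy tree: node -> (parent, length of the branch leading to the node).
tree = {
    "A": ("n1", 0.1),
    "B": ("n1", 0.1),    # A and B are close relatives
    "C": ("root", 1.0),  # C is a distant lineage
    "n1": ("root", 0.5),
}
pd_close = faith_pd(tree, ["A", "B"])    # branches A, B and n1
pd_distant = faith_pd(tree, ["A", "C"])  # branches A, n1 and C
print(f"two close ASVs: {pd_close:.1f}, two distant ASVs: {pd_distant:.1f}")
```

Both communities in the tree example contain exactly two ASVs, yet the pair that spans more of the tree gets the higher Faith's PD.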

Now that I have explained the differences between these metrics, what do you think it means that Faith's PD is significant and evenness is not?

As for the rarefaction plot, I look at it every time I select a new sampling depth.

I hope this helps!

Thanks again for your help!!
I think that a significant Faith's PD means that there is a phylogenetic difference between the bacteria present on the control diet and those present on the treated diet. The non-significant evenness probably means that the distribution of bacteria does not differ between the two diets. Consequently, the bacterial communities are similarly even for both diets although they differ phylogenetically.
Did I get it right ?

Thank you so much for videos and links :pray::smiling_face_with_three_hearts:

Yes! It seems the evenness of your groups isn't different, but the community richness (taking phylogenetic distance into account) does seem to differ significantly.

Thank you so much for your help, Chloe!!
Regarding the taxonomy analysis, can I follow the tutorial "Training feature classifiers with q2-feature-classifier"? Do you suggest the latest version of SILVA instead of Greengenes2?
Where can I find the dataset in FASTA format?

thank you again!

Hi @Linda_Abenaim
I would recommend this tutorial: Processing, filtering, and evaluating the SILVA database (and other reference sequence data) with RESCRIPt

It should walk you through all the steps.

My preferred database is GTDB, but that's a preference thing. SILVA is not exclusively 16S, so it's pretty big and takes a while to run. If you want to use Greengenes2 (which uses GTDB and then some), this post explains how to use the Greengenes2 classifier: Introducing Greengenes2 2022.10

Thanks so much again Chloe.
Unfortunately I have another question. I can't interpret the results of my beta diversity analysis (I attach the file), and I can't understand whether for the PCoA I should rely on the results obtained with core-metrics-phylogenetic or redo it with --p-custom-axes. The "Moving Pictures" tutorial uses custom axes because its data are distributed over time, which is not the case for my data. Can you help me understand?

Thank you so much for all your help

unweighted-unifrac-treatment-significance-bsf.qzv (311.3 KB)
In these results I analysed the beta diversity between the bacterial populations of one insect species reared on the control and treated diets.

Hi again @Linda_Abenaim!

Glad I can help!

Here is a really detailed video explaining interpretation of beta diversity results: https://youtu.be/EEs3_pBQGus?si=9qpgsjGPrY-jsqoG

Yes, you can rely on the results from core-metrics-phylogenetic. The plots are based on the PCoA results, which are in turn based on the distance matrix for that metric.

You do not have to redo it with a custom axis! If you don't have a metadata column that makes sense for a custom axis, you can just skip that step with no issues. (I often skip that step.)

As for interpretation:
For the PCoA, you are looking for separation between data points. Each point on the PCoA is representing a sample and you can see how similar the samples are to each other based on how close they are to each other.
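As a rough illustration of what "close" means, here is a Python sketch of the pairwise distances that a PCoA plot summarises. It uses Bray-Curtis as one example metric, and the three samples and their counts are made up. The PCoA step itself, which turns these distances into coordinates, is not shown.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {feature: count} samples (0 = identical)."""
    features = set(a) | set(b)
    differences = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    total = sum(a.get(f, 0) + b.get(f, 0) for f in features)
    return differences / total

# Hypothetical samples, each already rarefied to 100 reads.
control_1 = {"ASV1": 50, "ASV2": 30, "ASV3": 20}
control_2 = {"ASV1": 45, "ASV2": 35, "ASV3": 20}
treated_1 = {"ASV1": 10, "ASV3": 20, "ASV4": 70}

d_within = bray_curtis(control_1, control_2)   # similar communities
d_between = bray_curtis(control_1, treated_1)  # dissimilar communities
print(f"control_1 vs control_2: {d_within:.2f}")
print(f"control_1 vs treated_1: {d_between:.2f}")
```

Samples with a small distance land near each other on the plot, and samples with a large distance land far apart.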

For the .qzv you linked, you are statistically testing what the PCoA is representing. And it looks like you do have significance between C and T! To figure this out, I am looking at the q-value in the pairwise comparison table!

I really recommend that video that I linked above. It will go into much more detail!
Hope that helps!


Thanks again Chloe for your patience!
What I don't understand about the file I sent you is: why does it seem that there is no significance when I look at the box plots?
On the x-axis there are C (n=66) and T (n=132), and then T (n=55) and C (n=132). What are these different numbers?

Hi @Linda_Abenaim,

It seems to me that the boxplots have some overlap but are mostly different. And the medians of the plots look pretty different to me.

If you want to see what other statistics say, I would try a different statistical method using the --p-method parameter:

  --p-method TEXT Choices('permanova', 'anosim', 'permdisp')
                       The group significance test to be applied.
                                                        [default: 'permanova']

Those numbers are how many comparisons are made! Remember that every sample is compared to every other sample so the N can get pretty big.
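A quick Python check shows where numbers like 66, 55 and 132 can come from. The group sizes of 12 and 11 samples are an inference from those counts, not something stated in the thread, so treat them as an assumption.

```python
from math import comb

def comparison_counts(n_a, n_b):
    """Count pairwise distances within each group and between the two groups."""
    return {
        "within_a": comb(n_a, 2),  # every unordered pair inside group A
        "within_b": comb(n_b, 2),  # every unordered pair inside group B
        "between": n_a * n_b,      # every A sample against every B sample
    }

# Assumed group sizes: 12 control (C) and 11 treated (T) samples.
counts = comparison_counts(12, 11)
print(counts)
```

Under that assumption, C (n=66) and T (n=55) are the within-group comparisons, and n=132 is the set of between-group comparisons, which appears once in each boxplot panel.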


I'm so sorry Chloe, I don't understand the two boxplots. Could you explain? Thank you so much.
Another question: is there a way to convert an Emperor plot to a 2D graphic? I don't understand my Bray-Curtis Emperor plot very well.
bray_curtis_emperor.qzv (861.7 KB)

thank you again