Hello everyone, I'm new to the world of QIIME 2 and bioinformatics, and I often run into difficulties when processing my data.
I'll explain exactly what I'm working on: I extracted gut microbial DNA from three different insect species raised on 2 different diets (control and treated). The Illumina sequencing targeted the V3-V4 region of the 16S rRNA gene.
I'm currently processing the data and analysing the alpha and beta diversity, but I'm having trouble figuring out how to structure the analysis.
I would like to compare the bacterial communities of the three insect species. Do you think I should first analyse the differences in the bacteria of the three insect species raised on a control diet and then on a treated diet? Would you then recommend that I analyse the differences in each species between the control and treated diets?
To do this, should I create another table and filter the data? In the first case, should I filter the insect species and only the control diet and then the treated diet? In the second case, should I filter only the insect species and both the control and treated diets?
Sorry for the question, it may seem trivial but I often get confused. Unfortunately, this is the first time I've tried bioinformatics.
I will take this opportunity to ask a second question. Regarding the taxonomic analysis, do you recommend training the classifier? I tried to do the analysis without training a Greengenes2 classifier for my V3-V4 region, but unfortunately some features are not identified beyond the domain level and a lot of data is lost.
Do you think that training the classifier will improve the situation?
Many of your questions are preference-based, meaning that the answer mostly depends on how you prefer to go about analyzing your data. I am more than happy to share how I would go about this analysis, but I do want to give a big disclaimer: this is just how I would do it, and that does not mean it is the one right way. It's just one way to go about it.
Do you think that there will be a difference in composition between insect species?
I would probably filter down to just the control samples and compare your 3 different insect microbiomes.
Similarly, you could test whether the 3 insect microbiomes differ on the treated diet by filtering down to the treated samples and comparing the 3 insects again.
I would probably investigate all the insects together and test if the composition is different between diets. Then I would filter down to each insect species on its own and test diet again!
I am prone to looking at more comparisons rather than fewer.
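To make the comparisons above concrete, here is a minimal sketch in plain Python. The metadata is made up, and the column names "species" and "diet" are assumptions, so swap in whatever your own metadata file uses. It only lists which samples would go into each comparison; it does not run any statistics.

```python
# Hypothetical metadata: 3 species x 2 diets x 2 replicates.
# Column names ("species", "diet") and sample IDs are illustrative assumptions.
metadata = []
for species in ("A", "B", "C"):
    for diet in ("control", "treated"):
        for rep in (1, 2):
            metadata.append({
                "sample_id": f"{species}-{diet[0]}{rep}",
                "species": species,
                "diet": diet,
            })


def subset(rows, **conditions):
    """Keep only the rows whose metadata matches every column=value condition."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


def groups(rows, column):
    """Map each value of `column` to the sample IDs that carry it."""
    out = {}
    for r in rows:
        out.setdefault(r[column], []).append(r["sample_id"])
    return out


# 1) Compare the three species within one diet at a time.
for diet in ("control", "treated"):
    print("species within", diet, groups(subset(metadata, diet=diet), "species"))

# 2) Compare the diets across all insects together.
print("diet, all insects", groups(metadata, "diet"))

# 3) Compare the diets within one species at a time.
for species in ("A", "B", "C"):
    print("diet within", species, groups(subset(metadata, species=species), "diet"))
```

In QIIME 2 itself, the filtering step would normally be done on the feature table with `qiime feature-table filter-samples` and a `--p-where` expression, and each filtered table then goes through the diversity steps separately.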
I hope this clarified things! Please let me know if you are still confused and I am more than happy to clarify.
This isn't trivial and we are here to help!
I believe that you will have to train your own classifier, because the region-specific pre-trained classifiers that QIIME 2 provides only cover V4. This would explain your issue with taxonomic classification.
Thank you very much Chloe for your precious help!
You resolved many of my doubts.
I am proceeding with the alpha diversity analysis and I have filtered the table.qza down to the samples of interest. I only have a doubt about the sampling depth. I notice that if the sampling depth is lower (so I eliminate fewer samples), Faith's PD becomes more significant (p-value < 0.05). What do you advise? I didn't understand very well whether it is mandatory to choose a sampling depth.
Another question: if Faith's PD is significant but evenness is not, what does it mean?
For the rarefaction plot, do you suggest always making it from the filtered data, as for alpha and beta diversity?
Sampling depth is not mandatory, but some sort of rarefaction is required. Choosing a sampling depth can be tricky! We want to choose a sampling depth that lets us investigate as many sequences as possible without losing too many samples. Additionally, we need to make sure it's a reasonable sampling depth. A reasonable sampling depth is one where the alpha diversity metrics are stable. If adding more sequences to a sample still changes the diversity metric, then that depth is probably not representative of your sample. We can test this by looking at the alpha-rarefaction plots.
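Here is a small sketch of that trade-off in plain Python. The two samples and their counts are invented, and this is only an illustration of the idea, not how QIIME 2 implements it: each sample is subsampled without replacement to a given depth, samples with fewer reads than the depth are dropped, and the number of observed features is averaged over a few repeats.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement; return None if the sample is too shallow."""
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    if len(reads) < depth:
        return None  # this sample would be lost at this sampling depth
    rarefied = {}
    for feature in rng.sample(reads, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


def observed_features(counts):
    """Count the features with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def mean_observed(counts, depth, rng, iterations=10):
    """Average observed features over repeated subsampling, or None if the sample is dropped."""
    values = []
    for _ in range(iterations):
        rarefied = rarefy(counts, depth, rng)
        if rarefied is None:
            return None
        values.append(observed_features(rarefied))
    return sum(values) / len(values)


# Invented feature table: one deep sample (1000 reads) and one shallow sample (100 reads).
table = {
    "deep": {"asv1": 500, "asv2": 300, "asv3": 150, "asv4": 40, "asv5": 10},
    "shallow": {"asv1": 60, "asv2": 30, "asv3": 10},
}

rng = random.Random(0)
for depth in (10, 50, 100, 500, 1000):
    for name, counts in table.items():
        result = mean_observed(counts, depth, rng)
        status = "dropped" if result is None else f"{result:.1f} observed features"
        print(f"depth={depth:4d} sample={name:8s} {status}")
```

Reading the output the same way as an alpha-rarefaction plot: you are looking for the depth where the observed features stop climbing, while checking how many samples are dropped at that depth.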
Faith's PD and evenness ask different questions. Evenness asks how evenly your diversity is distributed. For example, if you have 4 microbes in your microbiome and each of them makes up 25% of it, then your microbiome is very even. However, if you had the same 4 microbes but 1 of them accounted for 90% of the microbiome, then it would be very uneven. Faith's PD is a metric of phylogenetic diversity. So if two ASVs are not the same but sit close together on a phylogenetic tree, they account for less diversity than 2 ASVs that are far apart on the tree.
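The 25% versus 90% example can be worked through with Pielou's evenness, which is Shannon entropy divided by its maximum possible value, ln(number of features). This is a minimal sketch with invented counts; the 4/3/3 split of the remaining 10% is my own assumption.

```python
import math


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S), with H the Shannon entropy and S the feature count."""
    present = [c for c in counts if c > 0]
    if len(present) < 2:
        return float("nan")  # evenness is undefined for a single feature
    total = sum(present)
    shannon = -sum((c / total) * math.log(c / total) for c in present)
    return shannon / math.log(len(present))


even_community = [25, 25, 25, 25]   # four microbes at 25% each
uneven_community = [90, 4, 3, 3]    # one microbe at 90% (assumed split of the rest)

print("even:  ", round(pielou_evenness(even_community), 3))
print("uneven:", round(pielou_evenness(uneven_community), 3))
```

Both communities have the same 4 microbes, so richness is identical; only the evenness value differs, close to 1 for the first community and much lower for the second.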
Now that I have explained the differences between these metrics, what do you think it means that Faith's PD is significant and evenness is not?
As for the rarefaction plot, I look at it every time I select a new sampling depth.
Thanks again for your help!!
I think that a significant Faith's PD means that there is a phylogenetic difference between the bacteria present on the control diet and those present on the treated diet. The non-significant evenness probably means that there is no difference in how the bacteria are distributed between the two diets. Consequently, the bacterial species are equally evenly distributed on both diets, although they differ phylogenetically.
Did I get it right ?
Yes! It seems the evenness of your groups isn't different, but the community richness (taking phylogenetic distance into account) does seem to differ significantly.
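To show what "richness taking phylogenetic distance into account" means, here is a toy Faith's PD calculation. The tree and branch lengths are invented, and this sketch counts every branch on the path from each observed ASV up to the root, which is one common convention.

```python
# Toy rooted tree stored as child -> (parent, branch length). All values are made up.
tree = {
    "asvA": ("n1", 0.1),
    "asvB": ("n1", 0.1),
    "n1": ("root", 0.2),
    "asvC": ("root", 1.0),
}


def faith_pd(tree, observed):
    """Sum the branch lengths on the paths from the observed tips to the root, each branch once."""
    branches = set()
    for tip in observed:
        node = tip
        while node in tree:       # stop once we reach the root
            branches.add(node)    # a branch is identified by its child node
            node = tree[node][0]
    return sum(tree[node][1] for node in branches)


close_relatives = {"asvA", "asvB"}    # two ASVs that sit next to each other on the tree
distant_relatives = {"asvA", "asvC"}  # two ASVs on separate deep branches

print("close:  ", round(faith_pd(tree, close_relatives), 3))
print("distant:", round(faith_pd(tree, distant_relatives), 3))
```

Both samples contain exactly 2 ASVs, yet the second one gets a higher Faith's PD (about 1.3 versus about 0.4) because its ASVs are further apart on the tree.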
Thank you so much for your help, Chloe!!
Regarding the taxonomy analysis, can I follow the tutorial "Training feature classifiers with q2-feature-classifier"? Do you suggest the latest version of Silva instead of Greengenes2?
Where can I find the dataset in FASTA format?
My preference for databases is GTDB, but that's a preference thing. Silva is not exclusively 16S, so it's pretty big and takes a while to run. If you want to use Greengenes2 (which uses GTDB and then some), this post explains how to use the Greengenes2 classifier: Introducing Greengenes2 2022.10
Thanks so much again Chloe.
Unfortunately I have another question: I can't interpret the results of my beta diversity analysis (I attach the file), and I can't work out whether for the PCoA I can rely on the results obtained with core-metrics-phylogenetic or whether I have to redo it with --p-custom-axes. The "Moving Pictures" tutorial uses a custom axis because its data are distributed over time; mine are not. Can you help me understand?
Thank you so much for all your help
unweighted-unifrac-treatment-significance-bsf.qzv (311.3 KB)
In these results I analysed the beta diversity between the bacterial populations of one insect species reared on the control and treated diets.
Yes, they are based on the PCoA results, which are in turn based on the distance matrix for the metric.
You do not have to redo it with a custom axis! If you don't have a metadata column that makes sense for a custom axis, you can just skip that step with no issues. (I often skip that step.)
As for interpretation:
For the PCoA, you are looking for separation between data points. Each point on the PCoA represents a sample, and you can see how similar the samples are to each other based on how close together they are.
For the .qzv you attached, you are statistically testing what the PCoA is representing. And it looks like you do have a significant difference between C and T! To figure this out, I am looking at the q-value in the pairwise comparison table!
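If it helps to see what such a test does under the hood, here is a rough sketch of a PERMANOVA-style permutation test in plain Python. I am assuming PERMANOVA because, as far as I know, it is the default method of `qiime diversity beta-group-significance`; the counts below are invented, and Bray-Curtis is used only because it is simple to compute without a tree.

```python
import itertools
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F computed from a square distance matrix and group labels."""
    n = len(labels)
    names = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in itertools.combinations(range(n), 2)) / n
    ss_within = 0.0
    for name in names:
        members = [i for i in range(n) if labels[i] == name]
        ss_within += sum(dist[i][j] ** 2
                         for i, j in itertools.combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(names) - 1)) / (ss_within / (n - len(names)))


def permanova(dist, labels, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value from shuffled labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Invented counts for 5 control (C) and 5 treated (T) samples over 4 features.
control = [[50, 30, 20, 0], [45, 35, 20, 0], [55, 25, 18, 2], [48, 32, 19, 1], [52, 28, 17, 3]]
treated = [[10, 20, 30, 40], [12, 18, 28, 42], [8, 22, 33, 37], [11, 19, 31, 39], [9, 21, 29, 41]]
samples = control + treated
labels = ["C"] * len(control) + ["T"] * len(treated)

dist = [[bray_curtis(a, b) for b in samples] for a in samples]
f_value, p_value = permanova(dist, labels)
print("pseudo-F:", round(f_value, 2), "p-value:", p_value)
```

The test works on the pairwise distances rather than on the PCoA coordinates. A group of n samples contributes n(n-1)/2 within-group distances, and two groups of n and m samples contribute n*m between-group distances.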
I really recommend that video that I linked above. It will go into much more detail!
Hope that helps!
Thanks again Chloe for your patience!
What I don't understand about the file I sent you is this: why does it look as if there is no significance when I look at the box plots?
On the x-axis there are C (n=66) and T (n=132) in one plot, and T (n=55) and C (n=132) in the other. What are these different numbers?
I'm so sorry Chloe, I don't understand the two boxplots. Could you explain? Thank you so much.
Another question: is there a way to convert an Emperor plot into a 2D plot? I don't understand my Bray-Curtis Emperor plot very well. bray_curtis_emperor.qzv (861.7 KB)