Find a marker in the bacterial community of samples

mohsen_ej · June 20, 2021, 6:39am

Hello everyone,
I need to find a potential marker in one of my sample groups. Do you know how we can identify a marker in the data?
Thank you.

colinbrislawn · June 20, 2021, 9:45pm

You could look for features that are differentially abundant in that sample group, using something like ALDEx2, to find potential biomarkers, as in Stevens 2020.
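As a rough illustration of the idea behind this kind of compositional comparison, here is a minimal sketch of the centered log-ratio (CLR) transform that ALDEx2 builds on. This is not ALDEx2 itself (that is an R package that also does Monte Carlo sampling and statistical testing); the counts, the pseudocount of 0.5, and the function name are all assumptions made up for the example.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's feature counts.

    The pseudocount (an assumption here) avoids taking log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical counts for four features in two samples.
sample_a = [120, 30, 0, 850]
sample_b = [100, 35, 400, 465]

clr_a = clr(sample_a)
clr_b = clr(sample_b)

# Per-feature difference in CLR values; a large positive value means the
# feature is relatively more abundant in sample_b than in sample_a.
diff = [b - a for a, b in zip(clr_a, clr_b)]
top_feature = max(range(len(diff)), key=lambda i: diff[i])
print(top_feature, [round(d, 2) for d in diff])
```

With real data you would compare groups of replicate samples and test each feature statistically, which is what the dedicated tools do for you.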

Without knowing more about the project, I'm not sure I know the best way to do this...

mohsen_ej · June 21, 2021, 5:48am

Thank you for the information.
The data describe the microbial communities of three different types of meat, and I want to identify a marker in one of them. I tried to use picrust, but I couldn't install it in qiime, and I couldn't use it on the Galaxy server either because I don't know what format the input data should be in.

mohsen_ej · June 21, 2021, 8:48am

This is the error I get when I try to install picrust.

mohsen_ej · June 21, 2021, 9:07am

Also, when I tried to install it manually following the tutorial, that did not work either.

colinbrislawn · June 21, 2021, 11:37am

Hello @mohsen_ej,

Thanks for posting your commands and outputs! Double-check the commands you are using against the docs:
https://library.qiime2.org/plugins/q2-picrust2/13/

Note that they use qiime2-2021.2, while it appears that you have already installed qiime2-2021.4 in this conda environment. This would explain the incompatible packages found.

mohsen_ej · June 22, 2021, 5:26pm

Thank you for your response. I thought the version had to be 2021.2 or newer, but after I installed the older version it worked. Great, thank you.
I have read some tutorials and discussions and managed to get results, but now I don't know how to use them. I understand these are not final results and that I can use them in STAMP and in other methods such as beta diversity, but I'm not sure how the outputs differ. There are three files: ec_metagenome, ko_metagenome, and pathway_abundance. What is the difference between them, and which one should I use for further analysis? The data describe the bacterial communities of several kinds of meat; can you suggest which further analyses would be most appropriate?
Thank you, and sorry for the long question.

colinbrislawn · June 23, 2021, 12:47am

I'm glad you got it working!
(Usually plugins work great with the newest version of Qiime2, but sometimes an older version is needed to address conflicts with other software. Right now, picrust is one of these plugins.)

I've totally been there. This field is really big, and predicted functional genomics is really different from 16S amplicons.

These are three different ways of 'counting' the functionality predicted by picrust2:

Pathway: a biochemical pathway that describes what the metagenome can do, say, break down lactose
EC: Enzyme Commission number for the enzymes/enzyme complexes that catalyse the breakdown of lactose, e.g. EC 3.2.1.108
KO: KEGG Ortholog: a group of orthologous proteins that all share a specific function, say K01229 LCT; lactase-phlorizin hydrolase

These all overlap and reference each other, which makes sense because they are trying to summarize all the things you can do with lactose in a simple list.

Note how they summarize this complex process at different levels:

Pathway: drink milk
EC: catalyze one step of milk drinking (break down lactose)
KO: a single protein family that, with a little help from its friends, can drink milk

Which one to use depends on whether you want to 'zoom out' to focus on a full pathway, or 'zoom in' to focus on a reaction or a single protein that helps with that reaction.

What are you trying to show? Do you need to 'zoom in' or 'zoom out' to show it?

Colin

P.S. Sorry for long answer
P.P.S If someone wants to improve my shoddy summary of functional genomics, please do

mohsen_ej · June 23, 2021, 5:03am

Thank you for the great information. If I could like it 100 times, I would.
Now I understand what I got.
I think it would be good to use all of them and see what interesting things come up. Does that make sense? Apart from beta diversity, what metrics are commonly used? Also, how can we find out what the names mean? For example, what is PWY0-862? Is there a reference for that?

mohsen_ej · June 23, 2021, 6:17am

For example, I used pathway_abundance in STAMP, and these are the results. Can we say that the green and orange groups are more similar in their metagenome functions? If so, by how much? How can we tell whether the difference is significant, or how similar they are?
Again, sorry for so many questions.

colinbrislawn · June 23, 2021, 1:07pm

I'm glad I helped!

Sure, try it and see what you find. I wanted to ask about the biological question, as it's important to keep in mind the overall goal of the project. Would you mind sharing that with me?

If you search for them online (for example, PWY0-862), you will find pages in the various databases and links to the original research papers. Just like with microbes, the names are made up, but the context matters.

Ah, this is one of the biggest differences between amplicons and functional genomics. Because these proteins / enzymes / pathways are related, you can use other tools to visualize them.

For example, you can take your KOs or EC numbers and plug them into Element Selection in iPath3, and see what pathways light up. (Note, you may have to reformat the names: "EC 3.2.1.108" does not work but "EC3.2.1.108" does)
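If you have a lot of EC numbers to paste in, a small script can do the renaming for you. This is only a sketch: the function name is made up, and I'm assuming your labels look like "EC 3.2.1.108" or "EC:3.2.1.108", so check it against your own table first.

```python
import re

# Matches a complete four-part EC number such as 3.2.1.108.
EC_PATTERN = re.compile(r"\d+\.\d+\.\d+\.\d+")

def to_compact_ec(label):
    """Return an EC label in the compact 'EC3.2.1.108' form.

    Accepts inputs such as 'EC 3.2.1.108', 'EC:3.2.1.108' or '3.2.1.108'
    (these input formats are assumptions for this example).
    """
    match = EC_PATTERN.search(label)
    if match is None:
        raise ValueError(f"no complete EC number found in {label!r}")
    return "EC" + match.group(0)

labels = ["EC 3.2.1.108", "EC:3.2.1.108", "3.2.1.108"]
compact = [to_compact_ec(label) for label in labels]
print("\n".join(compact))
```

Labels without a complete four-part number raise an error instead of being silently passed through, so you notice anything unexpected in the input.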

Comparing the results between your three meat samples, you can see if parts of a metabolic pathway are complete in some samples but not in others.
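The same "is the pathway complete?" check can be sketched with plain sets. Everything here is hypothetical: the step labels, the sample names, and the pathway definition are placeholders, and in practice the list of steps would come from a pathway database.

```python
# Hypothetical pathway: the reaction steps (placeholder EC labels) it needs.
pathway_steps = {"EC_A", "EC_B", "EC_C", "EC_D"}

# Hypothetical sets of ECs predicted to be present in each meat type.
detected = {
    "meat_A": {"EC_A", "EC_B", "EC_C", "EC_D", "EC_X"},
    "meat_B": {"EC_A", "EC_C"},
    "meat_C": {"EC_X"},
}

def pathway_coverage(steps, present):
    """Return (fraction of steps present, sorted list of missing steps)."""
    found = steps & present
    return len(found) / len(steps), sorted(steps - present)

coverage = {
    name: pathway_coverage(pathway_steps, ecs) for name, ecs in detected.items()
}
for name, (fraction, missing) in coverage.items():
    print(f"{name}: {fraction:.0%} complete, missing {missing}")
```

A pathway that is fully covered in one group and mostly missing in another is the kind of pattern worth following up.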

You can still use the standard alpha and beta diversity metrics, but using tools that understand the relatedness of the functional genes you have predicted is most interesting!
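For reference, Bray-Curtis on a functional table works exactly as it does on an ASV table: the sum of absolute differences divided by the sum of all counts in the two samples. Here is a minimal sketch with made-up abundances for three hypothetical samples; a real analysis would use the QIIME 2 diversity tools rather than this.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total

# Hypothetical predicted abundances of four functions in three samples.
meat_a = [10, 20, 30, 40]
meat_b = [12, 18, 33, 37]
meat_c = [40, 30, 20, 10]

d_ab = bray_curtis(meat_a, meat_b)
d_ac = bray_curtis(meat_a, meat_c)
d_bc = bray_curtis(meat_b, meat_c)
print(d_ab, d_ac, d_bc)
```

In this toy example meat_a and meat_b are much closer to each other than either is to meat_c, but whether a difference like that is significant needs a proper test on replicate samples.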

mohsen_ej · June 24, 2021, 7:33am

Thank you for the great information.
We are trying to characterize their bacterial communities and test how similar or dissimilar they are.
We are also trying to find a marker in one of the groups, but I'm not sure how to do this.
As for iPath3, it is interesting, but I couldn't work out how to interpret it.

colinbrislawn · June 24, 2021, 2:26pm

For looking for biomarkers, ~~I like the idea of running ANCOM on ASVs, KOs, and EC numbers. This will give you potential biomarkers, which you can validate in a future study with more samples.~~

I'm not an expert on finding biomarkers, or ANCOM! Someone else should comment more!

mohsen_ej · June 24, 2021, 2:35pm

Thank you for your help. I thought one of the assumptions of ANCOM is that fewer than 25% of the features change between groups. Am I right? If so, do you think I can use ANCOM here? I used LEfSe for differential abundance analysis, but there were a lot of ECs and KOs in the results.
I am really sorry for asking so many questions.
I used Bray-Curtis to compare the EC profiles, for example, in Emperor and in a PCA as well. How do you interpret a non-significant Bray-Curtis result in this case?
Can we say there is no significant difference in the types of enzymes among our 3 groups?
Again, sorry for my questions.

mohsen_ej · July 25, 2021, 4:53am

I'm sorry to bother you again; I asked this question on the forum but nobody answered. Could you please tell me how to find out whether the results are connected to diseases? For example, which of the pathways are connected to which diseases, if any?
I searched genome.jp, metacyc.com, and ssgcid but found nothing.
Thank you very much, and sorry again.
