Acquiring KEGG genes from shotgun microbiome data.

Is there a way to obtain specific KEGG genes that change with our independent variable using qiime2 analysis, or is this tutorial the best way to proceed?


Hi @thompsrs,

:scream: That is a QIIME 1 tutorial, so almost certainly not the best way to proceed (because QIIME 1 is no longer current and can be challenging to install).

It sounds like you are trying to perform differential abundance testing on a feature table of KEGG genes… in which case q2-composition (ANCOM) or q2-aldex2 will be a good hypothesis test. E.g., see a relevant usage example here:

Is that what you need, or are you trying to identify the genes themselves from shotgun data fresh off the press?

I think this is partially what I’m looking for. I’m trying to identify KEGG genes in the shotgun data and then run ANCOM on those to see differences in KEGG genes based on our independent variable.

Hi @thompsrs,

I am afraid that this is not yet entirely possible with QIIME2 (although we are hoping to add the required functionality in the near future). Functional annotation of genes is currently not supported, nor is working with protein sequences (coming soon though!).

Should you still want to use QIIME2 for your differential abundance analysis, you would first need to run a functional annotation tool like eggNOG (either locally or using the Web service) and convert the resulting annotations from your samples into a feature table. That table can then be imported as an artifact and used by any of the plugins mentioned above.
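
As a rough illustration of the conversion step, here is a minimal Python sketch. It assumes eggNOG-mapper v2 style annotation files (tab-separated, `##` metadata lines, a header line starting with `#query`, and a `KEGG_ko` column holding comma-separated entries such as `ko:K00001`), so check your own output, since the column layout may differ between versions. The function names are made up for this example. It also counts annotated genes per KO rather than read abundance, so you may prefer to weight each gene by its coverage.

```python
import csv
from collections import Counter


def count_kos(lines):
    """Count KEGG ortholog (KO) hits in one sample's annotation table.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumed layout: '##' metadata lines, then a '#query ...' header row.
    """
    rows = (ln for ln in lines if not ln.startswith("##"))
    reader = csv.DictReader(rows, delimiter="\t")
    counts = Counter()
    for row in reader:
        field = (row.get("KEGG_ko") or "-").strip()
        if field in ("", "-"):  # '-' is assumed to mean "no KO assigned"
            continue
        for ko in field.split(","):
            ko = ko.strip()
            if ko.startswith("ko:"):
                ko = ko[3:]
            counts[ko] += 1
    return counts


def build_feature_table(sample_counts):
    """Turn {sample_id: Counter} into (ko_ids, sample_ids, matrix).

    Rows are KOs, columns are samples; missing KOs are filled with 0.
    """
    sample_ids = sorted(sample_counts)
    ko_ids = sorted({ko for c in sample_counts.values() for ko in c})
    matrix = [[sample_counts[s].get(ko, 0) for s in sample_ids] for ko in ko_ids]
    return ko_ids, sample_ids, matrix


def to_tsv(ko_ids, sample_ids, matrix):
    """Render the table as classic BIOM-style TSV text ('#OTU ID' header)."""
    out = ["\t".join(["#OTU ID"] + sample_ids)]
    for ko, row in zip(ko_ids, matrix):
        out.append("\t".join([ko] + [str(v) for v in row]))
    return "\n".join(out) + "\n"
```

You would call `count_kos` once per sample (for instance on each sample's open annotation file), collect the results in a dictionary keyed by sample ID, and write the output of `to_tsv` to disk. That TSV should then be convertible to BIOM with `biom convert` and importable into QIIME 2 as a `FeatureTable[Frequency]` artifact.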

Let us know in case you have more questions.




This seems like what I would want to do. But our shotgun data files average 6.6 million reads each, so they won’t work with the eggNOG web service. Do you have an alternative functional annotation tool that works better with such large files? Or other suggestions?


Hi @thompsrs,

I don’t think that should matter - you probably want to first assemble your reads into contigs/MAGs and only then submit those for functional annotation. In that case the amount of data you need to submit should be much smaller.

I have not personally used any other service, but I have seen the PANNZER server, which supposedly can do a similar job (you need to submit a FASTA file with translated proteins though). Another option is to just run eggNOG locally or in an HPC environment, provided you have access to one. For instructions on how to do it, check out the project’s Wiki. For annotation of prokaryotic genomes, there’s also Prokka, although it does not have a Web service.

Hope that helps!

