PICRUSt2 reference KEGG Database

I have been using PICRUSt2 successfully to analyze the predicted KEGG orthologs (KOs) associated with my 16S rRNA data, but one enzyme of interest to us, KO K20038, is missing (it is also not found in the EC predictions). I think the problem is that the enzyme was added to the KEGG database relatively recently: a post on the PICRUSt2 Google forum suggested that KEGG v52 was used as the reference database for PICRUSt2. Is this the case, and if so, is there a way to use a more recent KEGG database to make the PICRUSt2 predictions?
Thank you for your advice.

Hi @Katty8,
I am not the q2-picrust2 developer, so I am not sure, but I am pinging @gmdouglas to see if he can help.
Thanks!

Hi @Katty8,

The PICRUSt2 predictions are based on the latest IMG genome annotations. There is a delay between those annotations and the gene family database versions, which is why the newest additions are not present in the default database. Sorry, but there’s no easy way to get predictions for that gene family with PICRUSt2. The only option is to make a custom database containing that gene family and run the standalone version (i.e. outside of qiime2), which would be a substantial amount of work.

Best,

Gavin


Dear Gavin,
Thank you for your clear and rapid response to my enquiry. We will reconsider our analysis in this context.
Kind regards,
Kat


Hi @gmdouglas ,

I have the same issue with missing KOs in my analysis. Is there any description or tutorial for making this custom database? About 3-4 KOs that I would like to investigate are missing.

Kind regards,
Faisal

Hi @Faisal,

There isn’t a tutorial yet, but you can see what the required alignment/tree files look like here: https://github.com/picrust/picrust2/wiki/Sequence-placement#using-custom-reference-files

You can take a look at those and also at the default gene family abundance tables, which are packaged in the picrust2 GitHub repo.
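
Before building anything custom, it may help to confirm which KOs are actually absent from a gene family table. The following is a minimal sketch, not part of PICRUSt2: `missing_kos` is a hypothetical helper, and it assumes the table is tab-separated (optionally gzipped) with genome IDs in the first column and one column per KO. Check that layout against the real files before relying on it.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def missing_kos(table_path, wanted):
    """Return the wanted KO IDs that do not appear as column names.

    Assumed layout (verify against the real file): tab-separated,
    first column holds genome IDs, remaining columns are KO IDs.
    """
    path = Path(table_path)
    # Read gzipped and plain-text tables through the same interface.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    present = set(header[1:])  # skip the genome ID column
    return sorted(set(wanted) - present)


if __name__ == "__main__":
    # Tiny made-up table, used only to demonstrate the call.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo_ko.txt"
        demo.write_text("assembly\tK00001\tK00002\ngenome1\t1\t0\n")
        print(missing_kos(demo, ["K00001", "K20038"]))  # prints ['K20038']
```

Pointing the same function at a local copy of the default KO table would list which KOs of interest need to be supplied through a custom table.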

Best,

Gavin
