SEPP reference databases for Greengenes2

Uni · January 15, 2024, 2:43pm

Hello. I want to thank the QIIME 2 developers, as it has been working well for me. Recently I ran into a problem while using the Greengenes2 databases and am looking for help. Is there a SEPP reference database based on Greengenes2? Or should I simply use the phylogeny provided with Greengenes2 as the input reference database? If it's the latter, which of the several phylogeny files (2022.10.phylogeny.asv.nwk.qza, 2022.10.phylogeny.id.nwk.qza, 2022.10.phylogeny.md5.nwk.qza) should I use? (For reference, I trained a classifier for V3-V4 using Greengenes2: 2022.10.backbone.full-length.fna.qza and 2022.10.backbone.tax.qza.)

wasade · January 16, 2024, 4:37pm

Hi @Uni,

Thank you for the kind words!

Fragment placement in Greengenes2 uses an improved algorithm called DEPP. For 2022.10 we placed ~20M V4 ASVs, which can be readily used. For non-V4 data, we do not yet have a means to place fragments, but we are working on getting the steps described in a tutorial. For these data, one viable path forward would be to use the non-v4-16s action, which performs closed-reference recruitment to the backbone, and then to use the 2022.10.phylogeny.id.nwk.qza phylogeny. Alternatively, it is likely possible to train a SEPP model from the backbone, but I haven't done that before and there may be more unknowns.
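As a rough illustration of the non-V4 path described above, here is a minimal Python sketch that only assembles the command line for the non-v4-16s action; it does not run QIIME 2. The flag names (`--i-table`, `--i-sequences`, `--i-backbone`, `--o-mapped-table`, `--o-representatives`) are assumptions and should be checked against `qiime greengenes2 non-v4-16s --help`. The input and output file names other than the backbone are hypothetical placeholders.

```python
import shlex


def build_non_v4_command(table, sequences, backbone, mapped_table, representatives):
    """Assemble an argv list for `qiime greengenes2 non-v4-16s`.

    The flag names are assumptions; verify them with --help before use.
    """
    return [
        "qiime", "greengenes2", "non-v4-16s",
        "--i-table", table,
        "--i-sequences", sequences,
        "--i-backbone", backbone,
        "--o-mapped-table", mapped_table,
        "--o-representatives", representatives,
    ]


cmd = build_non_v4_command(
    table="table.qza",                      # hypothetical: your V3-V4 feature table
    sequences="rep-seqs.qza",               # hypothetical: your representative sequences
    backbone="2022.10.backbone.full-length.fna.qza",
    mapped_table="table.gg2.qza",           # hypothetical output name
    representatives="rep-seqs.gg2.qza",     # hypothetical output name
)

# Print a shell-quoted command that can be reviewed before running it.
print(shlex.join(cmd))
# To execute it inside a QIIME 2 environment: subprocess.run(cmd, check=True)
```

The mapped table produced this way would then be the one to pair with 2022.10.phylogeny.id.nwk.qza in downstream phylogenetic analyses, since closed-reference recruitment re-indexes features to backbone record IDs.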

More information on how the files differ can be found in the Greengenes2 tutorial and in the README with the files.

Best,
Daniel

Uni · January 17, 2024, 6:04am

Thank you sincerely for your response.

If we are using the greengenes2 databases for non-V4 data, what is the recommended method to obtain phylogeny artifacts?

Do you suggest using the align-to-tree-mafft-fasttree method in QIIME 2 to build the phylogeny? Or is it acceptable to simply use the 2022.10.phylogeny.id.nwk.qza phylogeny for phylogenetic analysis? I'm unsure which method to use because Greengenes2 uses different algorithms from the previous databases.

wasade · January 17, 2024, 5:17pm

Hi @Uni,

The current recommendations for non-V4 data are outlined here.

Best,
Daniel
