Custom-made nifH taxonomic database compatible with QIIME

bkramer · August 16, 2021, 6:06pm

Hello,

I have demultiplexed paired-end sequences of the nifH gene from freshwater samples, amplified with the Ando primer set (IGK3/DVV; forward/reverse primers), but I don't know of a nifH taxonomic database that is compatible with QIIME 2.

In a previous post (Suggestions for using nifH ARB database for taxonomy assignment in QIIME2), it was mentioned that @EGvibrio had made a custom database, but beyond that I don't know of any others.

If anyone knows of such a database and/or how to get in touch with @EGvibrio, I would greatly appreciate it. Thank you!!!

Best Regards,

Ben

colinbrislawn · August 17, 2021, 1:47pm

Good morning Ben,

Reaching out to other members of the forum, as you already have with @EGvibrio, is a good place to start. I find that other researchers are happy to share their work.

If you want to build a nifH database yourself, you could also take existing published work like this (and maybe this) and import it into QIIME 2 using the RESCRIPt plugin.

bkramer · August 20, 2021, 10:49pm

Thank you Colin,

I've installed the RESCRIPt plugin and tried to import the NCBI BioProjects mentioned in both the Gaby and Angel papers, but they do not import into QIIME 2 successfully.

I followed the RESCRIPt tutorial, replacing the IDs it uses with the IDs of either BioProject (418634 or 432667), but I get the error "Plugin error from RESCRIPt: Taxonomy format requires at least one row of data."

When I switch back to the IDs from the tutorial, the code runs fine, so I'm not sure what I'm missing here.
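For anyone hitting the same message: it means the taxonomy RESCRIPt tried to build had no rows, probably because nothing matching those IDs was retrieved from NCBI. Below is a rough Python sketch of that kind of check. `taxonomy_problems` is a hypothetical helper, not the actual q2-types or RESCRIPt validator; it assumes the usual QIIME 2 taxonomy TSV layout (an optional `Feature ID<TAB>Taxon` header, then one `<ID><TAB><taxon>` row per sequence).

```python
"""Rough sketch of the "at least one row of data" condition.

Hypothetical helper, NOT the q2-types/RESCRIPt validator. Assumes a QIIME 2
taxonomy TSV: optional "Feature ID<TAB>Taxon" header, then "<ID><TAB><taxon>".
"""
from typing import List


def taxonomy_problems(tsv_text: str) -> List[str]:
    """Return a list of problems found; an empty list means the file looks OK."""
    lines = [ln for ln in tsv_text.splitlines() if ln.strip()]
    # Skip the optional header line (extra columns such as Confidence are fine).
    if lines and lines[0].split("\t")[:2] == ["Feature ID", "Taxon"]:
        lines = lines[1:]
    if not lines:
        # Analogous to the situation the RESCRIPt error reports.
        return ["no data rows"]
    problems: List[str] = []
    for n, row in enumerate(lines, start=1):
        cols = row.split("\t")
        if len(cols) < 2 or not cols[0].strip() or not cols[1].strip():
            problems.append(f"data row {n}: expected '<ID><TAB><taxon>'")
    return problems


print(taxonomy_problems("Feature ID\tTaxon\n"))  # ['no data rows']
print(taxonomy_problems("Feature ID\tTaxon\nAB1\tBacteria; Proteobacteria\n"))  # []
```

The same check can be run on a hand-built taxonomy file before importing it with `qiime tools import`.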

Nicholas_Bokulich · August 21, 2021, 6:37am

As answered here (please don't make duplicate posts of the same error), the NCBI BioProjects do not contain the database that you are attempting to download. My guess is that they contain the biological sequences used for testing in those studies.

It looks like the database is released directly on the Buckley lab website. The only problem is that it is available only as an ARB file; the FASTA file contains only the sequences. So you can:

1. Download the ARB database directly from there.
2. Convert the ARB file to FASTA format (outside of QIIME 2; search online for tools that do this conversion).
3. Split the taxonomies out of the FASTA and place them in a new file (the Buckley website says that the database is annotated, so I assume it contains taxonomy annotations). Some additional formatting might be necessary on both the sequences and the taxonomy, depending on what state they are in (e.g., taxonomy should be semicolon-delimited).
4. Import the taxonomy and FASTA sequences into QIIME 2.
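Step 3 could look roughly like the Python sketch below. It assumes a hypothetical header layout (I haven't seen the Buckley export): each FASTA header is the sequence ID, then whitespace, then a lineage whose ranks are separated by semicolons. It also assumes the exported sequences may still carry alignment gap characters (`-` and `.`), which it strips so the output is unaligned, as the `FeatureData[Sequence]` type expects. Adjust `rank_sep` and the header parsing to whatever the ARB export actually produces.

```python
"""Sketch: split taxonomy out of FASTA headers for QIIME 2 import.

Assumed (hypothetical) header layout: ">SEQ_ID lineage", where the lineage
follows the first whitespace and ranks are separated by `rank_sep`.
"""
from typing import List, Tuple

GAP_CHARS = "-."  # alignment gap symbols that may appear in aligned exports


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_taxonomy(text: str, rank_sep: str = ";") -> Tuple[str, str]:
    """Return (fasta_text, taxonomy_tsv_text) built from annotated FASTA."""
    fasta_lines: List[str] = []
    tax_lines: List[str] = ["Feature ID\tTaxon"]  # QIIME 2 taxonomy TSV header
    seen = set()
    for header, seq in parse_fasta(text):
        parts = header.split(None, 1)
        if not parts:
            raise ValueError("empty FASTA header")
        seq_id = parts[0]
        if seq_id in seen:
            raise ValueError(f"duplicate sequence ID: {seq_id}")
        seen.add(seq_id)
        lineage = parts[1] if len(parts) > 1 else ""
        # Trim each rank, drop empty ones, and rejoin as "rank1; rank2; ...".
        ranks = [r.strip() for r in lineage.split(rank_sep) if r.strip()]
        taxon = "; ".join(ranks) or "Unassigned"
        # Uppercase and remove gap characters so the sequence is unaligned.
        clean = "".join(c for c in seq.upper() if c not in GAP_CHARS)
        fasta_lines += [f">{seq_id}", clean]
        tax_lines.append(f"{seq_id}\t{taxon}")
    return "\n".join(fasta_lines) + "\n", "\n".join(tax_lines) + "\n"


if __name__ == "__main__":
    demo = (
        ">AB123 Bacteria;Proteobacteria;Alphaproteobacteria\n"
        "ATG-C..A\n"
        ">XY9 Bacteria; Cyanobacteria\n"
        "tt\ngc\n"
    )
    seqs, tax = split_taxonomy(demo)
    print(seqs)
    print(tax)
```

The two returned strings can be written to, e.g., `nifH-seqs.fasta` and `nifH-taxonomy.tsv` (hypothetical filenames) and imported in step 4 as `FeatureData[Sequence]` and `FeatureData[Taxonomy]`. Raising on duplicate IDs is a deliberate choice: each sequence ID must match exactly one taxonomy row.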

It might be worth getting in touch with them; they may already have converted files or other formats that would let you skip directly to step 3 or 4.

Good luck!

bkramer · August 21, 2021, 1:48pm

Thank you (again), Nicholas, and my apologies. I wasn't sure whether Colin knew something about the specific databases he referred me to.