Custom made nifH taxonomic database compatible with QIIME

Nicholas_Bokulich · August 21, 2021, 6:37am

As answered here (please don't make duplicate posts of the same error), the NCBI bioprojects do not contain the database that you are attempting to download. My guess is that they contain the biological sequences that were used for testing in the studies.

It looks like the database is released directly on the Buckley lab website. The only problem is that the annotated database is available only as an ARB file; the FASTA contains only the sequences. So you can:

1. Download the ARB database directly from there.
2. Convert the ARB database to FASTA format (this happens outside of QIIME 2; search online for tools that do this conversion).
3. Split the taxonomies out of the FASTA and place them in a new file (the Buckley website says that the database is annotated, so I assume there are taxonomy annotations in there). Some additional formatting might be necessary on both the sequences and the taxonomy, depending on what state they are in (e.g., the taxonomy should be semicolon-delimited).
4. Import the taxonomy and FASTA sequences to QIIME 2.
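As a rough illustration of the splitting step, here is a minimal Python sketch. It rests on assumptions I have not checked against the actual export: that each FASTA header looks like `>ID taxonomy`, with the ID separated from a semicolon-delimited taxonomy string by whitespace, and that the sequences may still carry alignment gap characters (`-` and `.`). The function names `split_fasta` and `write_outputs` are made up for this example; adjust the header parsing to whatever your converted file really contains.

```python
import io


def split_fasta(handle):
    """Yield (seq_id, taxonomy, sequence) for each record in a FASTA handle.

    ASSUMPTION: headers look like ">ID taxonomy", where the taxonomy is a
    semicolon-delimited string. Change the parsing if your export differs.
    """
    seq_id, taxonomy, chunks = None, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Emit the previous record before starting a new one.
            if seq_id is not None:
                yield seq_id, taxonomy, "".join(chunks)
            parts = line[1:].split(None, 1)
            seq_id = parts[0]
            raw_tax = parts[1] if len(parts) > 1 else "Unassigned"
            # Normalize to "; "-delimited ranks and drop empty ranks.
            ranks = [r.strip() for r in raw_tax.split(";") if r.strip()]
            taxonomy = "; ".join(ranks)
            chunks = []
        else:
            # Strip alignment gap characters and uppercase the bases.
            chunks.append(line.replace("-", "").replace(".", "").upper())
    if seq_id is not None:
        yield seq_id, taxonomy, "".join(chunks)


def write_outputs(records, fasta_out, tax_out):
    """Write sequences as FASTA and taxonomy as a headerless two-column TSV."""
    for seq_id, taxonomy, seq in records:
        fasta_out.write(f">{seq_id}\n{seq}\n")
        tax_out.write(f"{seq_id}\t{taxonomy}\n")


# Tiny made-up input, only to show the expected shapes.
demo = io.StringIO(
    ">seq1 Bacteria;Proteobacteria;Alphaproteobacteria\n"
    "ATG-C..a\n"
    "GGT\n"
    ">seq2\n"
    "TTGA\n"
)
fasta_out, tax_out = io.StringIO(), io.StringIO()
write_outputs(split_fasta(demo), fasta_out, tax_out)
print(fasta_out.getvalue())
print(tax_out.getvalue())
```

To use this on real data, swap the `io.StringIO` objects for files opened with `open(...)`. The two outputs are the sequence FASTA and the tab-separated taxonomy file that the import step needs; in QIIME 2 these are typically imported as `FeatureData[Sequence]` and `FeatureData[Taxonomy]` (using `HeaderlessTSVTaxonomyFormat` for a taxonomy file with no header row).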

It might be worth getting in touch with the Buckley lab. They might already have converted files or other formats that would let you skip directly to step 3 or 4.

Good luck!