Creating reference database from sequences, names, and nodes

clong8887 · May 20, 2021, 8:18pm

Hello and thanks in advance for reading and any help you may be able to provide,

QIIME version: 2019.7 in conda

I am attempting to use the MARES database (link to the paper: Arranz et al. 2020) as my reference sequence database and taxonomy for a COI metabarcoding study.

In the tar.gz files provided, they give a fasta file for the sequences so I'm all good there I think. Alongside the fasta, they provide two files called names.dmp and nodes.dmp that appear to be text files detailing the names and taxonomy for the database. (I can't seem to get it to upload the txt files so people can see them). However, this format does not seem to be compatible with the format needed for a reference taxonomy to use with QIIME classifiers, and I'm not sure if there's an easy way to convert them to a QIIME-usable format.

If you have thoughts or resources on how to do so, I would love to hear them!

Thanks!

lizgehret · May 24, 2021, 6:04pm

Hi @clong8887!

Thanks for reaching out, happy to provide some guidance here!

You'll need to convert these .dmp files into .txt format before they can be used with the QIIME classifiers. After a quick search online, I found a few .txt file converters that may provide the easiest solution for you. Here is one that I tested out as an example that seemed to work well:

Give that a try, and let me know if you are still having trouble!
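If you would rather script the conversion, here is a minimal Python sketch. It assumes that names.dmp and nodes.dmp follow the NCBI taxonomy dump layout (fields separated by a pipe padded with tabs, with the tax ID, parent tax ID, and rank as the first three fields of nodes.dmp); I have not checked this against the MARES files themselves. The function names and the toy records below are made up for illustration, and the rank prefixes follow the Greengenes-style `k__`, `p__`, ... convention.

```python
import io

# Ranks kept in the output string, in order (Greengenes-style prefixes).
RANK_PREFIXES = {
    "kingdom": "k__", "phylum": "p__", "class": "c__", "order": "o__",
    "family": "f__", "genus": "g__", "species": "s__",
}

def split_dmp(line):
    """Split one pipe-delimited .dmp line into stripped fields."""
    return [field.strip() for field in line.split("|")]

def read_nodes(handle):
    """Return (parents, ranks) dicts keyed by tax ID (assumed NCBI layout)."""
    parents, ranks = {}, {}
    for line in handle:
        if line.strip():
            fields = split_dmp(line)
            parents[fields[0]] = fields[1]
            ranks[fields[0]] = fields[2]
    return parents, ranks

def read_names(handle):
    """Return {tax ID: scientific name}, skipping synonyms and other classes."""
    names = {}
    for line in handle:
        if line.strip():
            fields = split_dmp(line)
            if fields[3] == "scientific name":
                names[fields[0]] = fields[1]
    return names

def lineage(tax_id, parents, ranks, names):
    """Walk up the tree from tax_id and build a semicolon-separated lineage."""
    found, seen = {}, set()
    while tax_id in parents and tax_id not in seen:
        seen.add(tax_id)  # the root lists itself as its parent
        if ranks[tax_id] in RANK_PREFIXES:
            found[ranks[tax_id]] = names.get(tax_id, "")
        tax_id = parents[tax_id]
    return "; ".join(p + found.get(r, "") for r, p in RANK_PREFIXES.items())

# Toy records standing in for the real files (illustrative only).
parents, ranks = read_nodes(io.StringIO(
    "1\t|\t1\t|\tno rank\t|\n"
    "33208\t|\t1\t|\tkingdom\t|\n"
    "7711\t|\t33208\t|\tphylum\t|\n"
))
names = read_names(io.StringIO(
    "1\t|\troot\t|\t\t|\tscientific name\t|\n"
    "33208\t|\tMetazoa\t|\t\t|\tscientific name\t|\n"
    "33208\t|\tAnimalia\t|\t\t|\tsynonym\t|\n"
    "7711\t|\tChordata\t|\t\t|\tscientific name\t|\n"
))
print(lineage("7711", parents, ranks, names))
# k__Metazoa; p__Chordata; c__; o__; f__; g__; s__
```

With the real files you would pass `open("nodes.dmp")` and `open("names.dmp")` instead of the toy strings. You would still need to map each sequence ID in the fasta to its tax ID (how that is encoded in the MARES headers is something I haven't verified), then write one sequence ID and one lineage per line, separated by a tab, to get a taxonomy file you can try importing.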

Cheers,
Liz

clong8887 · May 24, 2021, 6:49pm

Hi Liz,

Thanks for the reply! I've actually moved on from this; I couldn't figure out an easy way to convert the MARES-formatted taxonomy into a QIIME-acceptable taxonomy (e.g., k__Animalia; p__Chordata and so on). I'm attempting to use RESCRIPt instead. It won't be curated to marine-only taxonomic groups, so it will probably be less computationally efficient, but it looks like it will be a little less of a headache.

Thanks anyway!

lizgehret · May 24, 2021, 7:39pm

Hey @clong8887,

Thanks for the update! Don't hesitate to circle back if you need further assistance; we'll be happy to get a new thread going if you run into any issues. Have a great afternoon!