Add additional sequences to the annotation database

Agatsuma · September 14, 2022, 1:09pm

Hi everyone,
I want to use QIIME 2 to annotate my amplicon sequence files, but silva-138-99-515-806-nb-classifier.qza does not contain all the sequences I am interested in. How can I add extra sequences to the pretrained classifier, or train my own classifier when I only have the database sequence files?
Thanks for your answer.

colinbrislawn · September 14, 2022, 3:26pm

Hello @Agatsuma,

We now have a new plugin called RESCRIPt and a full tutorial about creating your own reference databases!

Let us know if that works for you. You may need a computer with a lot of memory depending on the size of the database.

Agatsuma · September 16, 2022, 3:00am

Thanks for your reply.
I have gone through this tutorial, but I still don't know at which step, or into which file, I should insert my sequence and taxonomy information. Do I need to insert my sequence information into the following four files?

tax_slv_ssu_138.1.txt.gz
taxmap_slv_ssu_ref_nr_138.1.txt.gz
tax_slv_ssu_138.1.tre.gz
the sequence file:
SILVA_138.1_SSURef_Nr99_tax_silva_trunc.fasta.gz
If so, what kind of information needs to be inserted into each file?

colinbrislawn · September 16, 2022, 3:14pm

This depends on the format of your input data, and on whether you can use an 'Easy Mode' command like qiime rescript get-silva-data.
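
For reference, the 'Easy Mode' route is a single command. Here is a minimal Python sketch that only assembles and prints that command, it does not run it. The version, target, and output file names are assumptions based on what I recall from the RESCRIPt tutorial, so check them against `qiime rescript get-silva-data --help` before using them.

```python
import shlex


def easy_mode_command(version="138.1", target="SSURef_NR99",
                      seqs_out="silva-rna-seqs.qza",
                      tax_out="silva-tax.qza"):
    """Build (not run) the argv for `qiime rescript get-silva-data`.

    The default version/target and the output names are assumptions.
    """
    return [
        "qiime", "rescript", "get-silva-data",
        "--p-version", version,
        "--p-target", target,
        "--o-silva-sequences", seqs_out,
        "--o-silva-taxonomy", tax_out,
    ]


if __name__ == "__main__":
    # Print a shell-ready command line for copy and paste.
    print(shlex.join(easy_mode_command()))
```

Note that this route downloads SILVA as published, so on its own it would not add your extra sequences.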

The full pipeline is described in Getting SILVA data: Hard Mode. You import the three input files, then run reverse-transcribe and parse-silva-taxonomy.
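
As a rough sketch of that pipeline, the Python below only builds and prints the commands, it does not execute them. It assumes the three taxonomy files plus the sequence file listed earlier in the thread have been decompressed first. The output artifact names are hypothetical, and the types and option names are as I recall them from the tutorial, so verify them with `--help`.

```python
import shlex

# (semantic type, decompressed input file, output artifact).
# Output names are hypothetical; inputs are the files named in this thread,
# assumed to have been gunzipped first.
IMPORTS = [
    ("FeatureData[SILVATaxonomy]", "tax_slv_ssu_138.1.txt", "taxranks.qza"),
    ("FeatureData[SILVATaxidMap]", "taxmap_slv_ssu_ref_nr_138.1.txt", "taxmap.qza"),
    ("Phylogeny[Rooted]", "tax_slv_ssu_138.1.tre", "taxtree.qza"),
    ("FeatureData[RNASequences]",
     "SILVA_138.1_SSURef_Nr99_tax_silva_trunc.fasta", "rna-seqs.qza"),
]


def hard_mode_commands():
    """Build (not run) the import, parse, and reverse-transcribe commands."""
    cmds = [
        ["qiime", "tools", "import", "--type", sem_type,
         "--input-path", src, "--output-path", dst]
        for sem_type, src, dst in IMPORTS
    ]
    # Combine the three taxonomy artifacts into one taxonomy table.
    cmds.append([
        "qiime", "rescript", "parse-silva-taxonomy",
        "--i-taxonomy-tree", "taxtree.qza",
        "--i-taxonomy-map", "taxmap.qza",
        "--i-taxonomy-ranks", "taxranks.qza",
        "--o-taxonomy", "silva-tax.qza",
    ])
    # SILVA ships RNA sequences; convert them to DNA.
    cmds.append([
        "qiime", "rescript", "reverse-transcribe",
        "--i-rna-sequences", "rna-seqs.qza",
        "--o-dna-sequences", "dna-seqs.qza",
    ])
    return cmds


if __name__ == "__main__":
    for cmd in hard_mode_commands():
        print(shlex.join(cmd))
```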

Each file imported has a --type, like --type 'FeatureData[SILVATaxonomy]'. Those types are listed here. The input test data for the RESCRIPt plugin can be used as examples of that format.
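
On the question of where your own records would go: one possible route, which is my assumption and not something the tutorial prescribes, is to leave the raw SILVA files alone and append your records to a DNA FASTA file and a taxonomy table exported from the finished artifacts, then re-import those as FeatureData[Sequence] and FeatureData[Taxonomy]. The sketch below works on plain text and assumes the taxonomy table is tab-separated with a `Feature ID` / `Taxon` header. The function names, record ID, and taxon string are hypothetical.

```python
def fasta_ids(fasta_text):
    """Return the set of record IDs (first token after '>')."""
    return {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].split()
    }


def add_records(fasta_text, taxonomy_tsv, new_records):
    """Append (id, sequence, taxon) records to FASTA text and a taxonomy TSV.

    Raises ValueError if an ID already exists, since every feature ID
    should appear only once in the reference.
    """
    seen = fasta_ids(fasta_text)
    fasta_lines = fasta_text.splitlines()
    tax_lines = taxonomy_tsv.splitlines()
    for rec_id, seq, taxon in new_records:
        if rec_id in seen:
            raise ValueError(f"duplicate feature ID: {rec_id}")
        seen.add(rec_id)
        fasta_lines += [f">{rec_id}", seq.upper()]
        tax_lines.append(f"{rec_id}\t{taxon}")
    return "\n".join(fasta_lines) + "\n", "\n".join(tax_lines) + "\n"


if __name__ == "__main__":
    fasta = ">AB001.1.1500\nACGTACGT\n"
    taxonomy = "Feature ID\tTaxon\nAB001.1.1500\td__Bacteria; p__Firmicutes\n"
    # Hypothetical custom record; use the same rank prefixes as the database.
    extra = [("my_seq_1", "ggccttaa", "d__Bacteria; p__Proteobacteria")]
    new_fasta, new_taxonomy = add_records(fasta, taxonomy, extra)
    print(new_fasta, new_taxonomy, sep="")
```

Whatever route you take, the sequence IDs and the taxonomy IDs have to match exactly, and your taxon strings should use the same rank layout as the rest of the database.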

At the risk of stating the obvious, this process is hard and technical. I try to avoid it by using prebuilt classifiers, or databases already in the QIIME 2 format.

What database are you trying to import?