Check out the documentation for classify-consensus-vsearch. It's not a full tutorial, but you can follow the other tutorials and use this plugin instead when you get to the feature-classifier step.
I was wondering if there is a way to build the FeatureData[Taxonomy] artifact from a multifasta whose headers contain the accession number and taxid, without downloading all the sequences again with RESCRIPt.
My multifasta/database looks like this:
I want to write the taxonomy file for the taxonomic assignment in the format
AccessionNumber Order;Family;Genus;Species
starting from a multifasta in which I have the AN and the taxid in the header, as I posted above.
I will also use this multifasta as my reference database.
I think I can help you make a list of just your accession numbers (AN) from that FASTA file, but we will have to use a separate taxonomy database, say from NCBI, to get Order;Family;Genus;Species.
Do you have a database source already downloaded, or are you asking for help finding one?
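For the first part, pulling the AN (and the taxid, since it is already in the header) can be sketched in a few lines of Python. This is only a minimal sketch: it assumes each header starts with the accession number followed by whitespace and carries the taxid as `taxid=XXXXX` somewhere after it, as described in this thread. The function name `parse_headers` and the example records are made up for illustration.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Assumption: the taxid appears as "taxid=<digits>" in the header.
# The lookbehind avoids matching keys that merely end in "taxid".
TAXID_RE = re.compile(r"(?<!\w)taxid=(\d+)")


def parse_headers(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (accession, taxid) for every FASTA header line.

    taxid is None when the header has no taxid=... field.
    """
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        if not header:
            continue  # skip empty headers
        accession = header.split()[0]  # first whitespace-delimited token
        match = TAXID_RE.search(header)
        yield accession, (match.group(1) if match else None)


# Hypothetical records, only to show the expected header shape.
example = [
    ">AB000001 taxid=9606; count=1;",
    "ACGT",
    ">AB000002 some description without a taxid",
    "TTGA",
]
pairs = list(parse_headers(example))
```

With a real file you would pass an open file handle instead of the list, e.g. `parse_headers(open("db.fasta"))`, where `db.fasta` stands in for your own file name.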
I have already created a database of 12S sequences from vertebrates. I downloaded all the VRT sequences from EMBL and then ran ecoPCR with my metabarcoding primers. So I have this multifasta containing all unique 12S sequences, annotated as above.
I need to create the taxonomy file starting from this FASTA file, in which every header starts with the AN and also reports the taxid as taxid=XXXXX.
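One possible way to go from taxid to Order;Family;Genus;Species without re-downloading sequences is to walk the NCBI taxonomy dump (`nodes.dmp` and `names.dmp` from the taxdump archive). The following is a rough sketch under assumptions, not a tested pipeline: the helper names (`read_dmp`, `load_taxdump`, `lineage`, `write_taxonomy`) are invented here, `NA` is an arbitrary placeholder for ranks that are missing from a lineage, and the `Feature ID` / `Taxon` header is what I understand QIIME 2's TSV taxonomy format to expect. The tiny in-memory tree at the bottom is fake data just to show the behaviour.

```python
import io
from typing import Dict, Iterable, Iterator, List, Optional, TextIO, Tuple

RANKS = ("order", "family", "genus", "species")


def read_dmp(path: str) -> Iterator[List[str]]:
    """Split one NCBI .dmp file into fields (separator is tab-pipe-tab)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.endswith("\t|"):
                line = line[:-2]  # drop the row terminator
            yield line.split("\t|\t")


def load_taxdump(nodes_path: str, names_path: str):
    """Return (parents, ranks, names) dictionaries keyed by taxid."""
    parents: Dict[str, str] = {}
    ranks: Dict[str, str] = {}
    names: Dict[str, str] = {}
    for fields in read_dmp(nodes_path):
        parents[fields[0]] = fields[1]
        ranks[fields[0]] = fields[2]
    for fields in read_dmp(names_path):
        if fields[3] == "scientific name":
            names[fields[0]] = fields[1]
    return parents, ranks, names


def lineage(taxid: Optional[str], parents, ranks, names, missing: str = "NA") -> str:
    """Walk up from taxid to the root and keep the four wanted ranks."""
    found: Dict[str, str] = {}
    seen = set()
    current = taxid
    while current in parents and current not in seen:
        seen.add(current)  # the root is its own parent, so this ends the loop
        rank = ranks.get(current)
        if rank in RANKS and rank not in found:
            found[rank] = names.get(current, missing)
        current = parents[current]
    return ";".join(found.get(rank, missing) for rank in RANKS)


def write_taxonomy(pairs: Iterable[Tuple[str, Optional[str]]], out: TextIO,
                   parents, ranks, names) -> None:
    """Write accession<TAB>lineage rows under a 'Feature ID / Taxon' header."""
    out.write("Feature ID\tTaxon\n")
    for accession, taxid in pairs:
        out.write(f"{accession}\t{lineage(taxid, parents, ranks, names)}\n")


# Fake five-node tree standing in for the real taxdump.
parents = {"1": "1", "10": "1", "20": "10", "30": "20", "40": "30"}
ranks = {"1": "no rank", "10": "order", "20": "family", "30": "genus", "40": "species"}
names = {"10": "Primates", "20": "Hominidae", "30": "Homo", "40": "Homo sapiens"}

buffer = io.StringIO()
write_taxonomy([("AB000001", "40"), ("AB000002", None)], buffer, parents, ranks, names)
demo = buffer.getvalue()
```

On real data you would call `load_taxdump("nodes.dmp", "names.dmp")`, feed it the (AN, taxid) pairs taken from your FASTA headers, write to a `.tsv` file, and then import that file as FeatureData[Taxonomy]. Taxids that have been merged or deleted at NCBI will come out as all `NA` with this sketch, so it is worth checking for those rows.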
Check out our RESCRIPt 12S db tutorial. It's just a simple example of what you can do, but it should help you get started...
It does mean downloading everything again, potentially in batches by taxonomic group, but everything will be formatted properly.
The notebook specifically looks for 12S records, but you can run a separate search that downloads only genome records and then use feature-classifier extract-reads to extract the amplicon region from those genomes. You can then merge the result with the other batches of data.
Anyway, I just wanted to provide another option that you can try running in parallel.