Help Needed Generating Database with RESCRIPt

I am trying to build a database of all bacterial sequences in NCBI so that I can annotate my ASV table. I am using this command:

qiime rescript get-ncbi-data --p-query 'txid2[ORGN] ' --output-dir NCBIdata_BactOnly

However, this is not working: the command gets killed partway through. It also takes a very long time (days) to run.

The SILVA and Greengenes databases are not working for my amplicon data.

@Dhwani_Dholakia,

I moved your last post from "Annotate Bacterial Genes Amplicon data" to a separate post here in Community Plugin Support.

The tool you are trying to use (rescript) is a community plugin and I thought you might get some better answers here than having it be part of your other question. It also makes it easier to find if other users have the same question in the future.

Hi @Dhwani_Dholakia ,

You are attempting to download all bacterial sequences in the nucleotide database... it is an extraordinarily large query, consisting of 71,204,621 entries... so it is not a surprise that the job gets killed.
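
One way to see how large a query is before downloading anything is to ask NCBI's E-utilities for the hit count only. The sketch below is an illustration and not part of RESCRIPt: `count_url` is a hypothetical helper name, and it assumes the `esearch` endpoint with `db=nuccore` and `rettype=count`. It only builds the URL; the actual request is left as a commented-out `curl` call.

```shell
# Hypothetical helper (not part of RESCRIPt): build an E-utilities esearch URL
# that returns only the number of nucleotide records matching a query.
count_url() {
  # Percent-encode the characters used in simple Entrez queries:
  # spaces and square brackets.
  term=$(printf '%s' "$1" | sed -e 's/ /%20/g' -e 's/\[/%5B/g' -e 's/]/%5D/g')
  printf '%s?db=nuccore&rettype=count&term=%s\n' \
    'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi' "$term"
}

# Print the URL for the query used in this thread.
count_url 'txid2[ORGN]'

# To fetch the count itself (requires network access):
# curl -s "$(count_url 'txid2[ORGN]')"
```

If the count that comes back is in the millions, the query is probably too broad to download in one job.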

Please also read carefully the help documentation (type qiime rescript get-ncbi-data --help to view this), particularly this part:

  Please be aware of the NCBI Disclaimer and Copyright notice
  (https://www.ncbi.nlm.nih.gov/home/about/policies/), particularly "run
  retrieval scripts on weekends or between 9 pm and 5 am Eastern Time
  weekdays for any series of more than 100 requests". As a rough guide, if
  you are downloading more than 125,000 sequences, only run this method at
  those times.
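
As a rough illustration of the time window in that notice, the sketch below checks whether the current Eastern Time falls on a weekend or between 9 pm and 5 am. `ncbi_offpeak` is a hypothetical helper name, not something provided by RESCRIPt or NCBI, and the check assumes the system has the `America/New_York` time zone data installed.

```shell
# Hypothetical helper: succeed (exit 0) if a given Eastern Time weekday/hour
# falls in the window quoted above (weekends, or 9 pm to 5 am on weekdays).
# $1 = day of week as printed by `date +%u` (1 = Monday ... 7 = Sunday)
# $2 = hour as printed by `date +%H` (00 to 23)
ncbi_offpeak() {
  dow=$1
  hour=${2#0}         # strip one leading zero so "08" is read as 8
  hour=${hour:-0}     # "0" becomes empty after stripping; restore it
  [ "$dow" -ge 6 ] && return 0               # Saturday or Sunday
  [ "$hour" -ge 21 ] || [ "$hour" -lt 5 ]    # 9 pm to 5 am
}

# Check the current time in the US Eastern time zone.
if ncbi_offpeak "$(TZ=America/New_York date +%u)" "$(TZ=America/New_York date +%H)"; then
  echo "off-peak: OK to start a large NCBI download"
else
  echo "peak hours: wait until 9 pm Eastern or the weekend"
fi
```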

If you are interested in bacterial 16S sequences specifically, see this tutorial:

Good luck!

@Nicholas_Bokulich

My amplicon data consists of sequences generated using primers for eight different bacterial genes, so I guess a 16S database is not suitable. When I annotated the data using the SILVA database, only around 1% of it was annotated. That is why I wanted a complete bacterial gene database, in the hope that it would annotate my input data.

This is the error I get:

Plugin error from rescript:

Maximum retries (10) exceeded for HTTP request. Persistent trouble downloading from NCBI. Last exception was
ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

So create databases for those specific genes... not all bacterial sequences (including many full genomes). What you are attempting will not work.

It looks like you are having difficulty establishing a connection, or your connection is being killed because you are requesting too much/at the wrong times, as indicated by the policy message that I noted above. Please see that link for more details...

I recommend trying in a few days with a much more specific query.
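
As a rough sketch of what a more specific query could look like, the example below restricts the search to one gene and to a sequence length range, which should leave out full genomes. The gene symbol `rpoB`, the length bounds, and the output directory name are placeholders (the eight target genes are not named in this thread), and `build_query` is a hypothetical helper. It assumes the standard Entrez field tags `[GENE]` and `[SLEN]`. The command is printed rather than run, so the query can be checked first.

```shell
# Hypothetical helper: build an Entrez query for bacterial records (txid2)
# annotated with one gene and limited to a sequence length range.
# $1 = gene symbol, $2 = minimum length, $3 = maximum length
build_query() {
  printf 'txid2[ORGN] AND %s[GENE] AND %s:%s[SLEN]\n' "$1" "$2" "$3"
}

# Placeholder values: replace rpoB and the bounds with your own gene and
# expected amplicon/gene length.
query=$(build_query rpoB 500 5000)

# Print the RESCRIPt command for review instead of running it directly.
echo "qiime rescript get-ncbi-data --p-query '$query' --output-dir NCBIdata_rpoB"
```

Repeating this once per target gene would give several small downloads instead of one enormous one.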

Good luck!

@Nicholas_Bokulich

So create databases for those specific genes... not all bacterial sequences (including many full genomes). What you are attempting will not work.

I thought the same thing. But how do I get a list of all the bacteria that have those genes, and then create a specific database for them?