RESCRIPt for the entire BLAST nucleotide database?

Hi all,

I'm not sure how to use the RESCRIPt plugin, or whether I should be using it for my project. I'm hoping someone might be able to steer me in the right direction!

I'm trying to run a consensus BLAST for custom amplicon libraries (which are not 16S, 18S, etc.) against the BLAST nucleotide database.

I want to create QIIME 2 .qza files so that I can run "qiime feature-classifier classify-consensus-blast".

So my three questions are:

  1. Is RESCRIPt appropriate for downloading the BLAST nucleotide (nt) database, or is that too large? I saw the NCBI policy that large downloads need to be done within a certain time window (at night). I also saw the command parameters that need to be changed for larger downloads. I can't narrow the database down to a certain kingdom or organism, because we actually need to search the entire nucleotide database for our project.

  2. How do I find the identifier of the BLAST nucleotide database for the --p-query parameter? I couldn't work out from the example how to look up the identifier on the NCBI website. The example code (found here) has --p-query '33175[BioProject] OR 33317[BioProject]', but I don't know how those identifiers are found.

```shell
qiime rescript get-ncbi-data \
  --p-query 'Blast n?' \
  --o-sequences ncbi-refseqs-unfiltered.qza \
  --o-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza
```

  3. Besides RESCRIPt, is there another way I can generate the reference reads and reference taxonomy .qza files for QIIME 2's "qiime feature-classifier classify-consensus-blast" command?

Many thanks for reading, and thanks in advance for any insight.

Hi @ahale004,

RE 1: RESCRIPt does not currently provide tools to directly download pre-formatted BLAST databases (e.g., nt), which are located here: ftp://ftp.ncbi.nih.gov/blast/db/. Using the full pre-formatted nt database would be much quicker than downloading the sequences and compiling the database yourself. You can read up on how to run BLAST locally.
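For anyone who wants to try the local route, here is a minimal sketch, assuming BLAST+ is installed (it ships the update_blastdb.pl script) and that you have a FASTA file of representative sequences. This runs entirely outside QIIME 2; the function names, directory, and output file are hypothetical, and nt is a very large download.

```shell
#!/usr/bin/env bash
# Sketch only (not a RESCRIPt or QIIME 2 feature): fetch NCBI's pre-formatted
# nt database with BLAST+'s update_blastdb.pl and search it with blastn.
# Function, directory, and file names are hypothetical.

# Download and decompress all nt volumes into the directory given as $1.
fetch_nt() {
  local dbdir="$1"
  mkdir -p "$dbdir" || return 1
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$dbdir" && update_blastdb.pl --decompress nt)
}

# BLAST a FASTA of representative sequences ($1) against nt stored in $2,
# writing tabular hits (outfmt 6, including subject taxids) to $3.
blast_against_nt() {
  local query="$1" dbdir="$2" out="$3"
  # BLASTDB tells blastn which directory holds the "nt" volumes.
  BLASTDB="$dbdir" blastn -query "$query" -db nt \
    -outfmt "6 qseqid sseqid pident length evalue bitscore staxids" \
    -max_target_seqs 10 -num_threads 4 -out "$out"
}
```

For example: fetch_nt /data/blastdb && blast_against_nt rep-seqs.fasta /data/blastdb hits.tsv. The staxids column gives the NCBI taxid of each hit, which you would still need to map to a lineage yourself.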

One of the other devs reminded me that we have been considering adding this feature to RESCRIPt via these relevant GitHub PRs:

But we do not yet have a timeline for when we'll be able to implement this. Until then, I'd recommend making several amplicon-specific reference databases. More on this below.

RE 2: As far as I know, it is not possible to download the nucleotide BLAST database via a query.

RE 3: There are many ways to generate reference reads for use with qiime feature-classifier classify-consensus-blast; all you need is a way to construct a taxonomy.qza and a sequence.qza file. Some examples are here.
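As a rough sketch of that two-file route: the reference reads can be imported from a FASTA file as FeatureData[Sequence], and the taxonomy from a headerless two-column TSV (sequence ID, then a semicolon-delimited lineage) as FeatureData[Taxonomy]. The helper names, artifact names, and the ID-matching check are my own illustrative assumptions, not part of QIIME 2. The classification step uses --output-dir so that it does not depend on exactly which outputs your QIIME 2 version produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check that a FASTA file and a headerless
# two-column taxonomy TSV (ID<TAB>lineage) describe the same sequences.
check_reference_pair() {
  local fasta="$1" tsv="$2"
  # Every taxonomy row must have exactly two tab-separated fields.
  awk -F'\t' 'NF != 2 { bad = 1 } END { exit bad }' "$tsv" || return 1
  # Every FASTA ID (first word after ">") must appear in the TSV's first column.
  awk -F'\t' '
    FILENAME == ARGV[1] { ids[$1] = 1; next }
    /^>/ { split(substr($0, 2), w, /[ \t]/); if (!(w[1] in ids)) missing = 1 }
    END { exit missing }
  ' "$tsv" "$fasta"
}

# Import the checked pair as the two artifacts classify-consensus-blast needs.
import_reference_pair() {
  local fasta="$1" tsv="$2"
  check_reference_pair "$fasta" "$tsv" || { echo "FASTA/taxonomy mismatch" >&2; return 1; }
  qiime tools import --type 'FeatureData[Sequence]' \
    --input-path "$fasta" --output-path ref-seqs.qza || return 1
  qiime tools import --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path "$tsv" --output-path ref-taxonomy.qza
}

# Classify query sequences ($1, a FeatureData[Sequence] artifact) against them.
classify_with_refs() {
  local query_qza="$1"
  qiime feature-classifier classify-consensus-blast \
    --i-query "$query_qza" \
    --i-reference-reads ref-seqs.qza \
    --i-reference-taxonomy ref-taxonomy.qza \
    --output-dir consensus-blast-results
}
```

For example: import_reference_pair my-refs.fasta my-refs-taxonomy.tsv && classify_with_refs rep-seqs.qza.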


Making individual custom databases:
Have you read through the RESCRIPt tutorials listed here? There are several ways you can use RESCRIPt to make specific custom amplicon databases from GenBank data. For example, there is one tutorial for making a trnL database, and another that outlines a different approach to constructing a 12S rRNA gene database.
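To give a feel for the RESCRIPt route: --p-query takes an Entrez search string, the same syntax you would type into the NCBI Nucleotide search box, so you can check a query on the website before running it. The sketch below is hypothetical. The helper names, the example gene, and the taxid are placeholders, and the tutorials mentioned above use their own, more carefully tuned queries and filtering steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an Entrez query that looks for a gene name in the
# record title, restricted to one NCBI taxon and all of its descendants.
build_entrez_query() {
  local gene="$1" taxid="$2"
  printf '%s\n' "txid${taxid}[Organism:exp] AND ${gene}[Title]"
}

# Download matching GenBank records plus their taxonomy with RESCRIPt, then
# collapse identical sequences that also share the same taxonomy label.
get_amplicon_reference() {
  local query="$1" prefix="$2"
  qiime rescript get-ncbi-data \
    --p-query "$query" \
    --o-sequences "${prefix}-seqs-unfiltered.qza" \
    --o-taxonomy "${prefix}-taxonomy-unfiltered.qza" || return 1
  qiime rescript dereplicate \
    --i-sequences "${prefix}-seqs-unfiltered.qza" \
    --i-taxa "${prefix}-taxonomy-unfiltered.qza" \
    --p-mode 'uniq' \
    --o-dereplicated-sequences "${prefix}-seqs.qza" \
    --o-dereplicated-taxa "${prefix}-taxonomy.qza"
}
```

For example: get_amplicon_reference "$(build_entrez_query 12S 7742)" my-12S, replacing 7742 with the taxid of your own target group as listed in NCBI Taxonomy.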

-Cheers!


Thanks so much for such a detailed answer!!

I'm not sure I'm understanding all of QIIME 2, so I wanted to ask:

Is there any way to assign taxonomy in QIIME 2 by BLASTing against the entire BLAST nucleotide database?

It seems not, but I wanted to ask!

Thanks again,

Amanda


As implied by RE 1, no. Currently, it is best to follow the approach outlined at the end of my last post.


Thank you, much appreciated!


Hi @ahale004,

I was just informed by another moderator that there might be a way to set up a BLASTDB in QIIME 2.

The pull request I mentioned earlier in this thread was recently merged into q2-feature-classifier, so the following approach should be doable in the forthcoming release of QIIME 2.

You should be able to:

  1. Download the BLAST nt database (the update_blastdb.pl Perl script that ships with BLAST+ does this).
  2. Import it as a BLASTDB artifact.

There has not been much testing of this approach yet, so give it a try and let us know how it goes. This method and format were designed to work with custom databases; they have not been tested with the pre-formatted databases that NCBI ships.
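A sketch of those two steps, plus the classification call, might look like the following. It assumes a QIIME 2 release in which q2-feature-classifier provides the BLASTDB type and a --i-blastdb input on classify-consensus-blast, that the default BLASTDB import format accepts the downloaded directory as-is, and that you already have a reference taxonomy artifact whose IDs match the nt sequence IDs (building that is not covered here). All function, directory, and artifact names are hypothetical, and as noted above, this route has not been tested with NCBI's pre-formatted databases.

```shell
#!/usr/bin/env bash
# Untested sketch of the BLASTDB route; names are hypothetical.

# Step 1 downloads and unpacks nt into $1; step 2 imports that directory
# as a BLASTDB artifact (assuming the default import format accepts it).
import_nt_as_blastdb() {
  local dbdir="$1"
  mkdir -p "$dbdir" || return 1
  (cd "$dbdir" && update_blastdb.pl --decompress nt) || return 1
  qiime tools import --type 'BLASTDB' \
    --input-path "$dbdir" --output-path nt-blastdb.qza
}

# Classify query sequences ($1) against the imported database using a
# reference taxonomy ($2) keyed by the same sequence IDs as nt.
classify_with_nt() {
  local query_qza="$1" taxonomy_qza="$2"
  qiime feature-classifier classify-consensus-blast \
    --i-query "$query_qza" \
    --i-blastdb nt-blastdb.qza \
    --i-reference-taxonomy "$taxonomy_qza" \
    --output-dir nt-consensus-blast
}
```

For example: import_nt_as_blastdb /data/blastdb && classify_with_nt rep-seqs.qza nt-taxonomy.qza.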
