Using classify-consensus-blast with NCBI data (via RESCRIPt)

Dear all,

I'm trying to use classify-consensus-blast with NCBI data.
I have downloaded 16S data from NCBI using the following command as outlined here:

qiime rescript get-ncbi-data \
    --p-query '33175[BioProject] OR 33317[BioProject]' \
    --o-sequences ncbi-refseqs-unfiltered.qza \
    --o-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza

Can I use the output files of the above as input files for classify-consensus-blast directly?
So, something like this?

qiime feature-classifier classify-consensus-blast \
    --i-query rep-seqs.qza \
    --i-reference-reads ncbi-refseqs-unfiltered.qza \
    --i-reference-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza \
    --p-perc-identity 0.8 \
    --p-query-cov 0.3 \
    --p-maxaccepts 1 \
    --o-classification taxonomy.qza

Or do I have to process them further? And if so, could someone advise me how?

Many thanks for your kind help!

Hi @fgara,

You certainly can if you'd like. But I'd recommend some minimal filtering, as outlined in the post you linked, as well as the other RESCRIPt tutorials. It is always a good idea to perform some filtering of the data to avoid spurious taxonomic classifications.

I would not recommend using such a low --p-query-cov value, as this will return many non-specific hits. Also, I'd never recommend only taking the first acceptable BLAST hit, so I'd suggest keeping --p-maxaccepts 10 or higher. There can be many equivalent BLAST hits that land on very different taxa. This is why the --p-min-consensus parameter exists.
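
To illustrate why a single accepted hit can mislead, here is a simplified Python sketch of consensus assignment. It is not the actual q2-feature-classifier code; the function name consensus_taxonomy and the example lineages are made up for illustration, and only the 0.51 threshold mirrors the default of --p-min-consensus.

```python
from collections import Counter


def consensus_taxonomy(hit_taxa, min_consensus=0.51):
    """Return the deepest lineage shared by at least min_consensus of all hits.

    Simplified illustration only; hit_taxa is a list of ';'-separated lineages.
    """
    lineages = [[rank.strip() for rank in t.split(";")] for t in hit_taxa]
    total = len(lineages)
    consensus = []
    depth = 0
    while lineages:
        # Count lineage prefixes down to the current rank among hits still in agreement.
        prefixes = Counter(
            tuple(lin[: depth + 1]) for lin in lineages if len(lin) > depth
        )
        if not prefixes:
            break
        prefix, count = prefixes.most_common(1)[0]
        # Stop as soon as the most common prefix falls below the threshold.
        if count / total < min_consensus:
            break
        consensus = list(prefix)
        lineages = [lin for lin in lineages if tuple(lin[: depth + 1]) == prefix]
        depth += 1
    return "; ".join(consensus) if consensus else "Unassigned"


# Hypothetical hits for one query sequence.
hits = [
    "k__Bacteria; p__Firmicutes; g__Bacillus",
    "k__Bacteria; p__Firmicutes; g__Bacillus",
    "k__Bacteria; p__Firmicutes; g__Paenibacillus",
    "k__Bacteria; p__Proteobacteria; g__Escherichia",
]

print(consensus_taxonomy(hits))      # all four hits considered
print(consensus_taxonomy(hits[3:]))  # as if only one hit had been accepted
```

With all four hits the sketch stops at the phylum, because no genus reaches the threshold. With a single hit it reports a full lineage, although the other hits disagree with it.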

-Mike


If you use BLAST, also note that it looks at the first --p-maxaccepts hits that pass the thresholds, not the best --p-maxaccepts hits. If you want the best hits, you may want to use the vsearch classifier instead.
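
As a toy illustration of that difference (this models neither BLAST nor vsearch internals; the function names, reference IDs and identity values below are all made up), compare keeping the first N passing hits with keeping the N best ones:

```python
def first_n_hits(hits, perc_identity, maxaccepts):
    """Keep the first maxaccepts hits, in database order, that pass the threshold."""
    accepted = []
    for ref_id, identity in hits:
        if identity >= perc_identity:
            accepted.append((ref_id, identity))
            if len(accepted) == maxaccepts:
                break  # stop searching once enough hits are accepted
    return accepted


def best_n_hits(hits, perc_identity, maxaccepts):
    """Keep the maxaccepts highest-identity hits that pass the threshold."""
    passing = [hit for hit in hits if hit[1] >= perc_identity]
    return sorted(passing, key=lambda hit: hit[1], reverse=True)[:maxaccepts]


# Hypothetical (reference ID, fractional identity) pairs in database order.
hits = [("refA", 0.82), ("refB", 0.85), ("refC", 0.99), ("refD", 0.97)]

print(first_n_hits(hits, perc_identity=0.8, maxaccepts=2))
print(best_n_hits(hits, perc_identity=0.8, maxaccepts=2))
```

In this example the first-N strategy returns refA and refB and never reaches the much closer refC and refD, which is why a low identity threshold combined with a small maxaccepts value can give poor assignments.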
