BLAST my fungal beta-tubulin sequences with brocc


(Afaq) #1

Hi Kyle,
I have a question on Brocc. Can I use this software to BLAST my fungal beta-tubulin sequences with NCBI nt database?

-Afaq


(Matthew Ryan Dillon) #2

pinging @kylebittinger


(Kyle Bittinger) #3

The short answer is yes. BROCC is designed to classify short reads against the nt database, and in theory it shouldn’t matter much whether your reads are 16S, 18S, ITS, or beta-tubulin. You may have to adjust the parameters, most importantly the number of database hits used and the similarity thresholds. Here are some guidelines.
For setting the number of database hits, you want to collect enough alignments to capture at least 3 votes for the winner, after removing all the junk. The typical setting is 100 total hits. You’ll want to raise this if you have a lot of hits to unassigned or environmental organisms.
For setting the similarity thresholds, a literature search may reveal good species and genus-level cutoffs for this gene. If not, you could try BLASTing a few examples at the NCBI website. As a last resort, you can use the fact that average nucleotide identity is typically about 95% or higher for genomes within the same species. Thus, something like 94 or 95% would be OK as a species-level threshold when you have no other information.
I can’t encourage you enough to try a few examples at NCBI to see what the BLAST results look like. You can follow the algorithm mentally and see if the result matches your intuition. A diagram of the algorithm is here: https://github.com/kylebittinger/q2-brocc#the-brocc-algorithm
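
To make the guidelines above concrete, here is a minimal Python sketch of the kind of vote counting you can do by hand on a page of BLAST hits. It is a simplified illustration, not BROCC's actual implementation: the function name `vote_on_hits`, the junk keywords, the input format (a list of taxon and percent-identity pairs), and the example hits are all made up. Only the 95% cutoff and the minimum of 3 votes come from the guidelines above.

```python
from collections import Counter

# Hypothetical keywords marking hits that carry no useful taxonomy.
JUNK_KEYWORDS = ("uncultured", "environmental", "unidentified", "unassigned")


def vote_on_hits(hits, min_identity=95.0, min_votes=3):
    """Return the taxon with the most votes, or None if it has too few.

    hits: iterable of (taxon_name, percent_identity) pairs.
    """
    votes = Counter()
    for taxon, identity in hits:
        if identity < min_identity:
            continue  # below the similarity threshold
        if any(word in taxon.lower() for word in JUNK_KEYWORDS):
            continue  # junk hit, does not get a vote
        votes[taxon] += 1
    if not votes:
        return None
    winner, count = votes.most_common(1)[0]
    return winner if count >= min_votes else None


# Invented example hits, for illustration only.
example_hits = [
    ("Aspergillus fumigatus", 99.1),
    ("Aspergillus fumigatus", 98.7),
    ("uncultured fungus", 99.5),
    ("Aspergillus fumigatus", 97.2),
    ("Aspergillus niger", 93.0),
]
print(vote_on_hits(example_hits))
```

If many of your top hits fall into the junk category, the winner may not reach 3 votes, which is the situation where raising the total number of hits helps.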


(Afaq) #4

Hi Kyle,
Thanks for the reply and the useful tips. While installing brocc, I got an error when I ran this command:
qiime brocc classify-brocc --i-query query.qza --o-classification query_brocc.qza

Error is:

Plugin error from brocc:

Command '['blastn', '-query', '/tmp/qiime2-archive-lj_q5qag/17760cc7-73da-4f13-bbde-a164e83a5bcf/data/dna-sequences.fasta', '-evalue', '1e-05', '-outfmt', '7', '-db', 'nt', '-max_target_seqs', '100', '-out', '/tmp/tmp06mu72o1']' returned non-zero exit status 2

Debug info has been saved to /tmp/qiime2-q2cli-err-0c10380r.log

Could you please help me on it?

Thank you again

-Afaq


(Matthew Ryan Dillon) #5

Hey @amm59063, I can’t help you with this, but I suspect @kylebittinger will need the log (/tmp/qiime2-q2cli-err-0c10380r.log), or, for you to re-run the command with the --verbose flag, and include those results here. Thanks!


(Afaq) #6

Hi @kylebittinger @thermokarst,
The following is the error message I get when I run the command with the --verbose flag:

BLAST Database error: No alias or index file found for nucleotide database [nt] in search path [/home/qiime2:/home/qiime2/nt_database/nt.:]
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "", line 2, in classify_brocc
  File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_brocc/plugin_setup.py", line 59, in classify_brocc
    subprocess.run(blast_cmd, check=True)
  File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['blastn', '-query', '/tmp/qiime2-archive-3op81oyk/17760cc7-73da-4f13-bbde-a164e83a5bcf/data/dna-sequences.fasta', '-evalue', '1e-05', '-outfmt', '7', '-db', 'nt', '-max_target_seqs', '100', '-out', '/tmp/tmpomh831d1']' returned non-zero exit status 2

Plugin error from brocc:

Command '['blastn', '-query', '/tmp/qiime2-archive-3op81oyk/17760cc7-73da-4f13-bbde-a164e83a5bcf/data/dna-sequences.fasta', '-evalue', '1e-05', '-outfmt', '7', '-db', 'nt', '-max_target_seqs', '100', '-out', '/tmp/tmpomh831d1']' returned non-zero exit status 2

See above for debug info.

Thank you

-Afaq


(Kyle Bittinger) #7

According to the BLAST documentation, this exit code is used if there is a problem with the BLAST database. See the chart here: https://www.ncbi.nlm.nih.gov/books/NBK279684/table/appendices.Tc/
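
For reference, the exit codes can be turned into a small lookup so a failing call explains itself. This is a minimal Python sketch: the table reflects my reading of the BLAST+ manual and should be checked against the chart linked above, and the helper names `explain_blast_exit` and `run_and_explain` are made up for illustration. A Python child process that exits with status 2 stands in for the failing blastn call, so the example runs without BLAST installed.

```python
import subprocess
import sys

# Exit codes as listed in the BLAST+ manual appendix (verify against the link).
BLAST_EXIT_CODES = {
    0: "Success",
    1: "Error in query sequence(s) or BLAST options",
    2: "Error in BLAST database",
    3: "Error in BLAST engine",
    4: "Out of memory",
    5: "Network error",
    6: "Error creating output files",
    255: "Unknown error",
}


def explain_blast_exit(returncode):
    """Translate a BLAST+ exit status into a short description."""
    return BLAST_EXIT_CODES.get(returncode, "Undocumented exit code")


def run_and_explain(cmd):
    """Run a command and return (exit status, description)."""
    result = subprocess.run(cmd)
    return result.returncode, explain_blast_exit(result.returncode)


# Stand-in for the failing blastn call: a child process that exits with 2.
fake_blastn = [sys.executable, "-c", "import sys; sys.exit(2)"]
print(run_and_explain(fake_blastn))
```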

Would you be able to go back through the installation instructions here: https://github.com/kylebittinger/q2-brocc#installing-the-plugin

I’m interested in what you get as output from the following step, in which you check that the nt database is configured correctly.

blastn -query query.fasta -db nt -outfmt 7

If you see some BLAST results, that’s a good sign – please post them here. If you get an error, post that and we’ll work on it.
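
If the output is long, a few lines of Python can summarize it before posting. Output format 7 is tab-separated with comment lines starting with "#"; with the default columns, the first three fields are the query, the subject, and the percent identity. The sketch below assumes those default columns, and the function name `parse_outfmt7` and the sample rows are invented for illustration.

```python
def parse_outfmt7(text):
    """Yield (query, subject, percent_identity) for each hit line.

    Assumes the default tabular columns of -outfmt 7, where the first
    three fields are query, subject, and percent identity.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        yield fields[0], fields[1], float(fields[2])


# Invented sample output; real output comes from the blastn command above.
sample = "\n".join([
    "# BLASTN",
    "# Query: query_1",
    "# Database: nt",
    "\t".join(["query_1", "subject_A", "99.100", "250", "2", "0",
               "1", "250", "1", "250", "1e-120", "450"]),
    "\t".join(["query_1", "subject_B", "93.400", "250", "16", "0",
               "1", "250", "1", "250", "1e-100", "380"]),
])

hits = list(parse_outfmt7(sample))
strong = [hit for hit in hits if hit[2] >= 95.0]
print(len(hits), "hits,", len(strong), "at or above 95% identity")
```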

Here is one more thing to check: if you type the following just before issuing the classify-brocc command, you should see the directory containing the nt database printed to your screen.

echo $BLASTDB

If you take the result printed here, and list the files in that directory, you should see a bunch of files like “nt.01.nhr”, “nt.01.nin”, and so on. Something like 700 files, all numbered with extensions in this format.
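
The same check can be scripted. This is a minimal Python sketch with a made-up helper name, `find_nt_files`; it only looks for file names starting with "nt." in each directory listed in BLASTDB, and it assumes that multiple directories, if any, are separated by the platform's path separator (":" on Linux). It does not verify that the database itself is intact.

```python
import os
from pathlib import Path


def find_nt_files(blastdb_dir, db_name="nt"):
    """Return sorted names of files in blastdb_dir that start with 'nt.'."""
    directory = Path(blastdb_dir)
    if not directory.is_dir():
        return []
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.name.startswith(db_name + ".")
    )


blastdb = os.environ.get("BLASTDB", "")
if not blastdb:
    print("BLASTDB is not set")
else:
    for entry in blastdb.split(os.pathsep):
        if not entry:
            continue  # ignore empty entries such as a trailing separator
        names = find_nt_files(entry)
        print(entry, "->", len(names), "nt files")
```

A count of zero for every directory would be consistent with the "No alias or index file found" error.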


(Afaq) #8

Hi @kylebittinger,

I fixed the error; it was my fault. I had renamed the folder that holds the db files, so I needed to pass the --p-blastdb flag.

Thanks for your support


(Kyle Bittinger) #9

Great! I definitely encourage you to run a few example beta-tubulin sequences to check the assignments yourself. BROCC’s answer should roughly match what you would have come up with, looking at the BLAST results.