BLAST parser for assign_taxonomy

SGMcCalla · January 30, 2017, 6:25pm

Are there plans to incorporate brocc.py (https://github.com/kylebittinger/brocc) into Qiime2 as a plugin? brocc parses BLAST output (in place of assign_taxonomy) and is the only missing element that would make Qiime compatible with non-microbial datasets (when greengenes and SILVA databases are not appropriate for the taxa of interest). I am analyzing sequence data for fish barcodes (16S, 18S, CO1, CYTB) and the Qiime analyses and figures are just tantalizingly out of reach.

Nicholas_Bokulich · January 30, 2017, 7:11pm

Hi Sunnie,
I cannot speak to plans for including brocc in Qiime2, but I can point you to the docs for training q2-feature-classifier on other databases. The tutorial is located here, and the classifier should be trainable on other marker-gene databases, which would support the needs that you describe.

The other option is to perform your classifications outside of Qiime2, then import back into Qiime2 to add to your feature data artifact. This can be performed with something like the following:

qiime tools export rep-seqs.qza --output-dir rep-seqs/
assign_taxonomy.py -i rep-seqs/dna-sequences.fna -o q1-taxonomy/
qiime tools import --type "FeatureData[Taxonomy]" --input-path q1-taxonomy/dna-sequences_tax_assignments.txt --output-path taxonomy.qza

(I am giving the taxonomy assignment using assign_taxonomy.py as an example since I am unfamiliar with brocc, but just substitute with your commands)
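
If the classification step is done with BLAST rather than assign_taxonomy.py, a minimal sketch of turning tabular BLAST output into a two-column taxonomy table might look like the following. It assumes BLAST was run with -outfmt 6 (query ID in column 1, subject ID in column 2, bit score in column 12) and that you already have a mapping from reference IDs to lineage strings. The function names, the "Unassigned" label, and the header row are illustrative assumptions; the header that qiime tools import expects may differ between QIIME 2 versions, so check the importing docs for your release.

```python
import csv
import io


def best_hit_taxonomy(blast_tsv, id_to_lineage, query_ids):
    """Return {query_id: lineage} using the highest-bit-score hit per query.

    blast_tsv: text of a BLAST tabular report (-outfmt 6, 12 default columns).
    id_to_lineage: dict mapping reference sequence IDs to lineage strings.
    query_ids: all query IDs, so queries without hits are still reported.
    """
    best = {}  # query_id -> (bit_score, subject_id)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank, comment, or malformed lines
        query, subject, bits = row[0], row[1], float(row[11])
        if query not in best or bits > best[query][0]:
            best[query] = (bits, subject)

    result = {}
    for query in query_ids:
        if query in best:
            # A hit whose reference ID has no known lineage stays unassigned.
            result[query] = id_to_lineage.get(best[query][1], "Unassigned")
        else:
            result[query] = "Unassigned"
    return result


def to_taxonomy_tsv(assignments):
    """Serialize assignments as a tab-separated table (header is an assumption)."""
    lines = ["Feature ID\tTaxon"]
    lines += [f"{query}\t{taxon}" for query, taxon in assignments.items()]
    return "\n".join(lines) + "\n"
```

Taking only the single best hit is the simplest possible rule and is not what brocc does; it is here only to show the shape of the data going back into the import step.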

I hope that helps,

Nick

SGMcCalla · January 30, 2017, 10:18pm

Hi @Nicholas_Bokulich, thank you very much for your help. Due to the nature of the datasets I am working with, training q2-feature-classifier on other databases would be very time- and resource-intensive. Are there classifiers other than brocc that you would recommend?

Nicholas_Bokulich · January 30, 2017, 11:06pm

The default method in Qiime1’s assign_taxonomy.py is a uclust-based classifier that performs a consensus assignment similar to what brocc does (as far as I can tell) — so this could be worth trying. Qiime2 currently does not have any other taxonomy classifiers.

If brocc is your tool of choice, I’d recommend just doing the export/assign/import approach that I described previously. Sounds like brocc outputs results in a Qiime-like format, which should be straightforward to import to Qiime2.

BenKaehler · January 31, 2017, 3:18pm

Hi Sunny,

Thanks @Nicholas_Bokulich. I’m trying to develop the q2-feature-classifier to handle large data sets. How big are yours? Would you be willing to share one for testing purposes?

Also, I agree that a BLAST-based assigner might be useful for those who prefer it, or for the purpose of comparison. Such a plugin is not currently planned (see here). Including it would require a developer committed to the purpose.

I note that brocc is GPL-licensed, which makes it less likely to be used by Qiime2, but writing a BLAST wrapper would not be difficult if one had the time and the inclination.
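
To give a rough idea of what such a wrapper involves, here is a minimal sketch that builds and runs a blastn command from Python. It is not part of any QIIME 2 plugin. The function names, the default thresholds, and the example file and database names are all hypothetical; the blastn options themselves (-query, -db, -outfmt, -max_target_seqs, -perc_identity, -out) are standard BLAST+ flags.

```python
import shutil
import subprocess


def build_blastn_command(query_fasta, db, out_path, max_hits=10, min_identity=80.0):
    """Build the argument list for a tabular (-outfmt 6) blastn search."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", db,
        "-outfmt", "6",
        "-max_target_seqs", str(max_hits),
        "-perc_identity", str(min_identity),
        "-out", out_path,
    ]


def run_blastn(query_fasta, db, out_path, **kwargs):
    """Run blastn and return the path to the tabular report."""
    if shutil.which("blastn") is None:
        raise RuntimeError("blastn not found on PATH; install NCBI BLAST+ first")
    cmd = build_blastn_command(query_fasta, db, out_path, **kwargs)
    # check=True raises CalledProcessError if blastn exits with an error.
    subprocess.run(cmd, check=True)
    return out_path
```

Usage would be something like run_blastn("rep-seqs/dna-sequences.fna", "fish_refs", "hits.tsv", max_hits=5), where fish_refs is a made-up name for a database built beforehand with makeblastdb.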

ebolyen · January 31, 2017, 4:31pm

We aren't legal experts, but our current assumption is that you could create a QIIME 2 plugin also licensed under GPL if that was a requirement.

SGMcCalla · February 1, 2017, 11:30pm

Thank you @BenKaehler. Yes, I would be very willing to share some of our data. I will PM you my email.

I have searched widely throughout the available bioinformatics programs and have only come across…maybe one or two programs for incorporating BLAST output into a biom-compatible format (and if I am missing any obvious/new programs, please let me know). And those few programs don’t have the powerful infrastructure to continue on with other analyses the way Qiime does. Maybe the novelty and the potential use for non-microbial researchers might add an extra incentive for a developer to create such a tool?

Thanks everyone for their input. I am open to any further suggestions.

Joseph_Sevigny · February 17, 2017, 9:20pm

I have been working on a similar problem. I will primarily be using qiime2 to work with meiofauna (microscopic benthic eukaryotes), so I am constructing a classifier using the SILVA 18S database. I did not have much luck using the extract-reads method of the feature classifier (see command below). I think this is because the primer sequences from the Earth Microbiome Protocol are often not even in the SILVA references (something like less than 50% contain the exact sequence). I am not sure how much variability qiime2 allows when trimming these references, but you will notice there are no degenerate codes in the primer sequences…

Right now I am trying to construct a classifier without extracting the portion of the reads (I know this will be a limitation, but it won’t hurt to try). In addition, I have completed taxonomy classification with a custom taxonomy assignment script using BLAST and imported it into Qiime2 with good results. Happy to answer questions or share my classifiers when I get them working.

-Cheers

qiime feature-classifier extract-reads \
  --i-sequences 18S_99_otus_silva111.qza \
  --p-f-primer GTACACACCGCCCGTC \
  --p-r-primer TGATCCTTCTGCAGGTTCACCTAC \
  --p-length 150 \
  --o-reads ref-seqs_classifier_18S_99_SILVA111_RL150.qza
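
One way to check how many references actually contain a primer, before running extract-reads, is a small diagnostic like the sketch below. It is not how extract-reads does its matching; it is a plain sliding-window comparison that understands IUPAC degenerate codes in the primer and tolerates a chosen number of mismatches. The function names and the max_mismatches parameter are illustrative assumptions. A reverse primer would need to be reverse-complemented first, which is not shown.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_primer(primer, sequence, max_mismatches=0):
    """Return the first 0-based position where primer matches, or -1."""
    primer, sequence = primer.upper(), sequence.upper()
    for start in range(len(sequence) - len(primer) + 1):
        mismatches = 0
        window = sequence[start:start + len(primer)]
        for p, s in zip(primer, window):
            if s not in IUPAC.get(p, ""):
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches at this position
        else:
            return start  # inner loop finished without exceeding the limit
    return -1


def fraction_with_primer(primer, sequences, max_mismatches=0):
    """Fraction of sequences containing the primer within the mismatch limit."""
    if not sequences:
        return 0.0
    hits = sum(find_primer(primer, s, max_mismatches) >= 0 for s in sequences)
    return hits / len(sequences)
```

Comparing fraction_with_primer at max_mismatches=0 against 1 or 2 on a sample of the references should show whether exact matching is what is costing the missing sequences.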

jairideout · February 17, 2017, 11:28pm

Hi @Joseph_Sevigny! That’d be great if you shared your classifiers when they’re ready – there’s a lot of work being put into feature classification right now.

We’re happy to help you with training a Silva classifier if you’re still interested. Can you create a topic in #user-support about that specifically?

Joseph_Sevigny · February 20, 2017, 4:17pm

Hi @jairideout. Yes, I would be happy to share my classifiers. I have constructed one for the SILVA123 18S release using 99 OTUs, one for SILVA111 18S with the 99 aligned OTUs, and am working on one using the SILVA111 aligned 18S trimmed using the EMP primers. I’ll add a topic on #user-support for the latter classifier.

What is the best way to share the classifiers with everyone?

ebolyen · February 20, 2017, 4:56pm

We don't have an amazing answer to that, but you would probably just send us the classifier and we would rehost it (so that you don't have to foot the bandwidth).

Nicholas_Bokulich · March 21, 2017, 3:15pm

@SGMcCalla I recalled this old forum post and thought I would mention some new updates to q2-feature-classifier that may be useful to you.

BLAST+ and vsearch are newly implemented in q2-feature-classifier. Both perform alignment-based classification: each query sequence is aligned against the reference sequences, up to maxaccepts top hits are kept, and a consensus taxonomy is assigned to the query sequence from among these top hits.
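
To illustrate the consensus step, here is a minimal sketch of how a consensus lineage could be derived from the top hits for one query. It is not the plugin's actual code. The function name, the min_consensus threshold and its 0.51 value, and the "Unassigned" label are assumptions made for the example. Lineages are taken to be semicolon-separated strings.

```python
from collections import Counter


def consensus_taxonomy(hit_lineages, min_consensus=0.51, unassigned="Unassigned"):
    """Return the deepest lineage shared by at least min_consensus of the hits."""
    if not hit_lineages:
        return unassigned
    split = [[rank.strip() for rank in lin.split(";")] for lin in hit_lineages]
    consensus = []
    max_depth = max(len(ranks) for ranks in split)
    for depth in range(1, max_depth + 1):
        # Count lineage prefixes of this depth. Hits that stop at a shallower
        # rank are left out of the count but still count in the denominator.
        prefixes = Counter(tuple(r[:depth]) for r in split if len(r) >= depth)
        prefix, count = prefixes.most_common(1)[0]
        supported = count / len(split) >= min_consensus
        extends = list(prefix[:-1]) == consensus
        if not (supported and extends):
            break  # stop at the last rank that still had enough support
        consensus = list(prefix)
    return ";".join(consensus) if consensus else unassigned
```

With four hits where three agree down to class and only two agree on order, this returns the lineage down to class, since two out of four falls below the 0.51 threshold.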

For now, you will need to install a development version of QIIME2 and q2-feature-classifier to try out these new methods — otherwise, these will be available in the next release of QIIME2 (others can provide an ETA on the next regular release date, but I believe it may be soon).

gregcaporaso · March 21, 2017, 5:30pm

This will most likely be at the end of March or beginning of April.

hshcao · July 31, 2018, 12:53pm

Hi, I am writing to second this request and to ask for uclust/usearch61 or usearch OTU picking and taxonomy assignment to be implemented.

HC