How do I use a pretrained classifier?

BioHazard_Dragon · August 22, 2023, 3:54am

Hi, I want to use a pretrained classifier, but I am completely new to this and I don't really understand how to get the taxonomy file from a pretrained SILVA classifier. I have looked at the QIIME 2 and RESCRIPt tutorials, but I have some questions about the process, because everything I have tried so far seems to get killed on my PC.

There is no specific pretrained classifier for the 16S region I want. Do I download the full-length classifier and use that?
The full-length classifier is huge and kills my system when I run the sklearn step. Do I need to extract the region-specific reads and dereplicate them first, or just make the chunks smaller when creating the taxonomy file (see code below)?
Do I run the classify-sklearn step on the classifier and train it on my own dataset, or on a dummy dataset?
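Dereplication helps because many SILVA entries share identical sequences, especially once they are trimmed to a single region, so the reference the classifier is built from gets smaller. A minimal Python sketch of the idea, assuming a toy reference stored as a dict of ID -> (sequence, semicolon-separated taxonomy): identical sequences collapse into one entry, and when their taxonomies disagree only the shared leading ranks are kept. This loosely mirrors the LCA mode of RESCRIPt's dereplicate action; it is not RESCRIPt's code, and `shared_ranks` and `dereplicate_lca` are hypothetical names.

```python
def shared_ranks(lineages):
    """Return the leading ranks common to every lineage, stopping at the first disagreement."""
    split = [t.split(";") for t in lineages]
    shared = []
    for ranks in zip(*split):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return ";".join(shared)


def dereplicate_lca(reference):
    """Collapse identical sequences; keep the first ID and the LCA of the members' taxonomies."""
    groups = {}  # upper-cased sequence -> list of (id, taxonomy), in input order
    for seq_id, (seq, taxonomy) in reference.items():
        groups.setdefault(seq.upper(), []).append((seq_id, taxonomy))
    derep = {}
    for seq, members in groups.items():
        first_id = members[0][0]
        derep[first_id] = (seq, shared_ranks([tax for _, tax in members]))
    return derep
```

For example, two entries with the same sequence labelled c__Bacilli and c__Clostridia under p__Firmicutes collapse into one entry labelled only down to p__Firmicutes.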

Do I run something like this to get the taxonomy file:
# Assign taxonomy to my representative sequences with a pretrained classifier
qiime feature-classifier classify-sklearn \
  --i-classifier pretrained-classifier.qza \
  --i-reads mydata_rep_seqs.qza \
  --p-reads-per-batch 5000 \
  --o-classification taxonomy_silva.qza
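--p-reads-per-batch controls how many query sequences are classified at a time. A minimal Python sketch of that batching idea (the helper `batched_reads` is hypothetical, not q2-feature-classifier's code): only one batch of reads is handled at once, but the trained classifier itself still has to be loaded in full, so smaller batches alone may not be enough when the full-length SILVA classifier is what exhausts memory.

```python
from itertools import islice


def batched_reads(reads, reads_per_batch):
    """Yield successive lists of at most `reads_per_batch` (id, sequence) pairs."""
    if reads_per_batch < 1:
        raise ValueError("reads_per_batch must be >= 1")
    iterator = iter(reads)
    while True:
        # Pull the next chunk lazily so the full input never has to be copied at once.
        batch = list(islice(iterator, reads_per_batch))
        if not batch:
            return
        yield batch
```

With 12 reads and a batch size of 5, this yields batches of 5, 5 and 2 reads.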

Is there a less resource-hungry alternative to the classify-sklearn step to get the taxonomy file?

Thanks
B

colinbrislawn · August 22, 2023, 4:04am

Good evening!

Welcome to the forums!

You are on the right track and asking all the right questions! First, check out the RESCRIPt tutorial, which is the most complete overview of this process.

Extracting your region and dereplicating are both good ideas, and RESCRIPt will help you do both.

Yes, you can skip training altogether with a top-hit LCA classifier like classify-consensus-vsearch. All good options!
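Consensus classifiers need no training because the taxonomy comes straight from database hits. Roughly, classify-consensus-vsearch asks VSEARCH for up to --p-maxaccepts matching reference sequences and then keeps each rank only if at least --p-min-consensus (0.51 by default) of those hits agree on it. A simplified Python sketch of that consensus step, assuming the hits' taxonomy strings have already been collected; `consensus_taxonomy` is a hypothetical name and this is not q2-feature-classifier's implementation.

```python
from collections import Counter


def consensus_taxonomy(hit_taxonomies, min_consensus=0.51):
    """Return the deepest lineage shared by at least `min_consensus` of the hits, plus its support."""
    if not hit_taxonomies:
        return "Unassigned", 0.0
    lineages = [t.split(";") for t in hit_taxonomies]
    n_hits = len(lineages)
    assigned, support = [], 0.0
    max_depth = max(len(lineage) for lineage in lineages)
    for depth in range(1, max_depth + 1):
        # Count whole prefixes so a rank only agrees if every rank above it agrees too.
        prefixes = Counter(tuple(l[:depth]) for l in lineages if len(l) >= depth)
        prefix, count = prefixes.most_common(1)[0]
        fraction = count / n_hits
        if fraction < min_consensus:
            break  # not enough agreement at this rank: stop at the previous one
        assigned, support = list(prefix), fraction
    if not assigned:
        return "Unassigned", 0.0
    return ";".join(assigned), support
```

With three hits where two say g__Bacillus and one says g__Lysinibacillus, the default threshold keeps g__Bacillus (2/3 support), while min_consensus=0.8 stops at the shared phylum.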

I'm afraid I've given you much to consider without answering each of your questions. If you have more questions about LCA classifiers or how to use RESCRIPt, let me know.

BioHazard_Dragon · August 22, 2023, 6:09am

Hi Colin,
thanks for answering so quickly. I've read the RESCRIPt and QIIME 2 tutorials, but I've had some difficulty understanding exactly what I need to do. I browsed the forum posts, but I'm not sure I fully understand what they're saying. I'm not up to speed yet, but I'm working on it.

I do have some specific, basic questions about RESCRIPt:

When you run the first step to get the SILVA data, does the data you download have both Bacteria and Archaea in it, or should I be looking for a specific version of the SILVA database?
When you train the classifier, e.g. using classify-sklearn, do you use your own sequence data where it says --i-reads?

Thanks for pointing me in the direction of classify-consensus-vsearch, I will definitely look into that option as an alternative.

B

colinbrislawn · August 22, 2023, 11:46pm

Yes, both Bacteria and Archaea are included in SILVA (Eukaryota are too).
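For training, you use reference database data (sequences plus their taxonomy), not your sample data. This could be data from a new microbe you sequenced and assembled, or it could be data from an existing database like SILVA. Training itself is done with fit-classifier-naive-bayes; your own representative sequences go into --i-reads only when you later run classify-sklearn with the trained classifier.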
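To make the training/classification split concrete, here is a toy k-mer naive Bayes classifier in plain Python: `train` only ever sees reference sequences and their taxonomy, and `classify` is where your own sequences come in, analogous to --i-reads. It is a simplified sketch, not q2-feature-classifier's scikit-learn pipeline, which uses its own k-mer features, settings and confidence scores; K = 4, `train` and `classify` are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

K = 4  # toy k-mer length, chosen only for this example


def kmers(seq, k=K):
    """Split a sequence into overlapping k-mers."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(reference_reads, reference_taxonomy):
    """Fit per-taxon k-mer counts from reference data (both dicts keyed by sequence ID)."""
    counts, priors, vocab = defaultdict(Counter), Counter(), set()
    for seq_id, seq in reference_reads.items():
        taxon = reference_taxonomy[seq_id]
        words = kmers(seq)
        counts[taxon].update(words)
        priors[taxon] += 1
        vocab.update(words)
    return counts, priors, vocab


def classify(model, query_reads):
    """Assign each query read the taxon with the highest multinomial log-likelihood."""
    counts, priors, vocab = model
    total = sum(priors.values())
    results = {}
    for read_id, seq in query_reads.items():
        best_taxon, best_score = None, -math.inf
        for taxon, taxon_counts in counts.items():
            denom = sum(taxon_counts.values()) + len(vocab)  # Laplace smoothing
            score = math.log(priors[taxon] / total)
            for word in kmers(seq):
                score += math.log((taxon_counts[word] + 1) / denom)
            if score > best_score:
                best_taxon, best_score = taxon, score
        results[read_id] = best_taxon
    return results
```

For example, after training on one all-A sequence labelled g__A and one all-C sequence labelled g__C, a query made of A's is assigned g__A.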

BioHazard_Dragon · September 2, 2023, 5:24am

Hi Colin
I ended up using classify-consensus-vsearch. It was a better option for me.
Thanks

B
