How can I determine whether the classifier I trained functions well?

minjoy · December 21, 2023, 4:51am

I trained a classifier on the Greengenes2 database, using the primers from my study (341F and 785R, for the V3-V4 region). Training took only 1-2 hours, which felt very fast. Many people have mentioned that this step takes several hours, but that was not my experience. Finishing quickly is convenient, but it makes me wonder whether my classifier actually works well.
To check, I tested the classifier by following the QIIME 2 tutorial (Training feature classifiers with q2-feature-classifier), using the representative sequences from the Moving Pictures tutorial. I then compared my taxonomy.qzv file with the one in the tutorial. However, I'm not sure how to assess whether my classifier is performing adequately. When I viewed the file in QIIME 2 View, it looked similar, but how can I verify this? Should I be concerned about minor differences, or are they not that important?

Just in case, I attach my code below.

# Extract reference reads
qiime feature-classifier extract-reads \
  --i-sequences 2022.10.backbone.full-length.fna.qza \
  --p-f-primer CCTAYGGGRBGCASCAG \
  --p-r-primer GGACTACNNGGGTATCTAAT \
  --o-reads Greengenes2_ref_seqs.qza

# Train the classifier
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads Greengenes2_ref_seqs.qza \
  --i-reference-taxonomy 2022.10.backbone.tax.qza \
  --o-classifier Greengenes2-classifier.qza

# Test the classifier
qiime feature-classifier classify-sklearn \
  --i-classifier Greengenes2-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

# Summarize the classification as a viewable table
qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv
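
The comparison with the tutorial's taxonomy can be made more concrete than eyeballing two visualizations. The following is only a minimal sketch: it assumes both taxonomy artifacts have been exported to TSV with `qiime tools export`, and the function name `compare_taxonomy`, the directory names, and the optional rank-depth argument are all hypothetical. It counts how many shared feature IDs received the same taxonomy string. Note that an exact string match is strict: if the two classifiers were trained on different reference databases, naming differences alone will show up as mismatches, which is why the sketch lets you truncate to the first few ranks.

```shell
#!/bin/sh
# Assumed preparation (not run here); directory names are placeholders:
#   qiime tools export --input-path taxonomy.qza --output-path exported-mine
#   qiime tools export --input-path tutorial-taxonomy.qza --output-path exported-tutorial
# Each export directory then contains a taxonomy.tsv with the columns
# Feature ID, Taxon, Confidence.

# Hypothetical helper: compare_taxonomy MINE.tsv OTHER.tsv [RANKS]
# Prints one DIFF line per disagreeing feature, then a summary line.
compare_taxonomy() {
  awk -F'\t' -v ranks="${3:-7}" '
    # Keep only the first `ranks` levels of a "d__A; p__B; ..." string.
    function trunc(taxon,    parts, n, i, out) {
      n = split(taxon, parts, /; */)
      if (n > ranks) n = ranks
      out = parts[1]
      for (i = 2; i <= n; i++) out = out ";" parts[i]
      return out
    }
    FNR == 1 { next }                                   # skip each header row
    FILENAME == ARGV[1] { mine[$1] = trunc($2); next }  # first file: ID -> taxon
    ($1 in mine) {                                      # second file: shared IDs only
      shared++
      other = trunc($2)
      if (mine[$1] == other) same++
      else print "DIFF\t" $1 "\t" mine[$1] "\t" other
    }
    END {
      printf "shared=%d identical=%d different=%d\n", shared, same, shared - same
    }
  ' "$1" "$2"
}
```

For example, `compare_taxonomy exported-mine/taxonomy.tsv exported-tutorial/taxonomy.tsv 6` would compare assignments down to the sixth rank only. Agreement with the tutorial shows that the classifier behaves similarly to another classifier, not that either one is correct.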

Thank you for helping me. Have a nice day!

Nicholas_Bokulich · December 21, 2023, 7:08am

Hi @minjoy ,
This is a good question, but one that does not have a simple answer. There are many ways to test, e.g., using simulated datasets or real datasets with a known composition (e.g., from a mock community). I would recommend using a mock community — you can see some discussion and resources for using mock communities here:

and an old repository of mock community datasets that you can use for the purpose here:

This is not a short process: you will need to pick a mock community, process the data with QIIME 2, and then evaluate the accuracy of your classifier (see the thread linked above for relevant discussion and methods). But once you have a working mock community you can use it for testing any classifiers that you make.
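
As a rough illustration of the last step, here is a minimal sketch of one simple way to score a classifier against a mock community: presence/absence precision, recall, and F-measure at a single taxonomic rank. This is not necessarily the method discussed in the linked thread, and it ignores abundances. The function name `taxa_accuracy` and both input files (one taxon name per line, expected versus observed) are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: taxa_accuracy EXPECTED.txt OBSERVED.txt
# EXPECTED.txt: taxa known to be in the mock community, one per line.
# OBSERVED.txt: taxa the classifier reported for that sample, one per line.
taxa_accuracy() {
  exp_sorted=$(mktemp)
  obs_sorted=$(mktemp)
  # comm needs sorted, de-duplicated input in a consistent locale.
  LC_ALL=C sort -u "$1" > "$exp_sorted"
  LC_ALL=C sort -u "$2" > "$obs_sorted"

  # True positives: taxa in both lists.
  tp=$(LC_ALL=C comm -12 "$exp_sorted" "$obs_sorted" | wc -l | tr -d ' ')
  # False negatives: expected but not observed.
  fn=$(LC_ALL=C comm -23 "$exp_sorted" "$obs_sorted" | wc -l | tr -d ' ')
  # False positives: observed but not expected.
  fp=$(LC_ALL=C comm -13 "$exp_sorted" "$obs_sorted" | wc -l | tr -d ' ')
  rm -f "$exp_sorted" "$obs_sorted"

  awk -v tp="$tp" -v fp="$fp" -v fn="$fn" 'BEGIN {
    p = (tp + fp) ? tp / (tp + fp) : 0     # precision
    r = (tp + fn) ? tp / (tp + fn) : 0     # recall
    f = (p + r) ? 2 * p * r / (p + r) : 0  # F-measure
    printf "TP=%d FP=%d FN=%d precision=%.2f recall=%.2f F=%.2f\n", tp, fp, fn, p, r, f
  }'
}
```

Running the same function on the output of two different classifiers for the same mock community gives a like-for-like comparison, provided both lists are collapsed to the same rank and use the same naming scheme as the expected list.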

Good luck!
