Fit-classifier-sklearn parameter

devonorourke · October 31, 2018, 2:06pm

This is really just a follow-up to a question that @BenKaehler already addressed in an earlier post. I'm interested in comparing the taxonomy assignments from blast, vsearch, and another classifier that I'd like to build with QIIME's fit-classifier-sklearn method. I'm using a COI database that I've filtered myself; it's working great with the vsearch and blast tools so far.
The fit-classifier-sklearn documentation mentions a few options:

Options:
  --i-reference-reads ARTIFACT PATH FeatureData[Sequence]
                                  [required]
  --i-reference-taxonomy ARTIFACT PATH FeatureData[Taxonomy]
                                  [required]
  --p-classifier-specification TEXT
                                  [required]
  --i-class-weight ARTIFACT PATH FeatureTable[RelativeFrequency]
                                  [optional]
  --o-classifier ARTIFACT PATH TaxonomicClassifier

It's clear to me what to enter with --i-reference-reads and --i-reference-taxonomy. It's not clear to me how to:

Generate the text for --p-classifier-specification. In Ben's example, is this the entirety of the text he shared? I would guess that it's simply the output of the print statement on the last line of his embedded code (everything after):

In [15]: print(classifier_specification)

I'm also unclear on what to pass to --i-class-weight and --o-classifier. I'd appreciate any input on what to enter there.

For anyone who routinely uses scikit-learn, I'm sure this site has everything you need. As someone who has never used it, I was wondering whether those power users could point to some specific documentation within that site to get my novice feet wet. Should I just start at chapter 1?

Thanks!

Nicholas_Bokulich · October 31, 2018, 8:34pm

I would recommend just using fit-classifier-naive-bayes — the specification is already set for that method, so you can start using that classifier immediately on your data.

Yep, that's the correct resource. There is no way to "get your feet wet" with specification of a new classifier — you really need to have a good deal of familiarity with the particular estimator to know what is pertinent and get this working. Starting with chapter 1 and reading on through will give you a pretty good idea of where to start.

Hence the lack of documentation — that particular method is sort of exposed for development purposes, but we don't have good guidelines for users. For everyone else, there's fit-classifier-naive-bayes.
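
To make the --p-classifier-specification text a little more concrete, here is a minimal sketch of how such a string could be produced. It assumes the parameter takes a JSON-encoded list of [step name, parameters] pairs describing a scikit-learn pipeline, with the estimator class named under a "__type__" key. That layout and every value below are illustrative assumptions, not a tested recipe, so check them against the q2-feature-classifier source for your installed version.

```python
import json

# Assumed layout: a list of [step_name, params] pairs, one per pipeline step.
# "__type__" is assumed to name the scikit-learn class for that step.
# All parameter values are placeholders for illustration only.
steps = [
    ["feat_ext", {
        "__type__": "feature_extraction.text.HashingVectorizer",
        "analyzer": "char_wb",
        "n_features": 8192,
        "ngram_range": [7, 7],
        "alternate_sign": False,
    }],
    ["classify", {
        "__type__": "naive_bayes.MultinomialNB",
        "alpha": 0.001,
        "fit_prior": False,
    }],
]

# The JSON string is what would be passed as the TEXT value of
# --p-classifier-specification.
classifier_specification = json.dumps(steps)
print(classifier_specification)
```

Choosing sensible values for each step is the part that needs real familiarity with the estimator; the serialization itself is the easy bit.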

I hope that helps!

BenKaehler · November 1, 2018, 8:14pm

In addition to @Nicholas_Bokulich's excellent reply, and in answer to your questions regarding --o-classifier and --i-class-weight: the q2-feature-classifier and q2-clawback tutorials may be useful if you are training your own classifiers. The latter explains what --i-class-weight is for, but it is probably of academic interest only if you're just getting started with a custom reference data set.
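
As a rough illustration of where those two options sit, the sketch below assembles a fit-classifier-naive-bayes command line. It assumes that method accepts the same --i-reference-reads, --i-reference-taxonomy, --i-class-weight, and --o-classifier options listed in the question. The .qza file names and the build_fit_command helper are hypothetical, invented for this example.

```python
import shlex

def build_fit_command(reads, taxonomy, classifier_out, class_weight=None):
    """Assemble a fit-classifier-naive-bayes command line as one string."""
    args = [
        "qiime", "feature-classifier", "fit-classifier-naive-bayes",
        "--i-reference-reads", reads,
        "--i-reference-taxonomy", taxonomy,
    ]
    # --i-class-weight is optional: include it only when a weights
    # artifact is actually supplied.
    if class_weight is not None:
        args += ["--i-class-weight", class_weight]
    # --o-classifier is the path where the trained classifier is written.
    args += ["--o-classifier", classifier_out]
    return " ".join(shlex.quote(a) for a in args)

# Hypothetical artifact names, for illustration only.
command = build_fit_command(
    "coi-ref-seqs.qza", "coi-ref-taxonomy.qza", "coi-classifier.qza"
)
print(command)
```

In other words, --o-classifier is just the output file name you choose, and --i-class-weight can be left off entirely.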
