This is really just a follow-up to a post that @BenKaehler already addressed in an earlier question. I'm interested in comparing the taxonomy assignments from the blast and vsearch classifiers with those from a third classifier that I'd like to build with the fit-classifier-sklearn action in QIIME's feature-classifier plugin. I'm using a COI database that I've filtered myself, and it's working great with the vsearch and blast tools so far.
The fit-classifier-sklearn documentation mentions a few options:
Options:
  --i-reference-reads ARTIFACT PATH FeatureData[Sequence]
                                  [required]
  --i-reference-taxonomy ARTIFACT PATH FeatureData[Taxonomy]
                                  [required]
  --p-classifier-specification TEXT
                                  [required]
  --i-class-weight ARTIFACT PATH FeatureTable[RelativeFrequency]
                                  [optional]
  --o-classifier ARTIFACT PATH TaxonomicClassifier
It's clear to me what to enter for --i-reference-reads and --i-reference-taxonomy. It's not clear to me how to:
- Generate the text for --p-classifier-specification. In Ben's example, is this the entirety of the text he shared? I would guess that it's simply whatever gets printed by the last line of his embedded code, i.e. everything after:
In [15]: print(classifier_specification)
- I'm still unclear what to do with the --i-class-weight and --o-classifier options too. I'd appreciate any input on what to enter there.
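To make the first question concrete, here is a minimal Python sketch of what I imagine doing. Everything in it is an assumption on my part, not something I've confirmed: that the specification is a JSON list of [step name, parameters] pairs in which a `__type__` key names the scikit-learn class, the particular steps and parameter values, and the file names (`ref-seqs.qza`, `ref-taxonomy.qza`, `classifier.qza`), which are hypothetical placeholders. It only builds and prints the command; it doesn't run QIIME.

```python
import json
import shlex

# ASSUMPTION: the specification is a JSON list of [step_name, params] pairs,
# where "__type__" names a scikit-learn class relative to the sklearn package.
# The steps and parameter values below are illustrative guesses only.
classifier_specification = [
    ["feat_ext", {
        "__type__": "feature_extraction.text.HashingVectorizer",
        "analyzer": "char_wb",
        "n_features": 8192,
        "ngram_range": [7, 7],
        "alternate_sign": False,  # keep features non-negative for naive Bayes
    }],
    ["classify", {
        "__type__": "naive_bayes.MultinomialNB",
        "alpha": 0.01,
        "fit_prior": False,
    }],
]

# Serialize the specification to the single TEXT value the option expects.
spec_text = json.dumps(classifier_specification)

# Hypothetical file names; only the option names come from the help text above.
command = [
    "qiime", "feature-classifier", "fit-classifier-sklearn",
    "--i-reference-reads", "ref-seqs.qza",
    "--i-reference-taxonomy", "ref-taxonomy.qza",
    "--p-classifier-specification", spec_text,
    "--o-classifier", "classifier.qza",
]

# Quote each part so the JSON survives being pasted into a shell.
print(" ".join(shlex.quote(part) for part in command))
```

If that guess is right, I'd presumably leave out --i-class-weight (it's marked [optional]) and give --o-classifier the name of the output file, but I'd welcome confirmation on both.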
For anyone who routinely uses scikit-learn, I'm sure this site has everything you need. As someone who has never used it, I was wondering whether those power users could point me to some specific documentation within that site to get my novice feet wet. Or should I just start at chapter 1?
Thanks!