Thanks for the closure, @yoshiki.
I would guess that the reason that all the reads went to the first worker is that chunk_size
is set to be large enough to effectively turn off chunking. To get some sort of reasonable parallelism it would probably be better to set chunk_size
to be, maybe, 1,000.
I’ve added some parameter documentation to classify-sklearn
. This is what it looks like now:
$ qiime feature-classifier classify-sklearn --help
Usage: qiime feature-classifier classify-sklearn [OPTIONS]
Classify reads by taxon using a fitted classifier.
Options:
--i-reads PATH Artifact: FeatureData[Sequence] [required]
The feature data to be classified.
--i-classifier PATH Artifact: TaxonomicClassifier [required]
The taxonomic classifier for classifying the
reads.
--p-chunk-size INTEGER [default: 262144]
Number of reads to process
in each batch.
--p-n-jobs INTEGER [default: 1]
The maximum number of
concurrently worker processes. If -1 all
CPUs are used. If 1 is given, no parallel
computing code is used at all, which is
useful for debugging. For n_jobs below -1,
(n_cpus + 1 + n_jobs) are used. Thus for
n_jobs = -2, all CPUs but one are used.
--p-pre-dispatch TEXT [default: 2*n_jobs]
"all" or expression, as
in "3*n_jobs". The number of batches (of
tasks) to be pre-dispatched.
--p-confidence FLOAT [default: 0.7]
Confidence threshold for
limiting taxonomic depth. Provide -1 to
disable confidence calculation, or 0 to
calculate confidence but not apply it to
limit the taxonomic depth of the
assignments.
--p-read-orientation [same|reverse-complement]
[optional]
Direction of reads with respect
to reference sequences. same will cause
reads to be classified unchanged; reverse-
complement will cause reads to be reversed
and complemented prior to classification.
Default is to autodetect based on the
confidence estimates for the first 100
reads.
--o-classification PATH Artifact: FeatureData[Taxonomy] [required
if not passing --output-dir]
--output-dir DIRECTORY Output unspecified results to a directory
--cmd-config PATH Use config file for command options
--verbose Display verbose output to stdout and/or
stderr during execution of this action.
[default: False]
--quiet Silence output if execution is successful
(silence is golden). [default: False]
--help Show this message and exit.