RESCRIPt: classifier does not support confidence values

alexkrohn · March 1, 2022, 2:46pm

I got this error using rescript evaluate-fit-classifier. I am using QIIME 2 2022.2 and scikit-learn 0.24.1 on a Linux server. I am trying to create a reference database from the Sylvilagus bachmani genome so that I can BLAST sequences against it and determine whether they come from S. bachmani or from another organism. I can reproduce the error as follows:

wget https://data.qiime2.org/distro/core/qiime2-2022.2-py38-linux-conda.yml
conda env create -n qiime2-2022.2 --file qiime2-2022.2-py38-linux-conda.yml
conda activate qiime2-2022.2

pip install git+https://github.com/bokulich-lab/RESCRIPt.git

qiime rescript get-ncbi-data --p-query '(512907[BioProject]) AND "Sylvilagus bachmani"[porgn:__txid365149] ' \
--o-sequences bachmani-refseqs-unfiltered.qza \
--o-taxonomy bachmani-refseqs-taxonomy-unfiltered.qza \
--p-n-jobs 5

qiime rescript evaluate-fit-classifier --i-sequences bachmani-refseqs-unfiltered.qza \
--i-taxonomy bachmani-refseqs-taxonomy-unfiltered.qza \
--o-classifier bachmani-classifier.qza \
--o-observed-taxonomy bachmani-classifier-predicted-taxonomy.qza \
--verbose

Error info:

Validation: 134.04s
/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/q2_feature_classifier/classifier.py:102: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.24.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
Training: 2663.73s
Traceback (most recent call last):
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/q2cli/commands.py", line 339, in __call__
    results = action(**arguments)
  File "<decorator-gen-457>", line 2, in evaluate_fit_classifier
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/qiime2/sdk/action.py", line 485, in _callable_executor_
    outputs = self._callable(scope.ctx, **view_args)
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/rescript/cross_validate.py", line 48, in evaluate_fit_classifier
    observed_taxonomy, = classify(reads=sequences,
  File "<decorator-gen-513>", line 2, in classify_sklearn
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/qiime2/sdk/action.py", line 391, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/q2_feature_classifier/classifier.py", line 220, in classify_sklearn
    seq_ids, taxonomy, confidence = list(zip(*predictions))
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/q2_feature_classifier/_skl.py", line 46, in predict
    for calculated in workers(jobs):
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/joblib/parallel.py", line 1043, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/joblib/parallel.py", line 779, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/q2_feature_classifier/_skl.py", line 54, in _predict_chunk
    return _predict_chunk_with_conf(pipeline, separator, confidence, chunk)
  File "/home/tangled/miniconda3/envs/qiime/lib/python3.8/site-packages/q2_feature_classifier/_skl.py", line 70, in _predict_chunk_with_conf
    raise ValueError('this classifier does not support confidence values')
ValueError: this classifier does not support confidence values

Plugin error from rescript:

  this classifier does not support confidence values

See above for debug info

Based on previous posts, this may be caused by the taxonomy in bachmani-refseqs-taxonomy-unfiltered.qza (even though all of the sequences should come from a single organism). How can I check the taxonomic ranks stored in a .qza file?

Thanks,

Alex

Nicholas_Bokulich · March 1, 2022, 3:34pm

Hi @alexkrohn ,
This error is coming from q2-feature-classifier during the evaluate-fit-classifier stage of the pipeline, when the newly fitted classifier is used to classify the sequences.

tl;dr: I think the issue is that you are using the wrong method for the intended task.

With q2-feature-classifier, this error is often seen when the taxonomy has uneven ranks, but that should not be the case when using RESCRIPt, since one of its purposes is to standardize taxonomic ranks automatically.

I think the actual issue here is that you only have one organism.

The naive Bayes classifier that you are attempting to train here is meant to predict the taxonomic affiliation of a sequence, which assumes that there are multiple possible taxonomic labels. It is not intended for a binary classification task like the one you are attempting (i.e., "does this sequence belong to the single reference group?"). This method will not work if your reference taxonomy contains only one taxonomic label.
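One way to check this, which also answers how to inspect the taxonomy inside a .qza: a .qza is a zip archive, and qiime tools export --input-path bachmani-refseqs-taxonomy-unfiltered.qza --output-path bachmani-taxonomy writes the artifact's data to a directory; for a taxonomy artifact that is a tab-separated taxonomy.tsv with a header row and a Taxon column. The shell function below is a minimal sketch (the name summarize_taxonomy_tsv is hypothetical, not a QIIME 2 command) that counts how many features carry each distinct label and how many ranks each label has, assuming ranks are separated by semicolons. If every feature shares a single label, that is the one-label situation described above.

```shell
# Hypothetical helper (not part of QIIME 2): summarize a taxonomy.tsv file
# exported from a FeatureData[Taxonomy] artifact. Assumes a header row, the
# Taxon string in column 2, and ranks separated by ';'.
summarize_taxonomy_tsv() {
  local tsv="$1"
  echo "Distinct taxonomy labels (feature count, label):"
  # Skip the header row, keep the Taxon column, count features per label.
  tail -n +2 "$tsv" | cut -f2 | sort | uniq -c | sort -rn
  echo "Ranks per label (feature count, number of ranks):"
  # awk's split() returns how many ';'-separated fields the Taxon string has.
  tail -n +2 "$tsv" | awk -F'\t' '{ print split($2, ranks, ";") }' | sort -n | uniq -c
}
```

For example, after the export above: summarize_taxonomy_tsv bachmani-taxonomy/taxonomy.tsv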

The fitted classifier produced here can only be used for taxonomic prediction with scikit-learn (classify-sklearn). It is not compatible with BLAST.

If you want to BLAST against the reference sequence(s), you should BLAST against the sequences themselves rather than train a classifier. Check out the q2-quality-control plugin, specifically exclude-seqs, which wraps BLAST to align sequences against a set of reference sequences (and filter out any that do not align).

Good luck!

alexkrohn · March 1, 2022, 5:04pm

Thanks for your help @Nicholas_Bokulich. I'll check out exclude-seqs to see if that might solve my issue.

I was hoping to use QIIME 2 instead of standard BLAST workflows to save some time: because I can't parallelize BLAST, searching more than 5 GB of sequences against a 3 Gb reference genome makes this simple "check" take a very long time. RESCRIPt is part of my usual metabarcoding pipeline, but I'll keep digging into other functions. Thanks again.

Nicholas_Bokulich · March 1, 2022, 5:23pm

exclude-seqs can also use VSEARCH instead of BLAST for alignment. This enables parallelization.
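A rough sketch of what that could look like on the command line, assuming the reads to check have already been imported as a FeatureData[Sequence] artifact. The wrapper name screen_against_reference, the output file names, and the default of 4 threads are hypothetical choices for illustration. As far as I know BLAST is the default --p-method, so removing the --p-method and --p-threads lines should give the BLAST version; check qiime quality-control exclude-seqs --help for the identity and coverage thresholds rather than relying on the defaults.

```shell
# Hypothetical wrapper (not part of QIIME 2). Arguments:
#   $1  query reads, already imported as a FeatureData[Sequence] artifact
#   $2  reference sequences, e.g. bachmani-refseqs-unfiltered.qza
#   $3  number of VSEARCH threads (defaults to 4 here, an arbitrary choice)
screen_against_reference() {
  local query="$1" reference="$2" threads="${3:-4}"
  # Use VSEARCH instead of BLAST so the search can run on several threads.
  qiime quality-control exclude-seqs \
    --i-query-sequences "$query" \
    --i-reference-sequences "$reference" \
    --p-method vsearch \
    --p-threads "$threads" \
    --o-sequence-hits sequence-hits.qza \
    --o-sequence-misses sequence-misses.qza
}
```

For example: screen_against_reference query-seqs.qza bachmani-refseqs-unfiltered.qza 16. Sequences that align to the reference end up in sequence-hits.qza and the rest in sequence-misses.qza.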

If you have whole-genome reads, you are probably better off using another tool for now. We are working on some genomics-related plugins for release later this year, but they are not fully functional yet, so the advantages of using QIIME 2 are not fully captured at the moment...