Strange error while making classifier

Hello,

I am trying to make some classifiers based on the most recent GreenGenes2 database. I have done this successfully for one dataset, but when I try with a different dataset that uses slightly different primers, I get an error. I've looked through the code carefully but can't figure out what's going wrong.

Code that works:

qiime feature-classifier extract-reads \
  --i-sequences 2024.09.backbone.full-length.fna.qza \
  --p-f-primer GTGYCAGCMGCCGCGGTAA \
  --p-r-primer GGACTACNVGGGTWTCTAAT \
  --o-reads 2024.09-GG2-ref-seqs-v4-2023.qza 

nohup qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads GG2-ref-seqs-v4.qza \
  --i-reference-taxonomy 2024.09.backbone.tax.qza \
  --o-classifier 2024.09.GG2-V4-uniform-classifier.qza &  

nohup qiime clawback assemble-weights-from-Qiita \
  --i-classifier 2024.09.GG2-V4-uniform-classifier.qza \
  --i-reference-taxonomy 2024.09.backbone.tax.qza \
  --i-reference-sequences GG2-ref-seqs-v4.qza \
  --p-metadata-key empo_3 \
  --p-metadata-value "Soil (non-saline)" \
  --p-context Deblur_2021.09-Illumina-16S-V4-150nt-ac8c0b \
  --o-class-weight 2024.09.GG2-V4-soil-nonsaline-weights.qza & 

nohup qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads GG2-ref-seqs-v4.qza  \
  --i-reference-taxonomy 2024.09.backbone.tax.qza \
  --i-class-weight 2024.09.GG2-V4-soil-nonsaline-weights.qza \
  --o-classifier 2024.09.GG2-V4-soil-nonsaline-weights-classifier-2023.qza &

Code that doesn't work:

In 2019 the primers used were:
# custom515F forward primer   GTGCCAGCMGCCGCGGTAA
# custom801R reverse primer   ACHVGGGTWTCTAATCCK

qiime feature-classifier extract-reads \
  --i-sequences 2024.09.backbone.full-length.fna.qza \
  --p-f-primer GTGCCAGCMGCCGCGGTAA \
  --p-r-primer ACHVGGGTWTCTAATCCK \
  --o-reads 2024.09-GG2-ref-seqs-v4-2019.qza 

nohup qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads 2024.09-GG2-ref-seqs-v4-2019.qza  \
  --i-reference-taxonomy 2024.09.backbone.tax.qza \
  --i-class-weight 2024.09.GG2-V4-soil-nonsaline-weights.qza \
  --o-classifier 2024.09.GG2-V4-soil-nonsaline-weights-classifier-2019.qza &

Error message:


Plugin error from feature-classifier:

  Number of priors must match number of classes.

Debug info has been saved to /tmp/qiime2-q2cli-err-e5nuvotw.log


smithem3@lovelace:/mounts/lovelace/16S/Solo-2023-sandbox/16S_classifiers/GreenGenes2_Classifiers$ cat /tmp/qiime2-q2cli-err-e5nuvotw.log
/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_feature_classifier/classifier.py:104: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.24.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
  warnings.warn(warning, UserWarning)
Traceback (most recent call last):
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/commands.py", line 520, in __call__
    results = self._execute_action(
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/commands.py", line 581, in _execute_action
    results = action(**arguments)
  File "<decorator-gen-40>", line 2, in fit_classifier_naive_bayes
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
    outputs = self._callable_executor_(
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 566, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_feature_classifier/classifier.py", line 339, in generic_fitter
    pipeline = fit_pipeline(reference_reads, reference_taxonomy,
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_feature_classifier/_skl.py", line 76, in fit_pipeline
    pipeline.fit(X, y)
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_feature_classifier/custom.py", line 40, in fit
    self.partial_fit(cX, cy, sample_weight=csample_weight,
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/naive_bayes.py", line 590, in partial_fit
    self._update_class_log_prior(class_prior=class_prior)
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/naive_bayes.py", line 483, in _update_class_log_prior
    raise ValueError("Number of priors must match number of"
ValueError: Number of priors must match number of classes.

Any help/advice? Thanks!


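Hi @16sIceland,
This error is saying that the taxonomic weights do not match the reference: the number of priors (taxa in the class weights) differs from the number of classes (taxa in the reference reads you are training on).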

In the first example, your weights were generated from the reference sequences trimmed with the first primer set, so training a classifier with those weights and those same sequences works.

In the second example, you are using the same weights with different reference sequences. These were trimmed with a different primer set, which can lead to different seqs/taxa being included in the final set, since the second primer pair will not pick up exactly the same sequences as the first.
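One way to see the difference is to count how many distinct taxa each set of extracted reads covers. Here is a rough sketch, assuming the reads and taxonomy have first been exported with `qiime tools export` (which, as far as I know, writes `dna-sequences.fasta` and `taxonomy.tsv`). The helper name `count_classes` and the directory names in the comment are made up for illustration and are not part of QIIME 2.

```shell
# Hypothetical helper: count the distinct taxonomy strings among the
# sequences present in a FASTA file.
#   $1: FASTA of extracted reference reads
#   $2: taxonomy TSV with columns: Feature ID <tab> Taxon
count_classes() {
  awk -F'\t' '
    # First file (FASTA): remember every sequence ID from the header lines.
    NR == FNR {
      if (/^>/) { id = substr($0, 2); sub(/[ \t].*/, "", id); keep[id] = 1 }
      next
    }
    # Second file (taxonomy): record the taxon of every ID that was kept.
    ($1 in keep) { taxa[$2] = 1 }
    END { n = 0; for (t in taxa) n++; print n }
  ' "$1" "$2"
}

# Example call (directory names are assumptions):
#   count_classes exported-reads-2019/dna-sequences.fasta exported-tax/taxonomy.tsv
```

If this number is different for the 2023 and 2019 read sets, weights built for one will not fit the other.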

So if you have a different reference, you need taxonomic weights that match that reference.
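As a sketch of what that could look like for the 2019 primers, here are the same three steps from your working example, pointed at the 2019 reads. The output file names for the uniform classifier and the weights are hypothetical, and I am assuming the same Qiita context and metadata values are still what you want for this amplicon, so please check that before running.

```shell
# Input names are taken from the post; UNIFORM and WEIGHTS are hypothetical.
READS=2024.09-GG2-ref-seqs-v4-2019.qza
TAX=2024.09.backbone.tax.qza
UNIFORM=2024.09.GG2-V4-2019-uniform-classifier.qza
WEIGHTS=2024.09.GG2-V4-2019-soil-nonsaline-weights.qza
WEIGHTED=2024.09.GG2-V4-soil-nonsaline-weights-classifier-2019.qza

retrain_2019() {
  # Step 1: uniform classifier trained on the 2019 reads.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$READS" \
    --i-reference-taxonomy "$TAX" \
    --o-classifier "$UNIFORM" || return 1

  # Step 2: class weights assembled against the same 2019 reads.
  qiime clawback assemble-weights-from-Qiita \
    --i-classifier "$UNIFORM" \
    --i-reference-taxonomy "$TAX" \
    --i-reference-sequences "$READS" \
    --p-metadata-key empo_3 \
    --p-metadata-value "Soil (non-saline)" \
    --p-context Deblur_2021.09-Illumina-16S-V4-150nt-ac8c0b \
    --o-class-weight "$WEIGHTS" || return 1

  # Step 3: weighted classifier, now with matching reads and weights.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$READS" \
    --i-reference-taxonomy "$TAX" \
    --i-class-weight "$WEIGHTS" \
    --o-classifier "$WEIGHTED"
}
```

Call `retrain_2019` once `extract-reads` has finished. The steps run one after another here rather than in the background, because each one needs the output of the previous step.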

I hope that helps!


Ok I think so: I need to go through all the steps with the new primers ... thanks!
