Hello,
I am trying to build some classifiers from the most recent Greengenes2 release (2024.09). I did this successfully for one dataset, but when I try with a different dataset that used slightly different primers, I get an error. I've gone through the code carefully but can't figure out what's going wrong.
Code that works (2023 dataset):
qiime feature-classifier extract-reads \
--i-sequences 2024.09.backbone.full-length.fna.qza \
--p-f-primer GTGYCAGCMGCCGCGGTAA \
--p-r-primer GGACTACNVGGGTWTCTAAT \
--o-reads 2024.09-GG2-ref-seqs-v4-2023.qza
nohup qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads GG2-ref-seqs-v4.qza \
--i-reference-taxonomy 2024.09.backbone.tax.qza \
--o-classifier 2024.09.GG2-V4-uniform-classifier.qza &
nohup qiime clawback assemble-weights-from-Qiita \
--i-classifier 2024.09.GG2-V4-uniform-classifier.qza \
--i-reference-taxonomy 2024.09.backbone.tax.qza \
--i-reference-sequences GG2-ref-seqs-v4.qza \
--p-metadata-key empo_3 \
--p-metadata-value "Soil (non-saline)" \
--p-context Deblur_2021.09-Illumina-16S-V4-150nt-ac8c0b \
--o-class-weight 2024.09.GG2-V4-soil-nonsaline-weights.qza &
nohup qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads GG2-ref-seqs-v4.qza \
--i-reference-taxonomy 2024.09.backbone.tax.qza \
--i-class-weight 2024.09.GG2-V4-soil-nonsaline-weights.qza \
--o-classifier 2024.09.GG2-V4-soil-nonsaline-weights-classifier-2023.qza &
Code that doesn't work (2019 dataset). The primers used in 2019 were:
# custom515F forward primer GTGCCAGCMGCCGCGGTAA
# custom801R reverse primer ACHVGGGTWTCTAATCCK
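Compared with the 2023 primers, the forward primers differ only at the fourth base (Y in the 2023 primer, C in the 2019 one). The reverse primers share the core GGGTWTCTAAT but are offset from each other. Here is a quick sketch that expands the degenerate IUPAC codes so the primers can be compared directly (expand is my own helper, not a QIIME 2 function):

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(primer):
    """Return the set of concrete sequences a degenerate primer encodes."""
    return {"".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))}


f_2023 = expand("GTGYCAGCMGCCGCGGTAA")  # 2023 forward primer
f_2019 = expand("GTGCCAGCMGCCGCGGTAA")  # custom515F (2019)
print(len(f_2023), len(f_2019), f_2019 <= f_2023)  # 4 2 True
```

Every 2019 forward-primer variant is therefore also a 2023 variant. The reverse primers expand to 24 (2023) and 36 (2019) concrete sequences.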
qiime feature-classifier extract-reads \
--i-sequences 2024.09.backbone.full-length.fna.qza \
--p-f-primer GTGCCAGCMGCCGCGGTAA \
--p-r-primer ACHVGGGTWTCTAATCCK \
--o-reads 2024.09-GG2-ref-seqs-v4-2019.qza
nohup qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads 2024.09-GG2-ref-seqs-v4-2019.qza \
--i-reference-taxonomy 2024.09.backbone.tax.qza \
--i-class-weight 2024.09.GG2-V4-soil-nonsaline-weights.qza \
--o-classifier 2024.09.GG2-V4-soil-nonsaline-weights-classifier-2019.qza &
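My working guess, which I haven't confirmed in the q2-feature-classifier source, is that the classifier's classes are the distinct taxonomy strings of whichever reference reads survive extract-reads. If so, two primer pairs run against the same backbone could yield different numbers of classes. Meanwhile, the class-weight file I pass in here was assembled against the V4 reads and classifier from the 2023 primers. Below is a toy sketch of that idea, with made-up read IDs and taxonomy strings (classes_for_reads is a hypothetical helper, not part of QIIME 2):

```python
# Toy illustration only: read IDs and taxonomy strings are made up,
# not taken from the Greengenes2 backbone.


def classes_for_reads(read_ids, taxonomy):
    """Return the sorted, distinct taxonomy labels of the retained reads.

    Assumes (unconfirmed) that the classifier's classes are exactly the
    labels of reads present in both the extracted reads and the taxonomy.
    """
    return sorted({taxonomy[r] for r in read_ids if r in taxonomy})


taxonomy = {
    "seq1": "d__Bacteria; p__Acidobacteriota",
    "seq2": "d__Bacteria; p__Actinomycetota",
    "seq3": "d__Bacteria; p__Chloroflexota",
    "seq4": "d__Archaea; p__Thermoproteota",
}

# Hypothetical sets of reads retained by each primer pair.
reads_2023 = ["seq1", "seq2", "seq3", "seq4"]
reads_2019 = ["seq1", "seq2", "seq3"]  # hypothetical: seq4 has no primer match

classes_2023 = classes_for_reads(reads_2023, taxonomy)
classes_2019 = classes_for_reads(reads_2019, taxonomy)
print(len(classes_2023), len(classes_2019))  # 4 3
```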
Error message:
Plugin error from feature-classifier:
Number of priors must match number of classes.
Debug info has been saved to /tmp/qiime2-q2cli-err-e5nuvotw.log
Contents of the debug log (cat /tmp/qiime2-q2cli-err-e5nuvotw.log):
/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_feature_classifier/classifier.py:104: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.24.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
  warnings.warn(warning, UserWarning)
Traceback (most recent call last):
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/commands.py", line 520, in __call__
    results = self._execute_action(
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/commands.py", line 581, in _execute_action
    results = action(**arguments)
  File "<decorator-gen-40>", line 2, in fit_classifier_naive_bayes
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
    outputs = self._callable_executor_(
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 566, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_feature_classifier/classifier.py", line 339, in generic_fitter
    pipeline = fit_pipeline(reference_reads, reference_taxonomy,
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_feature_classifier/_skl.py", line 76, in fit_pipeline
    pipeline.fit(X, y)
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_feature_classifier/custom.py", line 40, in fit
    self.partial_fit(cX, cy, sample_weight=csample_weight,
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/naive_bayes.py", line 590, in partial_fit
    self._update_class_log_prior(class_prior=class_prior)
  File "/mounts/lovelace/software/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/naive_bayes.py", line 483, in _update_class_log_prior
    raise ValueError("Number of priors must match number of"
ValueError: Number of priors must match number of classes.
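The last frame of the traceback is scikit-learn's _update_class_log_prior. The check that fails there compares the length of the supplied class prior against the number of classes, which in this code path is, as far as I can tell, the distinct labels in y. Below is a simplified pure-Python stand-in for that comparison. It is my own sketch, not scikit-learn's code, and the labels and weights are hypothetical:

```python
import math


def class_log_prior(y, class_prior):
    """Simplified stand-in for the prior-length check in scikit-learn's
    naive Bayes _update_class_log_prior (not the real implementation)."""
    classes = sorted(set(y))
    if len(class_prior) != len(classes):
        raise ValueError("Number of priors must match number of classes.")
    # Log priors, one per class, in the order of the sorted class labels.
    return [math.log(p) for p in class_prior]


# Hypothetical: weights built for 4 classes, training labels cover only 3.
weights_4 = [0.4, 0.3, 0.2, 0.1]
y_3 = ["taxonA", "taxonB", "taxonC", "taxonA"]

try:
    class_log_prior(y_3, weights_4)
except ValueError as err:
    print(err)  # Number of priors must match number of classes.
```

With four weights and only three distinct labels, the stand-in raises the same message that appears in my log.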
Any help/advice? Thanks!