Issues with training classifier to PR2 database

I’m having trouble training a classifier on the 18S PR2 database. I followed the tutorial, but when I test the classifier on sequences pulled from the PR2 database, it doesn’t classify them properly; some are classified only to kingdom.

I downloaded and imported the most recent version of the 18S PR2 database taxonomy and fasta.

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path pr2_version_4.12.0_18S_mothur.fasta \
  --output-path pr2_4.12.0_fasta.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path pr2_version_4.12.0_18S_mothur.tax \
  --output-path pr2_version_4.12.0_tax.qza

This is my code for trimming the reads and training the classifier.

qiime feature-classifier extract-reads \
  --i-sequences pr2_4.12.0_fasta.qza \
  --p-f-primer GTGCCAGCAGCCGCG \
  --p-r-primer TTTAAGTTTCAGCCTTGCG \
  --o-reads Pr2.4.12-ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads Pr2.4.12-ref-seqs.qza \
  --i-reference-taxonomy pr2_version_4.12.0_tax.qza \
  --o-classifier PR2-classifier.qza

When I test this classifier against sequences of the family Malacostraca pulled from the PR2 database, they are either classified incorrectly or classified only to Kingdom.

I also tried the QIIME artifacts for PR2 that were uploaded on the forum, but I’m still having the same issue, and it isn’t stated which version of PR2 those files are based on.

Thank you!

Welcome to the forum, @rosies!

Are you also trimming those sequences with extract-reads as you have shown? If not, it is not surprising that these are unclassified or misclassified, since you are training the classifier using the “wrong” training data. The query sequences must be trimmed in the same way as the reference (or trimmed to an internal site, e.g., you can use a full-length 18S classifier to classify trimmed 18S reads).
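As a rough sketch of what that could look like, assuming your test sequences have already been imported as a FeatureData[Sequence] artifact: the file names query-seqs.qza, query-seqs-trimmed.qza and query-taxonomy.qza are hypothetical placeholders, and the primers are copied from your extract-reads command. The script only calls QIIME 2 when it and the input files are actually present.

```shell
#!/usr/bin/env bash
# Sketch: trim the query sequences with the same primers used on the
# reference, then classify the trimmed queries with the trained classifier.

F_PRIMER="GTGCCAGCAGCCGCG"       # forward primer from the reference extract-reads step
R_PRIMER="TTTAAGTTTCAGCCTTGCG"   # reverse primer from the reference extract-reads step
QUERY="query-seqs.qza"           # hypothetical name: test sequences as FeatureData[Sequence]
CLASSIFIER="PR2-classifier.qza"  # classifier trained in the original post

# Only run when QIIME 2 and both input artifacts are available.
if command -v qiime >/dev/null 2>&1 && [ -f "$QUERY" ] && [ -f "$CLASSIFIER" ]; then
  # Trim the queries to the same amplicon region as the reference reads.
  qiime feature-classifier extract-reads \
    --i-sequences "$QUERY" \
    --p-f-primer "$F_PRIMER" \
    --p-r-primer "$R_PRIMER" \
    --o-reads query-seqs-trimmed.qza

  # Classify the trimmed queries (output name is a placeholder).
  qiime feature-classifier classify-sklearn \
    --i-classifier "$CLASSIFIER" \
    --i-reads query-seqs-trimmed.qza \
    --o-classification query-taxonomy.qza
else
  echo "QIIME 2 or input artifacts not found; skipping." >&2
fi
```

Keeping the primers in variables is just one way to make sure the reference and the queries are trimmed identically. If you would rather not trim the queries, the alternative mentioned above is to train on the untrimmed pr2_4.12.0_fasta.qza instead of the extracted reads.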