I'm having trouble training a classifier on the 18S PR2 database. I followed the tutorial, but when I test the classifier on sequences pulled from the PR2 database, it does not classify them properly; some are classified only to kingdom.
I downloaded and imported the most recent version (4.12.0) of the PR2 18S taxonomy and FASTA files:
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path pr2_version_4.12.0_18S_mothur.fasta \
  --output-path pr2_4.12.0_fasta.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path pr2_version_4.12.0_18S_mothur.tax \
  --output-path pr2_version_4.12.0_tax.qza
This is my code for trimming the reads and training the classifier.
qiime feature-classifier extract-reads \
  --i-sequences pr2_4.12.0_fasta.qza \
  --p-f-primer GTGCCAGCAGCCGCG \
  --p-r-primer TTTAAGTTTCAGCCTTGCG \
  --o-reads Pr2.4.12-ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads Pr2.4.12-ref-seqs.qza \
  --i-reference-taxonomy pr2_version_4.12.0_tax.qza \
  --o-classifier PR2-classifier.qza
When I test this classifier against sequences of the family Malacostraca pulled from the PR2 database, they are either classified incorrectly or classified only to kingdom.
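To put a number on how shallow the assignments are, a rough sketch like the one below could tabulate how many taxonomic levels each test sequence was assigned. It assumes the classification result has been exported with `qiime tools export` to a tab-separated `taxonomy.tsv` with one header row and the taxon string in the second column; the function name `rank_depth_summary` is made up for illustration.

```shell
# Hypothetical helper: summarise how many semicolon-separated levels each
# taxon string contains. Assumes a tab-separated file with a header row
# and the taxon string in column 2.
rank_depth_summary() {
  awk -F'\t' 'NR > 1 {
      taxon = $2
      sub(/;[ \t]*$/, "", taxon)         # drop a trailing semicolon, as in mothur-style strings
      depth = split(taxon, ranks, ";")   # number of levels assigned
      count[depth]++
    }
    END { for (d in count) print d "\t" count[d] }' "$1" | sort -n
}
```

Calling `rank_depth_summary taxonomy.tsv` prints one line per depth (number of levels, then number of sequences), so a pile-up at depth 1 would correspond to the kingdom-only assignments described above.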
I also tried the QIIME 2 artifacts for PR2 that were uploaded to the forum, but I have the same issue, and the post doesn't say which version of PR2 those files are based on.
Thank you!