Train PR2 Classifier

I'm not sure whether this is the right place to ask. I'm new to the QIIME 2 environment and to working with taxonomic data.

I am working with 18S data and trying to train a classifier on the PR2 database so I can assign taxonomy to my data, but I'm running into difficulties. Here are the commands I used:

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path pr2_version_5.1.0_SSU_taxo_long.fasta \
  --output-path pr2-seqs.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path pr2_taxonomy_fixed.tsv \
  --output-path pr2-taxonomy.qza
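For the two imports above to line up, the IDs in the sequence FASTA need to match the Feature IDs in the headerless taxonomy TSV, because the classifier-training step looks up each reference read's taxonomy by ID. The sketch below is one way to produce both files from a single PR2 FASTA. It assumes the PR2 "taxo_long" headers are pipe-delimited (>ID|rank1|rank2|...); the function name pr2_split_headers and the file names are hypothetical, so check the headers of your own download before relying on it.

```shell
# Hypothetical helper (assumes PR2-style headers: ">ID|rank1|rank2|...").
# Writes an ID-only FASTA plus a two-column headerless taxonomy TSV
# (Feature ID <TAB> semicolon-joined ranks) whose IDs match exactly.
pr2_split_headers() {
  local in_fasta="$1" out_fasta="$2" out_tsv="$3"
  awk -v fa="$out_fasta" -v tsv="$out_tsv" '
    /^>/ {
      header = substr($0, 2)            # drop the leading ">"
      n = split(header, f, "|")         # f[1] = ID, f[2..n] = taxonomic ranks
      tax = f[2]
      for (i = 3; i <= n; i++) tax = tax ";" f[i]
      print ">" f[1] > fa               # ID-only header for qiime tools import
      print f[1] "\t" tax > tsv         # one taxonomy row per sequence
      next
    }
    { print > fa }                      # sequence lines are copied unchanged
  ' "$in_fasta"
}
```

Usage, with hypothetical file names: pr2_split_headers pr2_version_5.1.0_SSU_taxo_long.fasta pr2_ids.fasta pr2_taxonomy_fixed.tsv, then import pr2_ids.fasta and pr2_taxonomy_fixed.tsv as above. Deriving both files in one pass is what guarantees the IDs agree.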

qiime feature-classifier extract-reads \
  --i-sequences pr2-seqs.qza \
  --p-f-primer CAAACGATGACACCCATGAA \
  --p-r-primer CCCCCTGAGACTGTAACCTC \
  --p-trunc-len 0 \
  --o-reads kine-ref-seqs.qza
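If extract-reads returns few or no reads, a quick sanity check is whether the reference sequences contain the forward primer and the reverse complement of the reverse primer, since the amplicon sits between those two sites on the forward strand. The sketch below counts exact matches only; extract-reads tolerates some primer mismatches, so this undercounts and is meant as a rough diagnostic, not a replacement. The function names revcomp and count_primer_hits are hypothetical, and only the forward orientation of each reference sequence is checked.

```shell
# Reverse-complement a primer, including IUPAC degenerate codes.
revcomp() {
  printf '%s\n' "$1" | rev \
    | tr 'ACGTRYKMBVDHSWNacgtrykmbvdhswn' 'TGCAYRMKVBHDSWNtgcayrmkvbhdswn'
}

# Count FASTA records whose (linearized, uppercased) sequence contains both
# the forward primer and the reverse complement of the reverse primer, exactly.
count_primer_hits() {
  local fasta="$1" fwd="$2" rev_rc
  rev_rc="$(revcomp "$3")"
  awk '/^>/ { if (s != "") print s; s = ""; next }
       { s = s toupper($0) }
       END { if (s != "") print s }' "$fasta" \
    | grep -F "$fwd" | grep -F -c "$rev_rc"
}
```

For example, count_primer_hits pr2_ids.fasta CAAACGATGACACCCATGAA CCCCCTGAGACTGTAACCTC prints the number of exact two-site hits (pr2_ids.fasta being a hypothetical plain FASTA export of the reference sequences).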

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads kine-ref-seqs.qza \
  --i-reference-taxonomy pr2-taxonomy.qza \
  --o-classifier kine-pr2-classifier.qza

The error message

Also, which database would you recommend for training 16S (bacteria), 18S (Apicomplexa and Kinetoplastida), and 28S (Nematoda) classifiers?

Thanks

Hi @Victoria92,

I'd suggest using qiime rescript get-pr2-data ... to fetch the PR2 files. This will download, parse, and import the version 5.0.0 files for you. Then you can proceed as you've outlined. The upcoming QIIME 2 release should allow you to pull version 5.1.0.

Alternatively, SILVA would also work, as it contains both 16S and 18S rRNA gene (SSU) sequences. You can also fetch the 23S and 28S (LSU) sequences; see the qiime rescript get-silva-data ... help text.

-Mike


Thank you, I will try that and post an update.


Here's the update, as promised.

I eventually used SILVA for the 18S classifier, and it worked perfectly.

Thanks
