Training a classifier for 18S and CO1 eDNA

zu_mu · September 20, 2025, 5:16am

Hello, I’m using the V9 18S primers, so F and R are:

1391F: AATGATACGGCGACCACCGAGATCTACAC TATCGCCGTT CG GTACACACCGCCCGTC

EukBR: CAAGCAGAAGACGGCATACGAGAT XXXXXXXXXXXX AGTCAGTCAG CA TGATCCTTCTGCAGGTTCACCTAC

This is a general question, and I’ll use the answer to train 18S and CO1 classifiers in the future, but I’ll narrow it down to the already prepared file:

silva-138.0-ssu-nr99-seqs-derep-uniq.qza

How do I trim to the 18S region? Which part of the primer do I use?

qiime feature-classifier extract-reads \
  --i-sequences silva-138.0-ssu-nr99-seqs-derep-uniq.qza \
  --p-f-primer GTACACACCGCCCGTC \
  --p-r-primer TGATCCTTCTGCAGGTTCACCTAC \
  --p-n-jobs 8 \
  --p-read-orientation 'forward' \
  --o-reads silva-132.0-ssu-nr99-seqs-F04-R22.qza

That’s what I used initially, but that’s only the last part of the primer. My ASVs and taxonomy had a lot of bacterial assignments.

SoilRotifer · September 20, 2025, 3:22pm

Hi @zu_mu,

Yes, you only need to provide the actual PCR primer portion of the sequencing primer, assuming you are using a standard sequencing protocol. That is, your sequencing data will contain the PCR primer at the 5' end of each read.
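As a minimal sketch of what "the PCR primer portion" means for the two constructs in the question: assuming the space-separated layout shown there, where the last field of each construct is the PCR primer and the preceding fields are the non-primer parts (presumably adapter, barcode, pad, and linker), the primer can be pulled out with plain shell parameter expansion. The variable names are illustrative only.

```shell
# Full sequencing constructs as posted in the question (fields separated by spaces).
construct_f="AATGATACGGCGACCACCGAGATCTACAC TATCGCCGTT CG GTACACACCGCCCGTC"
construct_r="CAAGCAGAAGACGGCATACGAGAT XXXXXXXXXXXX AGTCAGTCAG CA TGATCCTTCTGCAGGTTCACCTAC"

# Assumption: the last space-separated field is the PCR primer.
# "${var##* }" removes everything up to and including the last space.
f_primer="${construct_f##* }"
r_primer="${construct_r##* }"

# These are the values to pass to extract-reads.
echo "--p-f-primer $f_primer"
echo "--p-r-primer $r_primer"
```

These are the same two sequences used in the extract-reads command in the question, so the primer arguments there were already the right ones.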

I'd recommend curating your own SILVA data rather than using the pre-made file. You can keep it simple: use RESCRIPt to fetch the data, then perform very simple curation of the full-length data using the cull-seqs and dereplicate actions.

For amplicon region-specific classifiers, I recommend this simple approach:

1. qiime rescript get-silva-data ...
2. qiime rescript reverse-transcribe ...
3. qiime rescript dereplicate ...
4. qiime feature-classifier extract-reads ...
5. qiime rescript cull-seqs ...
6. qiime rescript dereplicate ...
7. qiime feature-classifier fit-classifier-naive-bayes ...
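The seven steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the SILVA version and target, every file name, and the `run` dry-run helper are hypothetical choices and not part of the original recommendation, and the primers are the ones from the question. By default the script only prints the commands; set DRY_RUN=0 to actually execute them in a QIIME 2 environment with RESCRIPt installed.

```shell
# Dry-run helper (hypothetical): prints each command unless DRY_RUN=0, then executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# PCR primer portions from the question; all file names below are placeholders.
F_PRIMER="GTACACACCGCCCGTC"
R_PRIMER="TGATCCTTCTGCAGGTTCACCTAC"

# 1. Fetch SILVA sequences and taxonomy (version and target are assumptions).
run qiime rescript get-silva-data \
  --p-version '138.1' --p-target 'SSURef_NR99' \
  --o-silva-sequences silva-rna-seqs.qza \
  --o-silva-taxonomy silva-tax.qza

# 2. Convert the RNA sequences to DNA.
run qiime rescript reverse-transcribe \
  --i-rna-sequences silva-rna-seqs.qza \
  --o-dna-sequences silva-seqs.qza

# 3. Dereplicate the full-length data.
run qiime rescript dereplicate \
  --i-sequences silva-seqs.qza --i-taxa silva-tax.qza --p-mode 'uniq' \
  --o-dereplicated-sequences silva-seqs-derep.qza \
  --o-dereplicated-taxa silva-tax-derep.qza

# 4. Extract the amplicon region with the PCR primers.
run qiime feature-classifier extract-reads \
  --i-sequences silva-seqs-derep.qza \
  --p-f-primer "$F_PRIMER" --p-r-primer "$R_PRIMER" \
  --p-n-jobs 8 --p-read-orientation 'forward' \
  --o-reads silva-v9-seqs.qza

# 5. Remove low-quality extracted sequences.
run qiime rescript cull-seqs \
  --i-sequences silva-v9-seqs.qza \
  --o-clean-sequences silva-v9-seqs-clean.qza

# 6. Dereplicate again, since extracted regions may now be identical.
run qiime rescript dereplicate \
  --i-sequences silva-v9-seqs-clean.qza --i-taxa silva-tax-derep.qza --p-mode 'uniq' \
  --o-dereplicated-sequences silva-v9-seqs-derep.qza \
  --o-dereplicated-taxa silva-v9-tax-derep.qza

# 7. Train the naive Bayes classifier on the region-specific reference.
run qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-v9-seqs-derep.qza \
  --i-reference-taxonomy silva-v9-tax-derep.qza \
  --o-classifier silva-v9-classifier.qza
```

Printing the commands first makes it easy to check that each step's output file feeds the next step before committing to a long run.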

For the CO1 data, in addition to the other tutorials on fetching CO1 sequences from GenBank or BOLD (or whatever other tool you use), you can also try qiime rescript get-midori2-data --p-mito-gene 'CO1' ... to fetch CO1 reference data. I've not written a tutorial for this, but it would likely follow an approach similar to those outlined in the other tutorials.

-Cheers!

zu_mu · September 21, 2025, 5:19am

Thank you so much! Sorry I didn’t clarify: the “pre-made” files were just the result of me following those steps to curate the SILVA database.