EukBR: CAAGCAGAAGACGGCATACGAGAT XXXXXXXXXXXX AGTCAGTCAG CA TGATCCTTCTGCAGGTTCACCTAC
This is a general question, and I’ll use the answer to train 18S and CO1 classifiers in the future, but for now I’ll narrow it down to the already prepared file:
silva-138.0-ssu-nr99-seqs-derep-uniq.qza
How do I trim to the 18S region? Which part of the primer do I use?
Yes, you only need to provide the actual PCR primer portion of the sequencing primer construct. This assumes you are using a standard sequencing protocol, that is, one in which your sequencing data contain the PCR primer at the 5' end of each read.
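To make "the PCR primer portion" concrete, here is a minimal Python sketch that splits the EukBR construct posted above on its spaces and pulls out the last field. The field labels (adapter, barcode, pad, linker, PCR primer) are my assumption based on the usual EMP-style layout of these constructs, and `split_construct` is just a hypothetical helper for illustration, not part of QIIME 2 or RESCRIPt.

```python
# Construct exactly as posted; fields are separated by spaces.
EUKBR_CONSTRUCT = (
    "CAAGCAGAAGACGGCATACGAGAT XXXXXXXXXXXX AGTCAGTCAG CA "
    "TGATCCTTCTGCAGGTTCACCTAC"
)

# Assumed field order (EMP-style layout); these labels are not from the post.
FIELD_NAMES = ("adapter", "barcode", "pad", "linker", "pcr_primer")


def split_construct(construct: str) -> dict:
    """Map each whitespace-separated field of the construct to a label."""
    fields = construct.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(fields)}"
        )
    return dict(zip(FIELD_NAMES, fields))


parts = split_construct(EUKBR_CONSTRUCT)
# Only this last field is what you would pass as the primer when trimming.
print(parts["pcr_primer"])
```

If your construct uses a different layout, check which field is the primer before relying on the position.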
I'd recommend curating your own SILVA data rather than using the pre-made file. You can keep it simple: use RESCRIPt to fetch the data, then perform very simple curation of the full-length data using the cull-seqs and dereplicate actions.
For amplicon region-specific classifiers I recommend this simple approach:
For the CO1 data, in addition to the other tutorials on fetching CO1 sequences from GenBank or BOLD (or whatever other tool you use), you can also try qiime rescript get-midori2-data --p-mito-gene 'CO1' ... to fetch CO1 reference data. I've not written a tutorial for this, but it would likely follow an approach similar to those outlined in the other tutorials.