I’m having the same issue described in “Extract reference reads” and “How necessary is feature-classifier extract-reads?”. The command below took 8 seconds to trim just three sequences. I’m relieved to hear there is a workaround, namely skipping this step entirely, but I wanted to add my feedback. My amplicon locus is eukaryotic COI.
time qiime feature-classifier extract-reads \
--i-sequences Three_Seqs_COI_seqs.qza \
--p-f-primer CCDGAYATRGCDTTYCCDCG \
--p-r-primer GTRATDGCDCCDGCDARDAC \
--o-reads Three_Seqs_COI_seqs_trimmed_BE.qza
Saved FeatureData[Sequence] to: Three_Seqs_COI_seqs_trimmed_BE.qza
real 0m7.879s
user 0m5.425s
sys 0m0.858s
Much of that time is spent reading and writing the sequence files, and the runtime would not scale linearly, so three sequences are not a very useful benchmark. That said, extract-reads can indeed be slow, and COI is where we hear this complaint most often (perhaps because the raw reference reads are longer?).
And indeed, trimming is recommended but not required. We see a small accuracy improvement for 16S, but none for ITS. I have not tested COI, but I don’t expect dramatically different results. Of course, even though extract-reads may be time-consuming now, using trimmed reference sequences can speed up downstream steps, particularly if you are using an alignment-based classifier.
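If you do decide to skip extract-reads, you can train the classifier directly on the full-length reference sequences. A minimal sketch, where the reference read and taxonomy file names are placeholders for your own artifacts:

# file names below are placeholders for your own reference artifacts
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads COI_ref_seqs.qza \
  --i-reference-taxonomy COI_ref_taxonomy.qza \
  --o-classifier COI_full_length_classifier.qza

As noted above, the main downside of skipping trimming is that downstream steps may be slower, particularly with alignment-based classifiers.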
Thanks @Nicholas_Bokulich! I wonder if the slowdown with COI is due to the high degeneracy of the primer sequences. Another user raised this possibility before, and I think it makes sense: the COI primers are much more degenerate than the 16S and 18S primers for which extract-reads runtimes were much shorter.
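For a rough sense of how much work those degenerate primers represent, here is a quick shell sketch (plain IUPAC arithmetic, not a QIIME command) that counts the number of exact sequences each primer from my command above expands to:

degeneracy() {
  # Multiply the number of bases encoded by each IUPAC code in the primer.
  local primer=$1 total=1 i base n
  for ((i=0; i<${#primer}; i++)); do
    base=${primer:i:1}
    case $base in
      A|C|G|T) n=1 ;;
      R|Y|S|W|K|M) n=2 ;;
      B|D|H|V) n=3 ;;
      N) n=4 ;;
      *) n=1 ;;
    esac
    total=$((total * n))
  done
  echo "$primer expands to $total exact sequences"
}
degeneracy CCDGAYATRGCDTTYCCDCG   # forward COI primer
degeneracy GTRATDGCDCCDGCDARDAC   # reverse COI primer

By that count the forward primer expands to 216 exact sequences and the reverse to 972, far more than typical 16S/18S primers, so it seems plausible that the primer matching itself accounts for much of the extra runtime.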