Sidle -- extract reads with noncanonical nucleotide

jlli2000 · October 18, 2022, 10:07pm

Hello, Justin:

I installed Sidle and RESCRIPt in QIIME 2.

I tried to use Sidle to create a seqs.qza with my COI primers. I used Devon O'Rourke's bold_anml_seqs.qza as input.

My primer set is:

f-primer GGTCAACAAATCATAAAGATATTGG
r-primer CTTATRTTRTTTATICGIGGRAAIGC

Since Sidle does not recognize I, I replaced it with N:

--p-f-primer GGTCAACAAATCATAAAGATATTGG
--p-r-primer CTTATRTTRTTTATNCGNGGRAANGC

I then ran the following command, per the Sidle tutorial:

$ qiime feature-classifier extract-reads \
    --i-sequences slide3-bold_anml_seqs.qza \
    --p-f-primer GGTCAACAAATCATAAAGATATTGG \
    --p-r-primer CTTATRTTRTTTATNCGNGGRAANGC \
    --o-reads slide3-bold_anml_seq-CO1-F230.qza

Plugin error from feature-classifier:

No matches found

Debug info has been saved to /tmp/qiime2-q2cli-err-l1y4_oa1.log
(qiime2-2022.8) main@BioInfo-1:~/Folder1/bold_database$

I have two questions:

1. Is it correct to replace base I with base N in the primer sequences?
2. Why are there no matches? Do I have too many degenerate bases in my primers? How can I solve the problem?

Thanks,

Jin

jwdebelius · October 19, 2022, 1:55pm

Hi @jlli2000,

I shared this so everyone can see it. I is not a canonical nucleotide, and as far as I know, q2-feature-classifier is not set up to handle it. (You also won't be able to build a tree, because that step in Sidle can't handle the I either.)

There's a conversation about this issue here:

You might also check the BOLD database described here, which may give you more ideas.

I'm sorry I don't have any COI experience, so I'm not sure about this particular issue.

Best,
Justine

SoilRotifer · October 19, 2022, 3:51pm

To extend @jwdebelius's response, in particular regarding replacing the I nucleotide with N:

Yes! In fact, you can read one of my old posts here:

Regarding whether you have too many degenerate bases:

I do not think you have too many degeneracies. If anything, you'd be more likely to extract extra spurious sequences than to extract fewer sequences. Given that these are the same primers as reviewed in Porter et al., from Gibson et al., Folmer et al., I'd think they are fine.
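To put a rough number on the degeneracy, here is a minimal Python sketch (not part of Sidle, RESCRIPt, or q2-feature-classifier) that swaps I for N and counts how many distinct sequences each primer stands for. The helper names `replace_inosine` and `degeneracy` are made up for illustration; the table is the standard IUPAC nucleotide code.

```python
# Number of bases each standard IUPAC nucleotide code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def replace_inosine(primer: str) -> str:
    """Hypothetical helper: swap inosine (I) for the IUPAC wildcard N."""
    return primer.upper().replace("I", "N")


def degeneracy(primer: str) -> int:
    """Hypothetical helper: count the distinct sequences a primer encodes."""
    total = 1
    for base in primer:
        # Raises KeyError for symbols outside the IUPAC table, such as I.
        total *= IUPAC_DEGENERACY[base]
    return total


f_primer = "GGTCAACAAATCATAAAGATATTGG"
r_primer = replace_inosine("CTTATRTTRTTTATICGIGGRAAIGC")

print(r_primer)              # CTTATRTTRTTTATNCGNGGRAANGC
print(degeneracy(f_primer))  # 1
print(degeneracy(r_primer))  # 512
```

With three R positions and three N positions, the reverse primer expands to 2^3 x 4^3 = 512 variants, while the forward primer has none.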

Alternatively, the lack of matches may be because many researchers remove primer sequences prior to submitting to online repositories. For example, if they amplified and sequenced the same region as you, then they most likely removed the primer sequences before depositing them, which is a good thing. However, this means that PCR primer searches will not work to extract a given region.

You can try the extract-seq-segments approach of RESCRIPt to extract your amplicon region from sequences that might be missing the primer sequences.