Hi @jwdebelius, @Mehrbod_Estaki,
extract-reads handles degenerates correctly. The local aligner from skbio that we use has some … interesting behaviour regarding degenerates, but from memory we worked hard to work around it.
The answer to your question is in --p-identity:
--p-identity NUMBER  minimum combined primer match identity threshold. [default: 0.8]
So the mismatch threshold is defined as a fraction of the combined lengths of both primers, and is applied to the mismatches accumulated across the primers.
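To make the arithmetic concrete, here is a minimal sketch of a combined threshold like the one described above. It is not the actual extract-reads code: the function names are hypothetical, the primer lengths (19 and 20 nt) are just example values, and it assumes identity is simply matches divided by the combined primer length.

```python
def combined_identity(fwd_len, fwd_mismatches, rev_len, rev_mismatches):
    """Fraction of matching positions across both primers taken together."""
    total = fwd_len + rev_len
    matches = total - (fwd_mismatches + rev_mismatches)
    return matches / total


def passes(fwd_len, fwd_mismatches, rev_len, rev_mismatches, threshold=0.8):
    """True if the combined identity meets the threshold (assumed >= comparison)."""
    identity = combined_identity(fwd_len, fwd_mismatches, rev_len, rev_mismatches)
    return identity >= threshold


def max_total_mismatches(fwd_len, rev_len, threshold=0.8):
    """Largest number of mismatches, summed over both primers, that still passes."""
    total = fwd_len + rev_len
    return max(m for m in range(total + 1) if (total - m) / total >= threshold)


# Hypothetical primer lengths: 19 nt forward, 20 nt reverse (39 nt combined).
print(max_total_mismatches(19, 20))   # total mismatch budget shared by both primers
print(passes(19, 7, 20, 0))           # all mismatches in the forward primer
print(passes(19, 4, 20, 4))           # mismatches split across both primers
```

Under those assumptions, a 19 nt plus 20 nt primer pair at 0.8 has a budget of 7 mismatches in total, and nothing stops all 7 from landing in the same primer.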
I am not saying that there is a biological justification for that behaviour, but that is what it is.
I have been intending for some time to revisit this method. Last time I tried, my favourite in-silico PCR simulator was ipcress. Who knows, this summer I may get around to implementing a wrapper for it.
Cheers,
Ben