How to train the classifier with multiple mixed forward primers?

SoilRotifer · February 1, 2023, 2:41pm

It appears that all of these primers bind to the same location, and only differ by a few bases. You could combine these 4 sequences into a pseudo-sequence using the IUPAC ambiguity codes like this:

An extreme case would result in something like this:
GRRTTYGATYMTGGYTYAG
^^Warning: This might be too ambiguous and lead to spurious hits.
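
If it helps to see the mechanics, here is a minimal Python sketch of collapsing aligned primers into one IUPAC string, column by column. The function name `iupac_consensus` and the two example sequences are made-up placeholders, not the actual primers from this thread, and the sketch assumes the primers are already aligned and of equal length.

```python
# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> single IUPAC code.
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def iupac_consensus(primers):
    """Collapse equal-length, pre-aligned primers into one IUPAC string."""
    if len({len(p) for p in primers}) != 1:
        raise ValueError("primers must be aligned and of equal length")
    consensus = []
    for column in zip(*primers):
        bases = set()
        for code in column:
            # Expand any code already present (e.g. Y -> C, T) before merging.
            bases.update(IUPAC[code.upper()])
        consensus.append(BASES_TO_CODE[frozenset(bases)])
    return "".join(consensus)


# Placeholder sequences for illustration only.
example = ["GAGTTTGATC", "GGGTTCGATT"]
print(iupac_consensus(example))  # GRGTTYGATY
```

Every position where the primers disagree becomes an ambiguity code, which is why merging several primers this way can get very permissive very quickly.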

Since we can allow for a certain number of mismatches, let's try something like you suggested, slightly lowering the identity, or make a new sequence string (see below). I retained the initial ambiguous IUPAC bases and added additional ones where the common base had a stronger bond (i.e., a G or a C).
GARTTTGATYMTGGCTYAG
^^This still might be too ambiguous, but you get the idea
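
One rough way to compare how ambiguous the two strings above are is to count how many distinct sequences each one expands to. This is just a sketch; the `degeneracy` helper is a hypothetical name, and the count says nothing about how many of those variants actually occur in your reference database.

```python
# Number of bases each IUPAC code can stand for.
N_BASES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}


def degeneracy(primer):
    """Return how many unambiguous sequences a degenerate primer encodes."""
    total = 1
    for code in primer.upper():
        total *= N_BASES[code]
    return total


print(degeneracy("GRRTTYGATYMTGGYTYAG"))  # 128
print(degeneracy("GARTTTGATYMTGGCTYAG"))  # 16
```

So the second string is considerably less permissive than the extreme case, though it still matches more variants than any single one of the original primers.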

Another option, which I'd recommend, is to use only one of the primer sets, specifically the one that uses the 28F-YM primer. Then use the resulting extracted sequences as a reference pool for guiding the extraction of this region without the use of additional primer pairs. That is, follow the approach outlined here.

-Cheers!
-Mike