I noticed in the forum post you linked that the suggested solution assumes sequence IDs are solely numerical. I have run into a similar problem, but my sequence IDs contain both numbers and letters.
How can I circumvent this? And what are the character requirements for using the classifier?
I'm not sure the script provided aids me in the problem I am encountering, though I do think I might have miscommunicated. I am not getting the ValueError from any of my sequences, or at least, from what I could gather.
However, if I have understood the tutorial correctly, it is only meant to remove lowercase letters from the sequence data.
What I was thinking of doing is changing all characters in the sequence IDs that are not in the list of acceptable characters to their lowercase counterparts. I am uncertain whether this would be sufficient to address the error and allow me to continue on and train the classifier.
Yes, it looks like you are getting a different error message from the one in the topic you linked to. The issue in that topic was lowercase characters in the sequence. Your issue is an invalid character, "I".
Those sequence IDs contain "1" (one) characters, not "I" (eye) characters.
The sequence IDs are not relevant here. Do not attempt to modify them; it will not fix your problem!
Instead, look for "I" characters in the sequences themselves and convert them, or maybe remove the affected sequence. Do you have amino acid sequences in your file? I ask because "I" is the one-letter code for isoleucine, and it is not a degenerate nucleotide base as far as I know.
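If it helps, here is a rough sketch of how you could track down which records contain the offending characters. It is plain Python with no QIIME 2 or scikit-bio calls. The allowed alphabet (uppercase IUPAC DNA codes) is my assumption about what the classifier accepts, and `parse_fasta` and `find_invalid` are just illustrative names, so adjust as needed.

```python
# Assumption: sequences should be uppercase DNA using IUPAC codes only.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_invalid(lines, alphabet=IUPAC_DNA):
    """Map each header to the sorted characters that fall outside `alphabet`."""
    report = {}
    for header, seq in parse_fasta(lines):
        bad = sorted(set(seq) - alphabet)
        if bad:
            report[header] = bad
    return report


# Made-up records for illustration; only the sequence lines are checked,
# the IDs in the headers are never inspected.
example = [">seq1", "ACGTACGT", ">seq2", "ACGIACGT", ">seq3", "AC-GT N"]
for header, bad in find_invalid(example).items():
    print(header, bad)
```

To run it on a real file, pass an open file handle instead of the list, e.g. `find_invalid(open("my-seqs.fasta"))`, where the filename is a placeholder for your own.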
Note: you also have gaps in your sequence(s), and apparently spaces between accessions? These gaps and spaces should be removed or they will likely cause other problems.
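A minimal sketch of that cleanup, assuming gaps are written as "-" or "." and that the header lines should be left exactly as they are (`strip_gaps_and_spaces` is a made-up helper name, not part of any library):

```python
def strip_gaps_and_spaces(lines, gap_chars="-."):
    """Return FASTA lines with gaps and whitespace removed from sequence lines.

    Header lines (starting with ">") are passed through unchanged.
    Sequence lines that end up empty are dropped.
    """
    cleaned = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            cleaned.append(line)
        else:
            seq = "".join(
                ch for ch in line if ch not in gap_chars and not ch.isspace()
            )
            if seq:
                cleaned.append(seq)
    return cleaned


# Made-up input for illustration.
messy = [">seq1 example record", "AC-GT ..AC", "", "G T\n"]
print("\n".join(strip_gaps_and_spaces(messy)))
```

Note that this only touches the sequence lines. If the spaces are actually in the header lines, you would need to handle those separately, and I would check what your downstream tool expects before changing them.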