Thanks @devonorourke,
Yeah, I recall our last encounter with this. It still baffles me that a sequence repository would not follow standard IUPAC rules for sequence deposition. This issue crops up from time to time with others too.
It might be worth considering replacing I characters with N characters, via a replace_non_iupac_with_Ns method. In the short term, users can try bioawk:
bioawk -c fastx '{gsub("I", "N", $seq); print ">" $name; print $seq}' < infile.fasta > outfile.fasta
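If bioawk isn't available, here is a rough Python sketch of the same idea, generalized to any non-IUPAC character rather than just I. To be clear, replace_non_iupac_with_ns and clean_fasta are hypothetical names for illustration, not existing methods in any library. The sketch assumes nucleotide sequences and treats "-" and "." as valid gap characters.

```python
import re

# Assumption: nucleotide data. IUPAC codes plus "-" and "." gaps are kept;
# every other character (e.g. I for inosine) is replaced with N.
_NON_IUPAC = re.compile(r"[^ACGTURYSWKMBDHVN.\-]", re.IGNORECASE)


def replace_non_iupac_with_ns(seq):
    """Return seq with every non-IUPAC character replaced by N."""
    return _NON_IUPAC.sub("N", seq)


def clean_fasta(lines):
    """Yield FASTA lines, leaving headers untouched and cleaning sequence lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line  # header: do not touch, IDs may legitimately contain I
        else:
            yield replace_non_iupac_with_ns(line)
```

To use it, open infile.fasta, pass the file handle to clean_fasta, and write each yielded line (plus a newline) to outfile.fasta. Unlike the one-liner, this keeps the full header line and the original line wrapping.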
-Mike