[Rpm-metadata] unicode conversion
Joe Shaw
joe at ximian.com
Mon Oct 27 17:53:44 UTC 2003
On Sat, 2003-10-25 at 02:04, seth vidal wrote:
> hi all,
> this is what I did, comments?
>
> def utf8String(string):
>     """hands back a unicoded string"""
>     try:
>         string = unicode(string)
>     except UnicodeError, e:
>         newstring = ''
>         for char in string:
>             if ord(char) > 127:
>                 newstring = newstring + '#'
>             else:
>                 newstring = newstring + char
>         newstring = newstring + 'NOTE: Characters replaced outside of UTF8 Range'
>         return unicode(newstring)
>     else:
>         return string
I'd suggest this instead:
try:
    # this will validate UTF-8
    string = unicode(string, 'utf-8', 'strict')
except UnicodeError, e:
    # replaces invalid chars with '?'
    string = unicode(string, 'iso-8859-1', 'replace')
return string
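
For reference, here is a rough sketch of the same idea written as a complete function in Python 3 syntax, where bytes and str are separate types. The name utf8_string and the early return for input that is already str are assumptions for illustration, not part of the snippet above. One caveat: ISO-8859-1 assigns a character to every byte value, so the fallback decode should always succeed and the 'replace' handler is not expected to substitute anything.

```python
def utf8_string(raw):
    """Return text for raw bytes: strict UTF-8 first, ISO-8859-1 as fallback.

    Hypothetical helper name; a sketch, not the code used by any tool.
    """
    # Assumption: input that is already text is passed through unchanged.
    if isinstance(raw, str):
        return raw
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        return raw.decode('utf-8', 'strict')
    except UnicodeDecodeError:
        # ISO-8859-1 maps each byte to one code point, so this cannot fail.
        return raw.decode('iso-8859-1', 'replace')
```

Valid UTF-8 input is decoded as UTF-8, and anything else is interpreted byte for byte as ISO-8859-1, so no information is dropped in either case.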
Joe