[Rpm-metadata] unicode conversion

Joe Shaw joe at ximian.com
Mon Oct 27 17:53:44 UTC 2003


On Sat, 2003-10-25 at 02:04, seth vidal wrote:
> hi all,
>   this is what I did, comments?
> 
> def utf8String(string):
>     """hands back a unicoded string"""
>     try:
>         string = unicode(string)
>     except UnicodeError, e:
>         newstring = ''
>         for char in string:
>             if ord(char) > 127:
>                 newstring = newstring + '#'
>             else:
>                 newstring = newstring + char
>         newstring = newstring + 'NOTE: Characters replaced outside of UTF8 Range'
>         return unicode(newstring)
>     else:
>         return string

I'd suggest this instead:

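def utf8String(string):
    """Return string as a unicode object, preferring UTF-8."""
    if isinstance(string, unicode):
        # already decoded; unicode(u'...', 'utf-8') would raise TypeError
        return string
    try:
        # this will validate UTF-8
        return unicode(string, 'utf-8', 'strict')
    except UnicodeError:
        # not valid UTF-8: fall back to ISO-8859-1, which maps every
        # byte to a character, so this decode always succeeds
        return unicode(string, 'iso-8859-1', 'replace')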
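The same strategy carries over to Python 3, where raw bytes and text are separate types. This is a minimal sketch, assuming the input arrives as `bytes`; the name `to_text` is hypothetical and not part of any existing yum or rpm-metadata code.

```python
def to_text(data: bytes) -> str:
    """Decode raw bytes to str, preferring UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding raises UnicodeDecodeError on malformed UTF-8.
        return data.decode('utf-8', 'strict')
    except UnicodeDecodeError:
        # Latin-1 maps each of the 256 byte values to exactly one code
        # point, so this fallback can never fail.
        return data.decode('iso-8859-1')


# Valid UTF-8 is decoded as UTF-8.
utf8_text = to_text('caf\u00e9'.encode('utf-8'))
# b'caf\xe9' is "café" in Latin-1 but is not valid UTF-8.
latin1_text = to_text(b'caf\xe9')
# Arbitrary bytes still decode, one character per byte.
all_bytes_text = to_text(bytes(range(256)))
```

Note the trade-off: because the Latin-1 fallback accepts every byte sequence, input in some other legacy encoding is never rejected, only possibly misread as Latin-1.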

Joe
