[Rpm-metadata] Re: unicode conversion

Jeff Johnson n3npq at nc.rr.com
Sat Oct 25 13:24:45 UTC 2003


seth vidal wrote:

>On Sat, 2003-10-25 at 02:04, seth vidal wrote:
>
>>hi all,
>>  this is what I did, comments?
>>
>>def utf8String(string):
>>    """hands back a unicoded string"""
>>    try:
>>        string = unicode(string)
>>    except UnicodeError, e:
>>        newstring = ''
>>        for char in string:
>>            if ord(char) > 127:
>>                newstring = newstring + '#'
>>            else:
>>                newstring = newstring + char
>>        newstring = newstring + 'NOTE: Characters replaced outside of UTF8 Range'
>>        return unicode(newstring)
>>    else:
>>        return string
>>
>
>another option occurred to me - instead of adding the note - in the
>string, I could bring this function inside the RpmMetaData class and if
>something triggers the above, the note is appended to that entry as an
>xml comment, but still replaces the character with a '#'.
>
The above is fine for the very few Latin1 characters that appear in rpm 
metadata strings.

You might just try doing the conversion from Latin1 for the following:
     1) trademark
     2) copyright
     3) Bero and Trond's umlauts in changelog
My guess is that those account for the majority of the non-UTF-8 text in RHL.
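
A minimal sketch of that Latin1 fallback, written for Python 3 (the quoted
function is Python 2) and assuming the metadata strings arrive as raw bytes.
The name to_unicode is hypothetical, not part of any existing script:

```python
def to_unicode(raw: bytes) -> str:
    """Decode metadata bytes: try UTF-8 first, then assume Latin1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Latin1.
        # Latin1 maps every byte 0x00-0xFF to a code point, so this
        # decode cannot raise and no character is replaced with '#'.
        return raw.decode("latin-1")


# Illustrative inputs only: a Latin1 copyright sign and a Latin1 umlaut.
print(to_unicode(b"Copyright \xa9 2003"))
print(to_unicode(b"fix for M\xfcnchen mirror"))
```

The trade-off is that bytes in some other legacy encoding are decoded
without error but come out as the wrong characters, which is why this only
covers the Latin1 cases listed above.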

The general solution is to convert with iconv, which requires knowing the
source encoding. Creating a table that maps each locale to an assumed
encoding is probably sufficient.
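
A rough sketch of such a table, again in Python 3. It uses Python's
built-in codecs as a stand-in for iconv, and every entry in the table is an
assumption for illustration, not a surveyed list of what packages actually
use. The names ASSUMED_ENCODING and convert are hypothetical:

```python
# Hypothetical table: language code -> assumed legacy encoding.
ASSUMED_ENCODING = {
    "de": "iso-8859-1",
    "fr": "iso-8859-1",
    "pl": "iso-8859-2",
    "ru": "koi8-r",
    "ja": "euc-jp",
}


def convert(raw: bytes, locale: str, default: str = "iso-8859-1") -> str:
    """Decode raw bytes, guessing the encoding from the locale name."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Reduce "pl_PL.ISO-8859-2" or "pl_PL" to the language code "pl".
        lang = locale.split(".")[0].split("_")[0].lower()
        encoding = ASSUMED_ENCODING.get(lang, default)
        # errors="replace" keeps going if the guess turns out to be wrong.
        return raw.decode(encoding, errors="replace")


# Illustrative input: a Polish word stored as ISO-8859-2 bytes.
sample = "\u0142\u00f3d\u017a".encode("iso-8859-2")
print(convert(sample, "pl_PL"))
```

Unknown locales such as "C" fall through to the default, so the result is
the same Latin1 guess as before.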

Run the script against MDK or PLD packages to test.

73 de Jeff




