[Rpm-metadata] Re: unicode conversion
n3npq at nc.rr.com
Sat Oct 25 13:24:45 UTC 2003
seth vidal wrote:
>On Sat, 2003-10-25 at 02:04, seth vidal wrote:
>> this is what I did, comments?
>>     """hands back a unicoded string"""
>>     try:
>>         string = unicode(string)
>>     except UnicodeError, e:
>>         newstring = ''
>>         for char in string:
>>             if ord(char) > 127:
>>                 newstring = newstring + '#'
>>             else:
>>                 newstring = newstring + char
>>         newstring = newstring + 'NOTE: Characters replaced outside of UTF8 Range'
>>         return unicode(newstring)
>>     return string
>another option occurred to me - instead of adding the note - in the
>string, I could bring this function inside the RpmMetaData class and if
>something triggers the above, the note is appended to that entry as an
>xml comment, but still replaces the character with a '#'.
Above is fine for the very few Latin1 characters that are in rpm
You might just try doing the conversion from Latin1 for the following:
3) Bero and Trond's umlauts in changelog
My guess is that this covers the majority of the non-UTF-8 data in RHL.
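A minimal sketch of the "convert from Latin1" suggestion, written for Python 3 rather than the Python 2 of the quoted function. The name to_unicode and the choice to try UTF-8 first are assumptions made for illustration, not anything from the script under discussion:

```python
def to_unicode(raw: bytes) -> str:
    """Decode as UTF-8, falling back to Latin1 (ISO-8859-1).

    Hypothetical helper: the name and the UTF-8-first order are assumptions.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin1 maps every byte value to a code point, so this cannot fail.
        # The result is only right if the data really was Latin1.
        return raw.decode("latin-1")


if __name__ == "__main__":
    print(to_unicode(b"M\xfcller"))                 # Latin1 bytes
    print(to_unicode("Müller".encode("utf-8")))     # already UTF-8
```

Unlike the '#' replacement, this keeps the original character, at the cost of guessing wrong for data that is neither UTF-8 nor Latin1.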
The general solution needs to use iconv to convert, knowing the encoding.
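As a rough illustration of converting when the encoding is known, here is a sketch that uses Python 3's built-in codecs as a stand-in for iconv. The function name convert_to_utf8 is hypothetical, and the source encoding is assumed to be supplied by the caller:

```python
def convert_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode raw bytes from a known source encoding to UTF-8.

    Hypothetical helper standing in for an iconv call; raises
    UnicodeDecodeError if raw is not valid in source_encoding and
    LookupError if the encoding name is unknown.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


if __name__ == "__main__":
    # 0xE9 is "e with acute" in ISO-8859-1.
    print(convert_to_utf8(b"caf\xe9", "iso-8859-1"))
```

The conversion is only as good as the encoding it is told; nothing here detects the encoding.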
Creating a table with an assumed encoding for the locale is probably
Run the script against MDK or PLD packages to test.
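The table of assumed encodings per locale mentioned above could look something like the following Python 3 sketch. Every entry in the table, the Latin1 default, and the name decode_for_locale are assumptions for illustration; a real table would need checking against the packages being tested:

```python
# Assumed legacy encoding per language code. These entries are illustrative
# guesses, not taken from any distribution's actual practice.
ASSUMED_ENCODING = {
    "de": "iso-8859-1",
    "fr": "iso-8859-1",
    "pl": "iso-8859-2",
    "cs": "iso-8859-2",
}

DEFAULT_ENCODING = "iso-8859-1"  # assumption: fall back to Latin1


def decode_for_locale(raw: bytes, locale_name: str) -> str:
    """Decode raw as UTF-8 if valid, else with the locale's assumed encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # "pl_PL.ISO-8859-2" -> "pl"
    language = locale_name.split(".")[0].split("_")[0].lower()
    encoding = ASSUMED_ENCODING.get(language, DEFAULT_ENCODING)
    # errors="replace" keeps the script running on bytes the guess cannot map.
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    print(decode_for_locale(b"\xb1", "pl_PL"))   # decoded as ISO-8859-2
    print(decode_for_locale(b"\xfc", "de_DE"))   # decoded as ISO-8859-1
```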
73 de Jeff