[Rpm-metadata] Re: unicode conversion
Jeff Johnson
n3npq at nc.rr.com
Sat Oct 25 13:24:45 UTC 2003
seth vidal wrote:
>On Sat, 2003-10-25 at 02:04, seth vidal wrote:
>
>
>>hi all,
>> this is what I did, comments?
>>
>>def utf8String(string):
>>    """hands back a unicoded string"""
>>    try:
>>        string = unicode(string)
>>    except UnicodeError, e:
>>        newstring = ''
>>        for char in string:
>>            if ord(char) > 127:
>>                newstring = newstring + '#'
>>            else:
>>                newstring = newstring + char
>>        newstring = newstring + 'NOTE: Characters replaced outside of UTF8 Range'
>>        return unicode(newstring)
>>    else:
>>        return string
>>
>>
>>
>
>Another option occurred to me: instead of adding the note to the
>string, I could bring this function inside the RpmMetaData class, and if
>something triggers the above, append the note to that entry as an
>XML comment while still replacing the character with a '#'.
>
>
The above is fine for the very few Latin-1 characters that appear in rpm
metadata strings.

You might just try converting from Latin-1 for the following:
1) trademark
2) copyright
3) Bero and Trond's umlauts in the changelog
My guess is that those account for the majority of the non-UTF-8 text in RHL.

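A minimal sketch of that Latin-1 fallback, written in Python 3 rather than the Python 2 of the function quoted above. The name `to_unicode` is hypothetical, and the code assumes any byte string that is not valid UTF-8 is Latin-1. Latin-1 does cover the copyright sign (0xA9), the registered-trademark sign (0xAE), and the umlauts, but not the (TM) sign, which Windows-1252 puts at 0x99. Also, a short Latin-1 string can occasionally be valid UTF-8 by accident, so this is a heuristic, not a guarantee.

```python
def to_unicode(raw: bytes) -> str:
    """Decode header bytes, assuming Latin-1 when they are not valid UTF-8."""
    try:
        # Valid UTF-8 (which includes plain ASCII) is kept as-is.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte value maps to a code point in Latin-1,
        # so this decode cannot fail.
        return raw.decode("latin-1")


if __name__ == "__main__":
    print(to_unicode(b"Copyright \xa9 2003"))          # Latin-1 copyright sign
    print(to_unicode(b"M\xfcller"))                    # Latin-1 u-umlaut
    print(to_unicode("M\u00fcller".encode("utf-8")))   # already UTF-8
```

Unlike the '#' substitution, nothing is lost here: the umlauts and the copyright sign come through as the real characters.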
The general solution needs to use iconv to convert, which requires knowing
the source encoding. A table mapping each locale to an assumed encoding is
probably sufficient.

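A sketch of the table-driven approach, again in Python 3, with Python's built-in codecs standing in for iconv. The table contents, the `ASSUMED_ENCODINGS`, `locale_encoding`, and `convert` names, and the Latin-1 default are all hypothetical choices for illustration. A real table would have to be built for the locales the packages actually use.

```python
import codecs

# Hypothetical table: assumed legacy encoding per language code.
ASSUMED_ENCODINGS = {
    "de": "iso-8859-1",
    "fr": "iso-8859-1",
    "pl": "iso-8859-2",
    "cs": "iso-8859-2",
    "ru": "koi8-r",
    "ja": "euc-jp",
}

DEFAULT_ENCODING = "iso-8859-1"  # assumption: fall back to Latin-1


def locale_encoding(locale_name: str) -> str:
    """Pick an encoding for a locale name such as 'pl_PL' or 'ru_RU.KOI8-R'."""
    # Drop any '@modifier' suffix, e.g. 'de_DE@euro' -> 'de_DE'.
    base = locale_name.split("@", 1)[0]
    if "." in base:
        # An explicit codeset in the locale name wins if Python knows it.
        base, codeset = base.split(".", 1)
        try:
            return codecs.lookup(codeset).name
        except LookupError:
            pass
    language = base.split("_", 1)[0].lower()
    return ASSUMED_ENCODINGS.get(language, DEFAULT_ENCODING)


def convert(raw: bytes, locale_name: str) -> str:
    """Keep valid UTF-8; otherwise decode with the locale's assumed encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" substitutes U+FFFD for undecodable bytes,
        # much like the '#' substitution in the original function.
        return raw.decode(locale_encoding(locale_name), errors="replace")


if __name__ == "__main__":
    print(convert("\u0142\u00f3d\u017a".encode("iso-8859-2"), "pl_PL"))
    print(convert("\u043f\u0440\u0438\u0432\u0435\u0442".encode("koi8-r"), "ru_RU"))
```

Trying UTF-8 first means already-converted strings pass through untouched, so the table is only consulted for the legacy bytes.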
Run the script against MDK or PLD packages to test.
73 de Jeff