[Rpm-metadata] createrepo/utils.py
Toshio Kuratomi
a.badger at gmail.com
Thu Apr 17 02:21:17 UTC 2008
seth vidal wrote:
> On Wed, 2008-04-16 at 16:45 -0400, seth vidal wrote:
>> On Wed, 2008-04-16 at 13:23 -0700, Toshio Kuratomi wrote:
>>> Yeah. Forgive me for saying this, but utf8String() is a bit crazy :-)
>> I wrote/copied some of it about 4 years ago, but I can't remember where
>> it came from conceptually. It doesn't seem like something I'd have
>> thought of but I can't remember where it is from.
>>
>> It was mostly to deal with crazy random encodings coming out of various
>> packages at the time, istr there was a hangup with random versions with
>> garbage in changelog and other fields and this dealt with that.
>>
>> lemme see if the mailing list archives show any of this off.
>
> I figured out where this sort of came from:
> Java's unicode encode/decode attempts to convert a string and when it
> cannot it replaces the bad characters with question marks and returns
> whatever it gets to the caller. It does this b/c it is a string and it
> is better to get some of it than none of it.
>
Excellent. So after some discussion on IRC we think that this is mostly
the same as unicode('byte string', encoding, 'replace') to go from a
byte string to a unicode object and u'unicode string'.encode(encoding,
'replace') to go from unicode to a byte string. This seems to work fine
under python-2.3+. kmeyer tested on python2.2.3 and it seemed to do the
right thing there as well.
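
For illustration, here is a minimal sketch of those two conversions. It is written in the Python 3 spelling, which is an assumption on my part: there, unicode(b, encoding, 'replace') becomes b.decode(encoding, 'replace'), while the encode call keeps its form. The helper names to_text and to_bytes are hypothetical and are not part of createrepo.

```python
def to_text(raw, encoding="utf-8"):
    """Decode a byte string; undecodable bytes become U+FFFD."""
    return raw.decode(encoding, "replace")


def to_bytes(text, encoding="ascii"):
    """Encode a unicode string; unencodable characters become '?'."""
    return text.encode(encoding, "replace")


# 0xE9 is 'e-acute' in Latin-1 but is not valid UTF-8 on its own.
decoded = to_text(b"caf\xe9 menu")       # 'caf\ufffd menu'
encoded = to_bytes("caf\u00e9 menu")     # b'caf? menu'
```

Either way the caller gets most of the string back rather than an exception, which matches the behaviour seth describes above.
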
Of course, at this point utf8String does a few things more than this:
1) It iterates through multiple encodings. That may be good if we know
that the majority of foreign characters are in one particular character
set. If they aren't, it's bad, as we'll get a slew of garbage results.
2) It removes characters illegal in xml documents.
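
The two extra behaviours can be sketched roughly as below. This is not the actual utf8String code. The function names, the encoding order, and the choice to delete (rather than replace) illegal characters are all assumptions made for the example. The character ranges follow the XML 1.0 "Char" production.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_ILLEGAL = re.compile(
    "[^\x09\x0a\x0d\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def decode_first_match(raw, encodings=("utf-8", "iso-8859-1")):
    """Return raw decoded with the first encoding that does not fail."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: keep what we can from the first encoding.
    return raw.decode(encodings[0], "replace")


def strip_xml_illegal(text):
    """Drop characters that may not appear in an XML 1.0 document."""
    return _XML_ILLEGAL.sub("", text)


def clean(raw):
    """Hypothetical combination of both steps."""
    return strip_xml_illegal(decode_first_match(raw))


# Point 1 in action: KOI8-R bytes are not valid UTF-8, so they fall
# through to iso-8859-1, which accepts any byte and yields mojibake.
guessed = decode_first_match("привет".encode("koi8-r"))
```

Note that iso-8859-1 never raises on decode, so any encoding listed after it would never be tried. That is the "slew of garbage results" case: the fallback succeeds silently with the wrong text.
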
-Toshio