[Yum-devel] [PATCH] Avoid converting to unicode and back in dump_xml_*. BZ 716235.

Zdenek Pavlas zpavlas at redhat.com
Wed Nov 28 09:03:13 UTC 2012


> What did you test this on?

Ordinary Yum usage for about a week, lots of createrepo --update runs.

>> -        val = _nf2ascii(val)
>> +        val = misc.to_unicode(val, errors='replace')
>> +        val = val.encode('ascii', 'replace')

> This is old, so maybe python got fixed since then ... but _nf2ascii was
> written because we got a lot of randomly encoded stuff in changelog
> files and python would just traceback when it hit it.

That's my guess, too.  .encode('ascii', 'replace') never seems to fail:
>>> for i in xrange(0x110000): ignore=unichr(i).encode('ascii', 'replace')
... 

There was a typo: the "errors=" keyword was missing, so 'replace' was used
as the encoding instead of 'utf8' (fixed).
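
To make the typo concrete, here is a minimal sketch. to_unicode_sketch is a
hypothetical stand-in for misc.to_unicode; its parameter order (encoding
second, errors third) is an assumption inferred from the typo described
above. It is written for Python 3, where bytes.decode() plays the role of
Python 2's unicode().

```python
def to_unicode_sketch(obj, encoding='utf-8', errors='strict'):
    """Hypothetical stand-in for misc.to_unicode (signature is assumed)."""
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)
    return obj

raw = b'fix caf\xe9 crash'  # Latin-1 bytes, not valid UTF-8

# With the keyword: the undecodable byte becomes U+FFFD.
good = to_unicode_sketch(raw, errors='replace')

# Without it, 'replace' lands in the encoding slot and the codec lookup fails.
try:
    to_unicode_sketch(raw, 'replace')
    typo_error = None
except LookupError as exc:
    typo_error = exc

# Second step of the patched code: force the result down to ASCII.
ascii_only = good.encode('ascii', 'replace')
```

The failure in the typo case is a codec lookup error rather than a decoding
error, which is why it is easy to misread at first.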

> This scares me as python has traditionally been _very_ picky about how
> += behaves ... Eg.

Yes, unicode is "viral". When joining strings, we have to make sure that
either none or all of them are unicode (ascii-only byte strings convert
to unicode fine).
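
A rough sketch of that rule, for illustration only. Python 2 does the ASCII
decode implicitly when a byte string meets a unicode string; py2_concat is a
hypothetical helper that spells the same step out explicitly so the example
also runs on Python 3 (bytes standing in for str, str for unicode).

```python
def py2_concat(left, right):
    """Emulate Python 2 '+' on a mix of byte strings and unicode strings."""
    if isinstance(left, bytes) and isinstance(right, bytes):
        return left + right
    # As soon as one operand is unicode, the other is decoded as ASCII.
    if isinstance(left, bytes):
        left = left.decode('ascii')
    if isinstance(right, bytes):
        right = right.decode('ascii')
    return left + right

# bytes + bytes: stays a byte string, even with UTF-8 content.
xml = py2_concat(b'<changelog>', b'caf\xc3\xa9')
stays_bytes = isinstance(xml, bytes)

# ASCII-only bytes + unicode: silently promoted to unicode.
promoted = py2_concat(b'<name>', u'yum')

# Non-ASCII bytes + unicode: the implicit ASCII decode blows up.
try:
    py2_concat(xml, u'</changelog>')
    mix_error = None
except UnicodeDecodeError as exc:
    mix_error = exc
```

The last case is the one that bites with +=: it only fails once the
accumulated byte string actually contains a non-ASCII byte.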

Since this patch got rid of all to_unicode() calls, we don't call "foreign"
code, and the asserts have never fired, I guess these should be fine.

Of course, if someone overrides _dump_base_items() to return unicode,
these += might break.  But that would be detected immediately.

>> -            return zip(misc.to_unicode(self.hdr['changelogtime'], errors='replace'),
>> -                       misc.to_unicode(self.hdr['changelogname'], errors='replace'),
>> -                       misc.to_unicode(self.hdr['changelogtext'], errors='replace'))
>> +            return zip(self.hdr['changelogtime'],
>> +                       self.hdr['changelogname'],
>> +                       self.hdr['changelogtext'])

> This is changing the types we are returning too.

Yes, that's user-visible.  But the only users I've found are _committer and _committime.
_committer is fine, _committime is not (it was returning unicode, now it returns unchecked utf8).
I should probably make sure it returns ascii-only.
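
One possible shape for that check, as a sketch rather than the actual fix:
ascii_only is a hypothetical helper name, and it reuses the
decode-then-encode pair from the hunk quoted earlier. Written for Python 3,
with bytes standing in for Python 2 byte strings.

```python
def ascii_only(value):
    """Return value as ASCII-only bytes, replacing anything else with '?'."""
    if isinstance(value, bytes):
        # Treat the input as UTF-8; undecodable bytes become U+FFFD.
        value = value.decode('utf-8', 'replace')
    # Every non-ASCII character (including U+FFFD) becomes '?'.
    return value.encode('ascii', 'replace')

valid_utf8 = ascii_only(b'Zden\xc4\x9bk')  # well-formed UTF-8 input
garbage = ascii_only(b'\xff\xfe')          # not UTF-8 at all
plain = ascii_only(u'plain')               # already-unicode input
```

Neither step can raise on string input, since both use the 'replace' error
handler.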

