[Yum-devel] [PATCH] Stop the python encoding madness, with a big hammer of fire.

Toshio Kuratomi a.badger at gmail.com
Mon Mar 7 22:21:33 UTC 2011


On Mon, Mar 7, 2011 at 10:03 AM, Toshio Kuratomi <a.badger at gmail.com> wrote:
> On Mon, Mar 07, 2011 at 10:56:07AM -0500, seth vidal wrote:
>> On Mon, 2011-03-07 at 10:52 -0500, James Antill wrote:
>> > ---
>> >  yum/misc.py |    7 ++++++-
>> >  1 files changed, 6 insertions(+), 1 deletions(-)
>> >
>> > diff --git a/yum/misc.py b/yum/misc.py
>> > index 8e81c34..305d4aa 100644
>> > --- a/yum/misc.py
>> > +++ b/yum/misc.py
>> > @@ -977,7 +977,8 @@ def getloginuid():
>> >  # ---------- i18n ----------
>> >  import locale
>> >  import sys
>> > -def setup_locale(override_codecs=True, override_time=False):
>> > +def setup_locale(override_codecs=True, override_time=False,
>> > +                 override_encoding=True):
>> >      # This test needs to be before locale.getpreferredencoding() as that
>> >      # does setlocale(LC_CTYPE, "")
>> >      try:
>> > @@ -995,6 +996,10 @@ def setup_locale(override_codecs=True, override_time=False):
>> >          import codecs
>> >          sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
>> >          sys.stdout.errors = 'replace'
>> > +    if override_encoding:
>> > +        # Dear python, please let your 'ascii' default die in a fire. kthxbye
>> > +        reload(sys)
>> > +        sys.setdefaultencoding('utf-8')
>> >
>> >
>> >  def get_my_lang_code():
>>
>>
>> So, you're just interested in seeing what ways this breaks things?
>>
>> How about we apply this to rawhide yum first, just for s&g and see what
>> goes KABOOM before applying upstream?
>>
> Although getting rid of sys.setdefaultencoding() is probably a good thing
> (upstream python claims that using it will break certain aspects of text
> handling in python's internals), I agree that there's a lot of potential to
> break stuff by making this change.  Test and fix will be in order when this
> is applied.
>

Actually -- I read that wrong.  You're adding sys.setdefaultencoding() into
the mix... I thought it was subtracting it.

Adding sys.setdefaultencoding() at this stage in yum's development is not
a safe change.  Martin v. Löwis writes that using sys.setdefaultencoding()
breaks the relationship between hash() and comparisons: byte strings and
unicode strings can then compare equal without hashing the same, which
changes the behaviour of dict lookups::

  http://article.gmane.org/gmane.comp.python.devel/109917

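It took me some experimenting to figure out a test case that shows this so
I'll list it here.  Note the last two results in each half: after the
sys.setdefaultencoding() call, the utf-8 byte string compares equal to the
dict key but the "in" lookup still fails::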

$ python
Python 2.7 (r27:82500, Sep 16 2010, 18:02:00).
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = {u'á': 1}
>>> u'á'.encode('latin-1') in a
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert
both arguments to Unicode - interpreting them as being unequal
False
>>> u'á'.encode('utf-8') in a
False
>>> u'á'.encode('latin-1') == a.keys()[0]
False
>>> u'á'.encode('utf-8') == a.keys()[0]
False
>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('utf-8')
>>> a = {u'á': 1}
>>> u'á'.encode('latin-1') in a
False
>>> u'á'.encode('utf-8') in a
False
>>> u'á'.encode('latin-1') == a.keys()[0]
False
>>> u'á'.encode('utf-8') == a.keys()[0]
True
>>>
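
The mismatch in the session above can be imitated without touching the
interpreter default.  What follows is only a sketch, written for Python 3,
where sys.setdefaultencoding() does not exist.  The class name Utf8Str is
hypothetical: it stands in for a Python 2 byte string under a utf-8 default
encoding, where == decodes implicitly but hash() still works on the raw bytes.

```python
class Utf8Str(bytes):
    """Hypothetical stand-in for a Python 2 str under a utf-8 default
    encoding: equality decodes implicitly, hashing does not."""

    def __eq__(self, other):
        if isinstance(other, str):
            try:
                # Imitates the implicit decode done by a mixed comparison.
                return self.decode('utf-8') == other
            except UnicodeDecodeError:
                return False
        return bytes.__eq__(self, other)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ drops the inherited __hash__, so restore the
    # byte-based hash: the hash is the part that does not change.
    __hash__ = bytes.__hash__


key = u'\xe1'                          # the same character as in the session
table = {key: 1}
probe = Utf8Str(key.encode('utf-8'))

equal = (probe == key)                 # comparison decodes and matches
found = (probe in table)               # dict lookup goes by hash first
same_hash = (hash(probe) == hash(key))
print(equal, found, same_hash)
```

The dict never gets as far as calling __eq__, because the stored hash and
the probe's hash differ.  That is the same "equal but not found" result as
the last two lines of the session.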

Since sys.setdefaultencoding() is a global change, it affects all code
that runs as part of yum, not just the code inside yum itself.  So this
seems likely to cause breakage in the libraries that yum uses, and there
won't be any possibility of getting those libraries changed: the
problems only occur because yum is doing something wrong by calling
sys.setdefaultencoding().
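
For contrast, here is a minimal sketch of the local alternative: convert
byte strings to unicode explicitly at the point of use, so nothing outside
the calling code is affected.  The helper name to_text is hypothetical, and
the utf-8 default with errors='replace' is an assumption for illustration,
not something taken from yum.

```python
def to_text(obj, encoding='utf-8', errors='replace'):
    """Return obj as text, decoding byte strings explicitly.

    Hypothetical helper; the default encoding and error policy are
    assumptions for this example.
    """
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)
    return obj


a = {u'\xe1': 1}

utf8_key = u'\xe1'.encode('utf-8')
latin1_key = u'\xe1'.encode('latin-1')

# Decoding before the lookup keeps hash() and == in agreement.
utf8_found = to_text(utf8_key) in a
# Wrong guess at the encoding: the byte becomes U+FFFD and is not found.
latin1_found = to_text(latin1_key) in a
# Naming the real encoding at the call site makes the lookup succeed.
latin1_explicit = to_text(latin1_key, encoding='latin-1') in a
print(utf8_found, latin1_found, latin1_explicit)
```

Because the conversion happens in the caller, libraries that rely on the
'ascii' default keep seeing the behaviour they were written against.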

-Toshio

