[Yum-devel] [PATCH] Fix main speed issue in to_xml(), slows down new createrepo a lot. BZ 716235.

Zdenek Pavlas zpavlas at redhat.com
Fri Nov 16 10:51:12 UTC 2012


Hi,

Thanks for the patch, I like it!
(Just added \x08 to CODES, too)

I've collected arguments to all 74650 calls of to_xml(),
when creating a repo with 364 packages.  It was never called
with a unicode arg.  Total time spent in to_xml():

4.281 original
0.461 james, no .translate()
1.428 with .translate()

0.872 toshio .isdisjoint()
0.693 no temp frozenset
0.610 inlined to_unicode
0.585 cache two getattrs
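
For anyone reading along without the tree handy, here is a rough standalone sketch of the idea behind the last four rows: skip the temporary frozenset, test the string directly with a pre-bound isdisjoint, and only call .translate() when a control character is actually present. It is written for Python 3 so it runs as-is. The names to_xml_sketch, _no_control_chars and _REPLACE_TABLE are made up for illustration and are not the yum code, and unlike the real function it returns text rather than encoding the result.

```python
from xml.sax.saxutils import escape

# ASCII control codes that are illegal in XML 1.0
# (everything below 0x20 except tab, LF and CR).
_CONTROL_CODES = frozenset(list(range(0, 9)) + [11, 12] + list(range(14, 32)))

# Bind isdisjoint once at import time, so the hot path does not
# repeat the attribute lookup on every call.
_no_control_chars = frozenset(map(chr, _CONTROL_CODES)).isdisjoint

# Maps each control code point to '?', for str.translate().
_REPLACE_TABLE = dict.fromkeys(_CONTROL_CODES, "?")


def to_xml_sketch(item, attrib=False):
    """Hypothetical Python 3 analogue of the patched to_xml()."""
    # Inlined conversion: decode only when we were handed bytes.
    if not isinstance(item, str):
        item = str(item, "utf-8", "replace")

    # Most strings have no control codes, so test before translating.
    # isdisjoint() accepts any iterable; no temporary frozenset needed.
    if not _no_control_chars(item):
        item = item.translate(_REPLACE_TABLE)

    # Escape characters that have special meaning in XML.
    if attrib:
        return escape(item, entities={'"': "&quot;"})
    return escape(item)
```

The precheck is cheap because isdisjoint() can stop at the first offending character, while the common case (no control codes) never pays for .translate().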

diff --git a/yum/misc.py b/yum/misc.py
index 183f296..9c6b2c0 100644
--- a/yum/misc.py
+++ b/yum/misc.py
@@ -901,8 +901,8 @@ def seq_max_split(seq, max_entries):
 
 
 # ASCII control codes that are illegal in xml 1.0
-_CONTROL_CODES = frozenset(range(0, 8) + [11, 12] + range(14, 32))
-_CONTROL_CHARS = frozenset(itertools.imap(unichr, _CONTROL_CODES))
+_CONTROL_CODES = frozenset(range(0, 9) + [11, 12] + range(14, 32))
+_CONTROL_CHARS = frozenset(itertools.imap(unichr, _CONTROL_CODES)).isdisjoint
 _CONTROL_REPLACE_TABLE = dict(zip(_CONTROL_CODES, [u'?'] * len(_CONTROL_CODES)))
 
 __cached_saxutils = None
@@ -910,20 +910,20 @@ def to_xml(item, attrib=False):
     global __cached_saxutils
     if __cached_saxutils is None:
         import xml.sax.saxutils
-        __cached_saxutils = xml.sax.saxutils
+        __cached_saxutils = xml.sax.saxutils.escape
 
-    item = to_unicode(item, encoding='utf-8', errors='replace')
-    data = frozenset(item)
+    if type(item) != unicode:
+        item = unicode(item, 'utf-8', 'replace')
     # Most strings do not have control codes so test before modifying
     # is a performance win
-    if not _CONTROL_CHARS.isdisjoint(data):
+    if not _CONTROL_CHARS(item):
         item = item.translate(_CONTROL_REPLACE_TABLE)
 
     # Escape characters that have special meaning in xml
     if attrib:
-        item = __cached_saxutils.escape(item, entities={'"':"&quot;"})
+        item = __cached_saxutils(item, entities={'"':"&quot;"})
     else:
-        item = __cached_saxutils.escape(item)
+        item = __cached_saxutils(item)
 
     # We shouldn't need xmlcharrefreplace when encoding to utf-8 (as utf-8 can
     # represent all unicode codepoints) but use it in case we ever change the

