[Yum-devel] [PATCH] Fix main speed issue in to_xml(), slows down new createrepo a lot. BZ 716235.
Zdenek Pavlas
zpavlas at redhat.com
Fri Nov 16 10:51:12 UTC 2012
Hi,
Thanks for the patch, I like it!
(Just added \x08 to CODES, too)
I've collected arguments to all 74650 calls of to_xml(),
when creating a repo with 364 packages. It was never called
with a unicode arg. Total time spent in to_xml():
4.281 original
0.461 james, no .translate()
1.428 with .translate()
0.872 toshio .isdisjoint()
0.693 no temp frozenset
0.610 inlined to_unicode
0.585 cache two getattrs
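For anyone who wants to repeat this kind of measurement, here is a minimal sketch of one way to collect argument types and cumulative time per call. It is not the script used for the numbers above: `profile_calls` and `fake_to_xml` are hypothetical names, the stand-in function only escapes `&`, and the code is written for Python 3 rather than the Python 2 that yum targets.

```python
import time
from collections import Counter
from functools import wraps


def profile_calls(func):
    """Wrap func, recording first-argument types and cumulative wall time."""
    stats = {"calls": 0, "seconds": 0.0, "arg_types": Counter()}

    @wraps(func)
    def wrapper(item, *args, **kwargs):
        stats["calls"] += 1
        stats["arg_types"][type(item).__name__] += 1
        start = time.perf_counter()
        try:
            return func(item, *args, **kwargs)
        finally:
            # Accumulate elapsed time even if func raises.
            stats["seconds"] += time.perf_counter() - start

    wrapper.stats = stats
    return wrapper


@profile_calls
def fake_to_xml(item, attrib=False):
    # Hypothetical stand-in for yum.misc.to_xml, only here to be measured.
    return item.replace("&", "&amp;")
```

After a run, `fake_to_xml.stats` holds the call count, the total seconds spent inside the function, and a per-type tally of the first argument. The timer adds its own overhead to every call, so the totals are best used to compare variants rather than read as absolute costs.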
diff --git a/yum/misc.py b/yum/misc.py
index 183f296..9c6b2c0 100644
--- a/yum/misc.py
+++ b/yum/misc.py
@@ -901,8 +901,8 @@ def seq_max_split(seq, max_entries):
# ASCII control codes that are illegal in xml 1.0
-_CONTROL_CODES = frozenset(range(0, 8) + [11, 12] + range(14, 32))
-_CONTROL_CHARS = frozenset(itertools.imap(unichr, _CONTROL_CODES))
+_CONTROL_CODES = frozenset(range(0, 9) + [11, 12] + range(14, 32))
+_CONTROL_CHARS = frozenset(itertools.imap(unichr, _CONTROL_CODES)).isdisjoint
_CONTROL_REPLACE_TABLE = dict(zip(_CONTROL_CODES, [u'?'] * len(_CONTROL_CODES)))
__cached_saxutils = None
@@ -910,20 +910,20 @@ def to_xml(item, attrib=False):
global __cached_saxutils
if __cached_saxutils is None:
import xml.sax.saxutils
- __cached_saxutils = xml.sax.saxutils
+ __cached_saxutils = xml.sax.saxutils.escape
- item = to_unicode(item, encoding='utf-8', errors='replace')
- data = frozenset(item)
+ if type(item) != unicode:
+ item = unicode(item, 'utf-8', 'replace')
# Most strings do not have control codes so test before modifying
# is a performance win
- if not _CONTROL_CHARS.isdisjoint(data):
+ if not _CONTROL_CHARS(item):
item = item.translate(_CONTROL_REPLACE_TABLE)
# Escape characters that have special meaning in xml
if attrib:
- item = __cached_saxutils.escape(item, entities={'"':"&quot;"})
+ item = __cached_saxutils(item, entities={'"':"&quot;"})
else:
- item = __cached_saxutils.escape(item)
+ item = __cached_saxutils(item)
# We shouldn't need xmlcharrefreplace when encoding to utf-8 (as utf-8 can
# represent all unicode codepoints) but use it in case we ever change the
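For reference, the fast path in the patch can be sketched as a standalone Python 3 function. This is an adaptation, not the yum code: `to_xml_sketch` and `_has_no_control_chars` are hypothetical names, `str`/`chr` replace `unicode`/`unichr`, and the final encoding step is left out because the hunk above is cut off before it.

```python
from xml.sax.saxutils import escape

# ASCII control codes that are illegal in XML 1.0 (same ranges as the patch).
_CONTROL_CODES = frozenset(list(range(0, 9)) + [11, 12] + list(range(14, 32)))
# Bind isdisjoint once so each call skips the attribute lookup.
_has_no_control_chars = frozenset(map(chr, _CONTROL_CODES)).isdisjoint
# Map each illegal code point to '?', as the patch's replace table does.
_CONTROL_REPLACE_TABLE = dict.fromkeys(_CONTROL_CODES, "?")


def to_xml_sketch(item, attrib=False):
    # Decode only when needed; text input passes through untouched.
    if not isinstance(item, str):
        item = str(item, "utf-8", "replace")
    # isdisjoint() iterates the string directly, so no temporary set is built.
    if not _has_no_control_chars(item):
        item = item.translate(_CONTROL_REPLACE_TABLE)
    # Escape characters that have special meaning in XML.
    if attrib:
        return escape(item, entities={'"': "&quot;"})
    return escape(item)
```

For example, `to_xml_sketch("a\x08b")` returns `"a?b"`, while a tab character is left alone because code 9 is legal in XML 1.0.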