yum check takes two hours, 60% in 2 billion calls to str_eq()
Lars Ellenberg
lars.ellenberg at linbit.com
Fri May 30 14:46:06 UTC 2014
Hi there.
(Cc me, I'm not subscribed)
I've got a "yum check" taking two hours.
(really single core CPU time, actually, not much IO going on there, afaics)
I cProfiled it.
It seems to spend 60% of the time in i18n.str_eq.
I pastied an SVG (gprof2dot) here: http://paste.fedoraproject.org/105517/raw
It does 2001811124 calls to i18n.str_eq
Yes, that's 2 billion.
... seems excessive ...
That implies 4 billion calls to isinstance(,unicode),
which in itself accounts for 26.32% of total time.
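To illustrate where two isinstance() calls per comparison would come from, here is a minimal sketch of the general shape such a helper tends to have. It is an assumption, not a verbatim copy of yum's i18n.str_eq. It is written against Python 3's str/bytes (yum 3.2.29 runs on Python 2, where the check is against unicode), and the names str_eq_sketch and to_utf8_sketch are made up for this illustration:

```python
def to_utf8_sketch(value):
    """Return value as UTF-8 bytes (hypothetical helper)."""
    if isinstance(value, str):
        return value.encode("utf-8")
    return value


def str_eq_sketch(a, b):
    """Compare two strings that may be text or bytes.

    Assumed shape only: two isinstance() checks on every call,
    then a plain == when both sides have the same type.
    """
    if isinstance(a, str) == isinstance(b, str):
        return a == b
    # Mixed types: bring both sides to bytes before comparing.
    return to_utf8_sketch(a) == to_utf8_sketch(b)
```

With a shape like this, every call pays for two type checks even when both arguments already have the same type.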
rpm -qa | wc -l : 716
rpm -qa --provides | wc -l : 71774
yum-3.2.29-40.el6.centos.noarch
Where do all those "provides" come from?
It's a dev box, and has a number of kernel-debug versions installed.
rpm -qa kernel\* | xargs rpm -q --provides | wc -l : 64805
Seems like you have some O(n^2) or worse behaviour below check_provides().
With the usual 10000 or so provides
(single kernel package and then some),
it is "acceptable",
but somewhere higher there is a trigger point,
and things just fall apart.
Still I don't see where the 2 billion come from.
Can someone explain that to me?
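A back-of-the-envelope check, assuming (and it is only an assumption) that something below check_provides() compares every provide against every other provide once. Then n provides cost n*(n-1)/2 comparisons. The helper name pairwise_comparisons is made up for this illustration:

```python
def pairwise_comparisons(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Counts taken from the rpm output above, plus the "usual" figure.
for label, n in [("usual box", 10000),
                 ("kernel packages only", 64805),
                 ("all packages", 71774)]:
    print("%-22s %6d provides -> %13d comparisons"
          % (label, n, pairwise_comparisons(n)))
```

For 10000 provides this gives about 50 million comparisons. For 64805 it gives about 2.1 billion, the same order of magnitude as the 2001811124 calls in the profile. That would fit a quadratic scan, but it does not show which loop is responsible.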
Package list, rpm db, binary python profile stats,
all available on request.
I'd even take hints as to what could be done about it, and where,
and would try to tackle the python code responsible.
Unroll some loop, pre-populate some lookup table, I don't know.
I'd need some pointers: what it is supposed to do,
how it currently tries to do it, and why ;-)
Or does it really have to be this way?
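To make the "pre-populate some lookup table" idea concrete: a minimal sketch, assuming the check boils down to "for each required name, find the packages that provide it". The data layout and the names build_provides_index and unmet_requires are hypothetical, not yum's API. Real provides also carry versions and flags, which this ignores:

```python
from collections import defaultdict


def build_provides_index(packages):
    """Map each provided name to the list of packages providing it.

    packages: dict of package name -> iterable of provided names
    (hypothetical layout). Built once, in one pass over all provides.
    """
    index = defaultdict(list)
    for pkg, provides in packages.items():
        for name in provides:
            index[name].append(pkg)
    return index


def unmet_requires(requires, index):
    """Return the required names that no package provides.

    Each lookup is a dict access instead of a scan over all provides.
    """
    return [name for name in requires if name not in index]


packages = {
    "kernel": ["kernel", "kernel-uname-r"],
    "bash": ["bash", "/bin/sh"],
}
index = build_provides_index(packages)
print(unmet_requires(["/bin/sh", "libfoo.so.1"], index))
```

Building the index is linear in the number of provides, and each requirement then costs one hash lookup, so the total work no longer grows with the square of the provides count.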
Other than that: below check_provides, I think we can be certain that
all strings we are going to compare will be in the same encoding?
Then why not simply do a == b?
That would at least speed up things a bit,
even though doing that 2 billion times will still take some time.
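One way to make a plain a == b safe, sketched under the assumption that mixed text/bytes values could otherwise reach the comparison: convert everything to one type once, when the data is loaded, and use == in the hot loop. The helper name to_text is made up, and this is written against Python 3's str/bytes rather than Python 2's unicode/str:

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes to text once; leave text untouched (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Normalise once, up front: one isinstance() per string, not per comparison.
raw_provides = ["kernel", b"kernel-debug", "bash"]
provides = [to_text(p) for p in raw_provides]

# Hot loop: plain == (here via list.count), no per-comparison type checks.
wanted = to_text(b"kernel-debug")
matches = provides.count(wanted)
print(matches)
```

This moves the type handling out of the inner loop, so its cost scales with the number of strings rather than with the number of comparisons.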
Thanks,
Lars