[Yum-devel] Some perf. notes on loading prco data, and rpmdb caching

James Antill james at fedoraproject.org
Thu Oct 29 22:31:43 UTC 2009

 So first off, after I'd been looking at this for way too long, Seth just
showed me some data (which he's going to post soon) which, "at best",
affects these numbers.

 So I was taking a quick look at whether there is anything that we can
optimize in the "depsolve" path. And, as always, the big thing that jumps
out is checkInstall and checkRemove ... not because they are slow but
because they are called 666 times, and the little bit adds up.
 The main "problems" in both are the tsInfo.getProvides()/getRequires(),
esp. because they search the entire pkgSack and then look for things
"which are about to be installed" ... which doesn't sound that great,
when you have only a couple of things in the transaction.
 But I couldn't see anything that made a worthwhile difference. It's
possible some kind of "tsInfo local" PackageSack() would help, but maybe
something for another day...
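
 For what it's worth, a rough sketch of the "tsInfo local" idea: index
only the packages in the transaction by what they provide/require, so a
lookup never touches the full pkgSack. This is not yum code; the class
name, method names and data below are all made up for illustration.

```python
from collections import defaultdict


class TxLocalIndex:
    """Hypothetical index over only the packages in the transaction."""

    def __init__(self):
        self._by_provide = defaultdict(set)
        self._by_require = defaultdict(set)

    def add(self, pkg_name, provides, requires):
        # Record each provide/require name against the package that has it.
        for name in provides:
            self._by_provide[name].add(pkg_name)
        for name in requires:
            self._by_require[name].add(pkg_name)

    def get_provides(self, name):
        # Cost scales with the transaction size, not the size of the sack.
        return sorted(self._by_provide.get(name, ()))

    def get_requires(self, name):
        return sorted(self._by_require.get(name, ()))


idx = TxLocalIndex()
idx.add("foo-1.0", provides=["foo", "libfoo.so.1"], requires=["libc.so.6"])
idx.add("bar-2.0", provides=["bar"], requires=["libfoo.so.1"])
print(idx.get_provides("libfoo.so.1"))
print(idx.get_requires("libfoo.so.1"))
```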

 Just as I was about to give up I noticed that all output was hanging
just after my debugging (but before the transaction list). More
debugging revealed that checkFileRequires() takes ~1.5 seconds ...
regardless of transaction size.
 So I did what any insane person would do, I created a giant cache of
the FileRequires data¹:


...however this produced the "interesting" result that you get these
timings (roughly):

  no patch
checkFileRequires : ~1.7 seconds
checkConflicts    : ~0.1 seconds

  with patch
checkFileRequires : ~0.2 seconds
checkConflicts    : ~1.6 seconds

...this eventually led to the realization that most of the problem is
from loading the prco data from rpmdb (aka. _populatePrco()).
 The problem is that if we need to load .provides, .requires, .conflicts
or .obsoletes for an rpmdb package we load _all_ of them. Undoing this


...and we now get results more like:

  with both patches
checkFileRequires : ~0.2 seconds
checkConflicts    : ~0.7 seconds

...which is to say, with a giant ass amount of caching², we can drop
about 1.5 seconds off any install/update/remove ... which is a lot
considering a single/small pkg transaction is like 4.5 seconds (up to
the user interaction, that is).
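
 To make the "load only the prco field you asked for" idea concrete,
here is a minimal sketch. It is not the patch: LazyPrcoPackage and its
loader callable are hypothetical stand-ins for an rpmdb package object
and the header read behind _populatePrco().

```python
class LazyPrcoPackage:
    """Hypothetical package whose prco fields are loaded one at a time."""

    def __init__(self, loader):
        self._loader = loader  # callable: field name -> list of entries
        self._prco = {}        # holds only the fields already asked for
        self.loads = []        # record of real loads, for demonstration

    def _get(self, field):
        # Load this one field on first access; reuse it afterwards.
        if field not in self._prco:
            self.loads.append(field)
            self._prco[field] = self._loader(field)
        return self._prco[field]

    @property
    def provides(self):
        return self._get("provides")

    @property
    def requires(self):
        return self._get("requires")

    @property
    def conflicts(self):
        return self._get("conflicts")

    @property
    def obsoletes(self):
        return self._get("obsoletes")


# Stand-in loader; a real one would read the package header from rpmdb.
pkg = LazyPrcoPackage(lambda field: ["%s-entry" % field])
pkg.conflicts
pkg.conflicts
print(pkg.loads)
```

 Only "conflicts" is ever loaded in this run, which is the saving when a
check needs a single field.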

 However a quick script shows that we didn't do this merging just for
the fun of it:


  Loading all prco data at once

% sudo ../rpm-load-perf-test.py
Setup: 0.000489950180054
Loaded plugins: presto
Load packages: 0.566312074661
Load Provides: 1.02764296532
Load Requires: 0.00843715667725
Load Conflicts: 0.00521993637085
Load Obsoletes: 0.00612092018127

~total (prco loads only) = 1.0 second

  Loading each prco data separately

% sudo ../rpm-load-perf-test.py
Setup: 0.000480175018311
Loaded plugins: presto
Load packages: 0.540218830109
Load Provides: 0.496702194214
Load Requires: 0.742015838623
Load Conflicts: 0.303863048553
Load Obsoletes: 0.3119161129

~total (prco loads only) = 1.85 seconds

...which I really don't understand.
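
 The script itself isn't included above. For anyone wanting to reproduce
this kind of measurement, a minimal timing harness might look like the
following; the loaders here are dummy stand-ins (assumptions), and real
ones would touch the prco attributes of every rpmdb package.

```python
import time


def timed(label, func, out=print):
    """Run func(), report '<label>: <seconds>', return (result, elapsed)."""
    start = time.time()
    result = func()
    elapsed = time.time() - start
    out("%s: %s" % (label, elapsed))
    return result, elapsed


# Dummy steps, run in the same order as the output above.
steps = [
    ("Load Provides", lambda: sum(range(100000))),
    ("Load Requires", lambda: sum(range(1000))),
    ("Load Conflicts", lambda: sum(range(1000))),
    ("Load Obsoletes", lambda: sum(range(1000))),
]

total = 0.0
for label, func in steps:
    _, elapsed = timed(label, func)
    total += elapsed
print("~total = %s" % total)
```

 Because the steps run in sequence, whichever step runs first pays for
any shared one-off work, so the order matters when comparing lines.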

 So I'm kind of worried I've gone a bit crazy at this point, so want
some input...

1. I can see how loading all the prco data should be IO bound, so I
might expect a second to do that ... except the different numbers imply
something else. Anyone else want to investigate the librpm code to load
the prco data?:) Anyone think it's realistic to assume we could get that
1 second down a lot?

2. Is it worth trying to cache FileRequires and Conflicts data for the
rpmdb? I mean the patch is pretty close to working, for FileRequires,
but I'm not sure how clean I feel.

3. Does Seth's data change everything ?:)

 ¹ Using rpmdb version as the cache breaker, and then caching rpmdb
version ... although that could stand on its own (how does anyone feel
about that code :).

² I'm including caching rpmdb conflicts here too, which is the extra 0.3
or so. Although I haven't done that yet.
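
 As an illustration of the "rpmdb version as the cache breaker" idea in
footnote ¹, here is a rough sketch of a cache that is only trusted when
it was written for the current version string. This is not the patch:
load_cached, the JSON file layout and the "v1"/"v2" version strings are
all assumptions made for the example.

```python
import json
import os
import tempfile


def load_cached(cache_path, current_version, compute):
    """Return cached data written for current_version, else rebuild it."""
    try:
        with open(cache_path) as fh:
            blob = json.load(fh)
        if blob.get("version") == current_version:
            return blob["data"]
    except (IOError, ValueError, KeyError, AttributeError):
        pass  # missing or unusable cache: fall through and rebuild
    data = compute()
    with open(cache_path, "w") as fh:
        json.dump({"version": current_version, "data": data}, fh)
    return data


calls = []


def compute():
    # Stand-in for the expensive walk over every installed package.
    calls.append(1)
    return {"pkg-a": ["/bin/sh"]}


path = os.path.join(tempfile.mkdtemp(), "file-requires.json")
load_cached(path, "v1", compute)  # no cache yet: computes and writes
load_cached(path, "v1", compute)  # same version: served from the cache
load_cached(path, "v2", compute)  # version changed: recomputes
print(len(calls))
```

 The whole scheme leans on the version string changing whenever the
installed package set changes; anything that alters the rpmdb without
changing that string would leave a stale cache.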

James Antill <james at fedoraproject.org>
