[Yum-devel] [Patch] Resolver Performance and Correctness

Florian Festi ffesti at redhat.com
Thu Jun 14 15:08:33 UTC 2007

Jeremy Katz wrote:
> On Mon, 2007-06-11 at 17:49 +0200, Florian Festi wrote:
>> _checkFileRequires:
>> Ok, that's the difficult one...
>> The rationale behind that method is that there are many more files than 
>> there are file requires. So it trades the number of files within the removed 
>> pkgs for the number of overall file requirements. This is a bad deal for very 
>> small transactions but pays off comparatively fast.
> *nod*  The question is where is the tipping point.  And is the cost for
> the smaller ones worth it (probably).  Also, the additional sets you
> allude to are going to have a cost as far as memory usage that has to be
> weighed as well.  But making the move to having this split out and on
> its own as a change on its own is going to be a lot easier to go with
> than just doing it as part of the big thing.
> Jeremy

OK, let's have a look at F7 Everything/FC 6 Core:

 > sqlite3 primary.sqlite
sqlite> select count(*) from packages;
sqlite> select count(*) from requires;
sqlite> select count(*) from requires where name like "/%";
sqlite> select count(DISTINCT name) from requires where name like "/%";

Average of 13.5 requires/package
Average of about one file requires per package
321/210 distinct file requires (F7 Everything/FC 6 Core)
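
For anyone who wants to reproduce these averages, here is a minimal sketch 
that runs the same four counts through Python's sqlite3 module and derives 
the per-package figures. The function name require_stats and the keys of the 
returned dict are made up for illustration; only the packages and requires 
tables and the name column are taken from the queries above.

```python
import sqlite3


def require_stats(conn):
    """Run the four counts from the session above and derive the averages."""
    def count(sql):
        return conn.execute(sql).fetchone()[0]

    packages = count("SELECT count(*) FROM packages")
    requires = count("SELECT count(*) FROM requires")
    file_requires = count(
        "SELECT count(*) FROM requires WHERE name LIKE '/%'")
    distinct_file_requires = count(
        "SELECT count(DISTINCT name) FROM requires WHERE name LIKE '/%'")

    def per_package(n):
        # Avoid dividing by zero on an empty repository.
        return n / packages if packages else 0.0

    return {
        "packages": packages,
        "requires": requires,
        "file_requires": file_requires,
        "distinct_file_requires": distinct_file_requires,
        "requires_per_package": per_package(requires),
        "file_requires_per_package": per_package(file_requires),
    }
```

It would be called as require_stats(sqlite3.connect("primary.sqlite")) 
against an unpacked repository metadata file.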

 > sqlite3 filelists.sqlite
sqlite> select sum(length(filetypes)) from filelist;

Average of ~140/190 files per package (F7 Everything/FC 6 Core)

Assuming a 1000-package install, there may be 14000 requires, 1000 file 
requires, and maybe 150 distinct file requires. Reducing from requires to 
distinct file requires is done in memory and is really simple and fast. The 
150 queries to the db should not hurt more than the files of an average 
package. The question is how many files we can skip checking because they 
are in the update. The current code uses only the files that have been 
loaded anyway, and I don't see a reason why a sqlite package should load its 
filelist. (We check only REMOVE packages, which all come from the rpmdb.)
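
To make the trade concrete, here is a rough sketch of the idea in Python. It 
is not the code from the patch: the function names and the plain sets 
standing in for the rpmdb and the sack are assumptions for illustration 
only. The requires are first reduced to the distinct file requires in 
memory, and only those few paths are then checked against the files of the 
removed packages.

```python
def distinct_file_requires(requires):
    """Keep only file requirements (names starting with '/'), deduplicated."""
    return {name for name in requires if name.startswith("/")}


def broken_file_requires(requires, removed_files, remaining_files):
    """Return the file requires that the removed packages provided and that
    nothing remaining in the transaction provides any more.

    requires        -- all requirement names of the installed packages
    removed_files   -- set of files owned by the packages being removed
    remaining_files -- set of files still provided after the transaction
    """
    wanted = distinct_file_requires(requires)
    # One lookup per distinct file require, instead of one per removed file.
    return sorted(
        path for path in wanted
        if path in removed_files and path not in remaining_files
    )
```

With roughly 150 distinct file requires, the loop does about 150 lookups no 
matter how many files the removed packages own.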

The next interesting question is which is more expensive: loading the 
requires into RAM or loading the files. It turns out that the rpmdb is quite 
fast. Preparing all that data takes less than a second on my computer. I 
still see the searches needing 4.3 seconds in most scenarios, as opposed to 
1.2 seconds in some other cases. I guess there is still some time being 
spent in the sqliteSack...

Florian Festi
