[Yum-devel] Yum startup speed

seth vidal skvidal at phy.duke.edu
Sun Jan 9 02:13:56 UTC 2005


On Sun, 2005-01-09 at 11:08 +1000, Menno Smits wrote:
> Hi all,
> 
> After following the long thread on fedora-devel, "why doesn't yum cache 
> anything?", I started prodding around as this was something that's been 
> bugging me too. (Thread is at: 
> http://www.redhat.com/archives/fedora-devel-list/2004-December/msg01057.html)
> 
> Based on some profiling I found that much of yum's startup time is in 
> reconstruction of the big YumPackageSack object from the XML metadata. 
> This is probably no surprise to most people on this list.
> 
> So I thought, "why doesn't yum just cache the whole thing as a pickle 
> rather than recreating it every time"? I roughly hacked up 
> YumBase.doSackSetup() so that after building the YumPackageSack object 
> it writes it to a pickle and if this pickle exists in future calls it 
> blindly uses it.
> 
> Next I benchmarked successive yum runs both with the pickle present and 
> not present, using "times yum -C list mtr". Here's the result:
> 
> Uncached
> 19.225 31.197 17.653 17.208 (avg = 21.32s)
> 
> Cached
> 11.469 8.546 9.233 9.853 (avg = 9.775s)
> 
> The execution time is almost halved when the pickle is used, which is a 
> pretty decent improvement.
> 
> The problem with this idea is knowing when the pickle can be used and 
> when it needs to be rebuilt. The YumPackageSack object relies on the XML 
> repo metadata, the repos that are enabled and the package 
> exclusions/inclusions from the config files and command line. If any of 
> these change the pickle needs to be rebuilt or can't be used.
> 
> My proposal is:
> * keep a pickle of the YumPackageSack object
> 
> * if any repository metadata changes, rebuild YumPackageSack and 
> overwrite existing pickle

How much of a time hit is this? B/c any repo metadata is going to
change.



> * if any config file changes (check via checksums), rebuild 
> YumPackageSack and overwrite existing pickle

again - not an uncommon occurrence, even less uncommon in a graphical
interface.


> * if package exclusions are given on the command line, don't use the 
> YumPackageSack pickle and just build a correct one on the fly
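
For illustration only, the invalidation rules proposed above could be
sketched roughly like this. It is not yum code: fingerprint(),
can_use_pickle() and the file lists are hypothetical names, and it
assumes the repo metadata and config files are ordinary files on disk
whose contents can be checksummed.

```python
import hashlib


def fingerprint(paths):
    """Return one SHA-1 hex digest covering the names and contents of all files."""
    digest = hashlib.sha1()
    for path in sorted(paths):
        digest.update(path.encode("utf-8"))
        with open(path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()


def can_use_pickle(saved_fingerprint, metadata_files, config_files, cli_excludes):
    """Decide whether a previously pickled sack may be reused (hypothetical helper)."""
    if cli_excludes:
        # Exclusions given on the command line: skip the pickle entirely
        # and build the sack on the fly.
        return False
    # Any change to repo metadata or config files changes the fingerprint,
    # which means the pickle has to be rebuilt.
    current = fingerprint(list(metadata_files) + list(config_files))
    return current == saved_fingerprint
```

The saved fingerprint would have to be stored next to the pickle when
it is written, so that a later run can compare against it.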



> 
> I'm happy to develop this further if people here think it might be 
> worthwhile. Is this a reasonable idea or do I need to get more sleep? :)

What's the benefit from having a single big pickle of all the package
metadata for all repos instead of having individual pickles for each?
B/c you're going to have to read in the metadata if anything has changed
and, as is the case for fedora core 3, the updates-released and 3rd
party repos change quite a bit. 
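
A minimal sketch of the per-repo alternative, so that only the repo
whose metadata changed gets reparsed. The names here
(load_repo_packages, the cache layout, the parse callback) are made up
for illustration and are not yum's actual API; it also assumes each
repo's metadata is a single local file.

```python
import hashlib
import os
import pickle


def file_checksum(path):
    """SHA-1 hex digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha1(handle.read()).hexdigest()


def load_repo_packages(repo_id, metadata_path, cache_dir, parse):
    """Load one repo's package data, reparsing only if its metadata changed.

    `parse` is a caller-supplied (hypothetical) function that turns the
    metadata file into a picklable object.
    """
    cache_path = os.path.join(cache_dir, repo_id + ".pickle")
    checksum = file_checksum(metadata_path)
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            cached = pickle.load(handle)
        if cached["checksum"] == checksum:
            # Metadata unchanged since the pickle was written: reuse it.
            return cached["packages"]
    # No pickle yet, or stale pickle: reparse this repo only.
    packages = parse(metadata_path)
    with open(cache_path, "wb") as handle:
        pickle.dump({"checksum": checksum, "packages": packages}, handle)
    return packages
```

With this layout a change in updates-released costs one reparse of
that repo, while the pickles for the unchanged repos stay valid.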

In addition, you're taking a memory hit by loading up stuff you may not
want. And if you have to do another lookup you're going to need to read
in the filelists data which may not be there.

Remember the metadata is more than just primary.xml - and reading in ALL
the metadata is a memory hit you may not want to deal with.


I think you're talking about optimization for the ideal but not terribly
common case.

Gijs has suggested using something other than a python pickle to speed
up access of the data.  That might make things simpler in some ways.

Another set of options might be to generate a smaller-than-primary-but-
really-common lookup file on the repo-side to make startup faster. I'm
not really sure what the best tack is here, then again, I've done the
benchmarks and timing on this and I know how long it takes to read in
the data. Many of the folks who were complaining in that thread were
suffering b/c the mirrors were out of sync and they weren't using the
metadata pickles at all.

-sv




