[Yum-devel] Reducing yum startup times and memory usage

Gijs Hollestelle g.hollestelle at gmail.com
Sun Jan 9 20:57:01 UTC 2005


Hi all,

I've been looking into how yum's startup time and memory usage can be
reduced. Basically, I have been looking at making mdcache and
packageSack behave a bit more like databases instead of loading all
metadata information into memory at once.

Current situation:
  - Read all the metadata information into memory (either from a
pickle or from XML)
  - Fill up the packageSack object by converting all the in-memory
metadata entries into YumAvailablePackage objects (this is the
"Reading metadata from local files ###" bar you see at startup)
  - Create memory caches for the packageSack (this allows quick
searching for requires etc)
  - When the main component needs any information about a package look
it up in this memory resident data

Problems with the current situation:
  - All prco (provides, requires, conflicts, obsoletes) information
is loaded even when a user does a simple yum list updates
  - All changelogs, buildtimes, summaries etc. are loaded into memory
and converted between different formats, even though these are hardly
ever required (are they required for anything besides yum info?)
  - Creating the memory cache takes a non-trivial amount of time

My proposed solution:
  - Don't store all the metadata in the pickle cache, instead create
multiple pickle caches per repository, that are loaded on demand
(ideally these won't be pickle caches but embedded databases, like
metakit or sqlite)
  - Don't create full YumAvailablePackage objects for all items, but
let packageSack.returnPackages return objects that only have the most
frequently needed information in them. Add a new function to the
packageSack (e.g. getFullPackage) that can give the full object if it
is really required (this won't be the case very often)
  - Store something similar to what buildIndexes does in a pickle file

What I have implemented now (fastcache.diff):

* Split the pickle cache up into two different files:
  - One file (same filename as the original pickle, with .fast.short
    appended) only contains a dictionary of all packages in a
    repository with their name, version, header info and location
    (and, for now, also the full provides information; this can be
    omitted with a little more work)
  - The second file contains caches for provides, requires, etc.
* Construct package objects when they are needed, not at load time
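
The split described above could look roughly like the following. This
is an illustrative sketch, not the code in fastcache.diff: only the
.fast.short suffix comes from this mail, while the .fast.index suffix,
the dictionary keys (hdrrange, location, ...) and the function names
are assumptions made for the example.

```python
import os
import pickle
import tempfile

# Fields kept in the small file (key names are hypothetical).
SHORT_FIELDS = ("name", "version", "hdrrange", "location", "provides")
# Fields turned into lookup indexes in the second file.
PRCO_FIELDS = ("provides", "requires", "conflicts", "obsoletes")


def build_indexes(packages):
    """Map each provided/required/... name to the pkgids that mention it."""
    indexes = {field: {} for field in PRCO_FIELDS}
    for pkgid, meta in packages.items():
        for field in PRCO_FIELDS:
            for name in meta.get(field, []):
                indexes[field].setdefault(name, []).append(pkgid)
    return indexes


def write_split_cache(pickle_path, packages):
    """Write the short cache and the index cache next to pickle_path."""
    short = {
        pkgid: {field: meta.get(field) for field in SHORT_FIELDS}
        for pkgid, meta in packages.items()
    }
    short_path = pickle_path + ".fast.short"
    index_path = pickle_path + ".fast.index"  # assumed name, not from the patch
    for path, data in ((short_path, short), (index_path, build_indexes(packages))):
        with open(path, "wb") as fh:
            pickle.dump(data, fh, pickle.HIGHEST_PROTOCOL)
    return short_path, index_path


def read_cache(path):
    with open(path, "rb") as fh:
        return pickle.load(fh)


example = {
    "id1": {
        "name": "foo", "version": "1.0", "hdrrange": (440, 1200),
        "location": "foo-1.0.rpm", "provides": ["foo", "libfoo.so"],
        "requires": ["libc.so.6"], "summary": "not stored in either file",
    },
}

with tempfile.TemporaryDirectory() as tmpdir:
    short_path, index_path = write_split_cache(
        os.path.join(tmpdir, "repo.pickle"), example)
    # Listing packages only needs the small file.
    short_cache = read_cache(short_path)
    # Dependency resolution would additionally read the index file.
    index_cache = read_cache(index_path)
```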

Have a look at the attached patch, which should be considered a proof
of concept.

On my machine (Athlon 2600+, 512 MB, Fedora Core 3 with the base,
updates, freshrpms and dag repos enabled; 5159 packages total), these
are the results:

- yum -C list updates: Original 8.9 seconds, 63MB memory usage. New
2.6 seconds, 24MB.
The new setup only reads a small part of the metadata to be able to
list updates.

- yum -C install mono-complete: Original 10.4 seconds, 68MB. New 7.2
seconds, 58MB.

This install was run up to the point where yum asks whether it is okay
to install, and I answered no there.

The experiments with metakit (which are currently very ugly) are even
more promising: they really help reduce memory usage (by about a
factor of 2 or 3) and speed things up a bit more.
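
To illustrate the embedded-database idea, here is a small sketch using
sqlite (mentioned above as an alternative) through Python's sqlite3
module. It is not the metakit experiment; the table layout and the
function names are assumptions made for the example. The point is that
a lookup touches only the matching rows instead of unpickling
everything:

```python
import sqlite3


def build_db(path, packages):
    """Create a tiny package database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packages "
        "(pkgid TEXT PRIMARY KEY, name TEXT, version TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS provides (name TEXT, pkgid TEXT)")
    # The index lets a provides lookup avoid scanning the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS provides_name ON provides (name)")
    with conn:  # commits on success
        for pkgid, meta in packages.items():
            conn.execute(
                "INSERT INTO packages VALUES (?, ?, ?)",
                (pkgid, meta["name"], meta["version"]))
            conn.executemany(
                "INSERT INTO provides VALUES (?, ?)",
                [(name, pkgid) for name in meta["provides"]])
    return conn


def who_provides(conn, name):
    """Return the names of packages providing `name`, sorted."""
    rows = conn.execute(
        "SELECT p.name FROM provides AS pr "
        "JOIN packages AS p ON p.pkgid = pr.pkgid "
        "WHERE pr.name = ? ORDER BY p.name",
        (name,))
    return [row[0] for row in rows]


conn = build_db(":memory:", {
    "id1": {"name": "httpd", "version": "2.0", "provides": ["webserver", "httpd"]},
    "id2": {"name": "lighttpd", "version": "1.3", "provides": ["webserver"]},
})
providers = who_provides(conn, "webserver")
```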

If you want, you can have a look at the attached patch (which should
apply to the current cvs version of yum). The first time you run yum
after applying it, the fast caches (as I have called them for now) are
created (just like the pickle caches are now). The next time you run
it, it will use these caches to speed things up a bit. Currently only
yum install and yum list updates should work, or somewhat work, as I
know that this is far from complete.

Regards,
  Gijs Hollestelle
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fastcache.diff
Type: text/x-patch
Size: 9896 bytes
Desc: not available
Url : http://lists.baseurl.org/pipermail/yum-devel/attachments/20050109/e9987694/attachment.bin 

