[Yum] Performance enhancement: [was yum performance]
Joseph Tate
jtate at dragonstrider.com
Sat May 1 04:30:10 UTC 2004
Here's my idea. If you've already thought of this and rejected it, I'd
love to know why.
1. Switch yum to use -C by default, and introduce a different flag to
signal an update to the cache. Day-to-day operations can then be done
with a pregenerated hash and a prebuilt cache.
2. Build a serializable hash structure, or use an existing
mechanism. I don't care if it's stored in XML or db4 or just a plain
text file, as long as it's fast to read in and write out. That alone
would collapse N fopen/fread/fclose calls into one set. It won't do
much for memory usage, though.
3. Change the yum cron job to use the new -!C flag so that the hash
index gets regenerated nightly.
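Point 2 could be as simple as pickling the index dict to disk in one
shot. A minimal sketch, assuming a placeholder cache path and made-up
header fields (the real structure would hold whatever yum's depsolver
needs):

```python
import os
import pickle
import tempfile

# Hypothetical package index keyed by name; the fields here are
# illustrative, not yum's actual header data.
pkg_index = {
    "bash": {"version": "2.05b", "requires": ["glibc"]},
    "glibc": {"version": "2.3.2", "requires": []},
}

# Placeholder location; the real cache dir would be yum's choice.
cache_path = os.path.join(tempfile.gettempdir(), "yum-pkg-index.pickle")

# Write the whole index in one open/write/close (the nightly cron step)...
with open(cache_path, "wb") as f:
    pickle.dump(pkg_index, f)

# ...and read it back in a single open/read/close on normal -C runs.
with open(cache_path, "rb") as f:
    loaded = pickle.load(f)
```

The win is that the day-to-day run does one read instead of touching
every header file.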
As I don't think this would be too difficult, I'll try generating some
patches against HEAD, though as a non-Python programmer they may look a
little Perl-ish.
Joseph
seth vidal wrote:
>>First idea: I remember that hashes are fast to search but,
>>comparatively, very slow to grow. To overcome this, most
>>hash libraries allow you to define an initial size, which is
>>best guessed large enough to accommodate all the entries,
>>to avoid frequent time-consuming resizes while filling.
>>Does your package offer this feature?
>
>
> Have you programmed in python before? A simple python dict is what I'm
> talking about. You can build up that sort of dict and traverse it but
> you still have to:
>
> open the package
> get the data you want
> put the data in the dict
> close the package
>
> Doesn't sound too bad - but the process of opening and looking through
> a package does take some time.
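The per-package loop described above looks roughly like this sketch,
where read_header() is a stand-in for the real rpm header read (the
step that actually dominates the run time):

```python
def read_header(path):
    # Placeholder for opening the package and reading its rpm header;
    # here we just derive a fake (name, version) from the filename.
    name = path.rsplit("/", 1)[-1].split("-")[0]
    return {"name": name, "version": "1.0"}

def build_index(paths):
    index = {}
    for path in paths:            # open the package
        hdr = read_header(path)   # get the data you want
        index[hdr["name"]] = hdr  # put the data in the dict
        # (the file is closed when read_header returns)
    return index

idx = build_index(["/repo/bash-2.05b.rpm", "/repo/glibc-2.3.2.rpm"])
```

The dict operations themselves are cheap; it's the N opens and header
parses that add up.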
>
>
>
>>Second idea: you mentioned package traversal as time consuming.
>>Is this time spent to open each package as a DB, grab the
>>info, close it? If this is the case, have you then considered
>>building a cache of package contents, which can be updated
>>and used in subsequent runs, to take advantage that most
>>(if not all) the packages do not change between yum runs?
>
>
> In this case it's opening up each header, getting the data and moving
> along, but yes, it can take some time to search each one.
>
> Where do you store that cache? How do you store it? How do you update it
> to make sure it's not out of sync with the repository w/o reindexing all
> the headers/packages? Feel free to answer any/all of those questions.
>
> Some of these have already been addressed - many of them are why I spent
> so much time working on the xml-metadata, to sort out
> easier/faster/better ways of indexing the packages so yum can:
>
> 1. know if there are changes
> 2. more easily traverse the packages and the metadata
> 3. have smaller amounts of data to download and sort through on any run.
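Goal 1, knowing whether there are changes, comes down to comparing
per-package checksums between the old and new metadata. A hedged
sketch using xml.etree and a simplified element layout (the real
metadata code uses libxml2, and the actual primary.xml is namespaced
with per-package sha checksums):

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for two snapshots of repository metadata.
OLD = """<metadata>
  <package name="bash" checksum="aaa"/>
  <package name="glibc" checksum="bbb"/>
</metadata>"""

NEW = """<metadata>
  <package name="bash" checksum="ccc"/>
  <package name="glibc" checksum="bbb"/>
</metadata>"""

def checksums(xml_text):
    # Map package name -> checksum from a metadata document.
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("checksum") for p in root.findall("package")}

old, new = checksums(OLD), checksums(NEW)
# Only packages whose checksum differs (or is new) need attention.
changed = [name for name in sorted(new) if old.get(name) != new[name]]
```

With that comparison, a run that finds no changes can skip re-reading
headers entirely.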
>
> Right now I'm making those changes work; then I'm going to focus on
> trimming time out of each session. It will still be some time b/c I'm
> working on this as I can.
>
> If you want to be a big help, don't look at speedup improvements for
> the 2.0.X branch. I don't want to spend more time on 2.0.X if it is at
> all possible. A lot of things in the structure have changed, and cvs-HEAD
> is where I'm trying to work the most. When I have a snapshot that does
> some useful things I'll be sure to announce it here and on yum-devel.
>
> If you're a python programmer and you're familiar with libxml2 - then
> take a look at http://linux.duke.edu/metadata/generate/ - feel free to
> make that code:
>
> 1. look for an existing repodata dir
> 2. if it finds one - use the xml files there to speed up creation of
> the new metadata for that repository.
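That reuse could work along these lines: key the old metadata by
(size, mtime) per file and only re-open packages whose key changed.
All names and structures here are illustrative, not the actual
generate code:

```python
def plan_update(old_meta, current_files):
    """old_meta: {path: (size, mtime, xml_entry)} from the previous run.
    current_files: {path: (size, mtime)} from scanning the repo now.
    Returns entries safe to reuse and paths that must be re-read."""
    reuse, reread = {}, []
    for path, stat in current_files.items():
        old = old_meta.get(path)
        if old is not None and old[:2] == stat:
            reuse[path] = old[2]   # unchanged: keep the old XML entry
        else:
            reread.append(path)    # new or changed: re-open the package
    return reuse, reread

old_meta = {
    "bash.rpm": (100, 1, "<bash entry>"),
    "sed.rpm": (50, 1, "<sed entry>"),
}
current = {"bash.rpm": (100, 1), "sed.rpm": (60, 2), "grep.rpm": (70, 3)}
reuse, reread = plan_update(old_meta, current)
```

On a large repository where most packages are unchanged between runs,
this turns a full re-index into a handful of header reads.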
>
> -sv
>
>
> _______________________________________________
> Yum mailing list
> Yum at lists.dulug.duke.edu
> https://lists.dulug.duke.edu/mailman/listinfo/yum