[Yum-devel] Yum startup speed

seth vidal skvidal at phy.duke.edu
Sun Jan 9 06:06:20 UTC 2005


> There's no extra time hit because there'll be nothing extra happening
> beyond what is already happening now. My proposal doesn't add any extra
> computation. It only involves pickling the YumPackageSack instance,
> which is currently created every time yum is invoked anyway.

The pickling would be happening and that's not a zero-time process.
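
Even a rough test shows it's not free (toy sketch, not yum code - just a
fake dict shaped vaguely like a package sack):

import cPickle, time

fake_sack = {}
for i in xrange(20000):
    fake_sack['pkg%d' % i] = {
        'name': 'pkg%d' % i, 'arch': 'i386', 'epoch': '0',
        'version': '1.0', 'release': str(i),
        'requires': ['lib%d' % (i % 500)],
        'provides': ['pkg%d' % i],
    }

start = time.time()
data = cPickle.dumps(fake_sack, cPickle.HIGHEST_PROTOCOL)
print 'dump: %.2fs, %d bytes' % (time.time() - start, len(data))

start = time.time()
cPickle.loads(data)
print 'load: %.2fs' % (time.time() - start)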

> The config file(s) are unlikely to change between successive commands.
> This is what most people are complaining about. You run yum several
> times to, say, search for a package, waiting a significant time each
> run. Then you run yum to install the package and wait again. My
> proposal would mean that perhaps only the first run of the yum command
> would impose a longer delay (if the pickle needs updating). The
> following runs would all use the pickle and therefore be quicker.


> > What's the benefit of having a single big pickle of all the package
> > metadata for all repos instead of having individual pickles for each?
> > B/c you're going to have to read in the metadata if anything has
> > changed and, as is the case for fedora core 3, the updates-released
> > and 3rd party repos change quite a bit.
> 
> The current pickles are Python representations of the XML metadata files.

No, they aren't - they're a dict structure of the data that used to be in
the xml. But it's not just a dump of the xml.
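
To give a rough idea of the shape I mean (field names here are purely
illustrative, not the actual yum internals):

pkgdict = {
    ('foo', 'i386', '0', '1.2.3', '1'): {
        'summary': 'An example package',
        'packagesize': 123456,
        'location': 'Fedora/RPMS/foo-1.2.3-1.i386.rpm',
        'requires': ['libbar.so.1'],
    },
}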



> The pickle I'm proposing is of the data structure that results once the 
> XML data are combined (creating the YumPackageSack instance). This 
> combination of package data takes a significant amount of time and it 
> seems to be what a lot of people have a problem with.



> As far as I can see, this object only needs to change if there are
> metadata updates or a config file changes. These things are unlikely to
> change between successive commands within the one session.

Unless the mirror they're using changes (which happens often with
mirrorlists) and the mirrors are out of sync (which happens even more
often, sadly).
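
So any on-disk cache of the sack would have to be keyed off what each
repo's repomd.xml currently says, something like this (sketch only -
function and field names are made up):

import os, cPickle

def load_cached_sack(cachefile, current_checksums):
    """current_checksums: dict of repoid -> checksum of that repo's
       primary metadata, taken from its repomd.xml."""
    if not os.path.exists(cachefile):
        return None
    try:
        cached = cPickle.load(open(cachefile, 'rb'))
    except Exception:
        return None
    if cached.get('checksums') != current_checksums:
        return None    # a repo changed, or we landed on a different mirror
    return cached.get('sack')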


> > Remember the metadata is more than just primary.xml - and reading in ALL
> > the metadata is a memory hit you may not want to deal with.
> 
> Doesn't yum load pretty much all the metadata anyway? I'm not proposing
> that any extra data is loaded.

no.

Try this out - if you want to see ALL the metadata loaded, run this
command:

yum makecache

then compare that to:
yum list updates foo\*

in most cases filelists.xml and other.xml never get parsed. That's
intentional.
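
The point is the laziness - roughly this kind of thing (toy classes,
not the real yum code path):

class RepoMD:
    def __init__(self, repoid):
        self.repoid = repoid
        self._primary = None
        self._filelists = None

    def primary(self):
        if self._primary is None:
            self._primary = self._parse('primary.xml')
        return self._primary

    def filelists(self):
        # only parsed when a query actually needs file lists,
        # which most commands never do
        if self._filelists is None:
            self._filelists = self._parse('filelists.xml')
        return self._filelists

    def _parse(self, fname):
        print 'parsing %s for repo %s' % (fname, self.repoid)
        return {}    # stand-in for the real xml parse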


> > Gijs has suggested using something other than a python pickle to speed
> > up access of the data.  That might make things simpler in some ways.
> 
> Yep, I saw that post. I think that's also a good idea worth pursuing.

Indeed. I suggested maybe looking at sqlite b/c metakit, while
cool-looking, hasn't been updated in a year and I'm afraid of maintaining
complex, potentially abandoned code. :)
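
Something along these lines is what I'm imagining (sketch; I'm using the
sqlite3 module name for illustration - whatever python binding we'd
actually ship against, e.g. pysqlite, may spell the import differently -
and the schema is made up):

import sqlite3

con = sqlite3.connect(':memory:')   # on disk it'd live under the yum cache dir
con.execute("""create table packages
               (name text, arch text, epoch text,
                version text, release text, repoid text)""")
con.executemany("insert into packages values (?, ?, ?, ?, ?, ?)",
                [('foo', 'i386', '0', '1.2.3', '1', 'base'),
                 ('bar', 'i386', '0', '2.0', '4', 'updates-released')])
con.commit()

# queries like "list updates foo*" could then hit the db directly instead
# of pulling an entire sack structure into memory first
for row in con.execute("select name, version, release from packages "
                       "where name like 'foo%'"):
    print row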


> Like you mention, I think that being selective about which data yum 
> loads depending on the context is the better long term solution because 
> that's what the real problem is here. What I'm proposing is quite 
> effective and simple to implement now with minimal impact on the rest of 
> yum.

I'd love to see how simple it is to implement - I must be visualizing
something wrong, because it doesn't seem like it would be that simple -
but go for it, let's see a patch for it.

But watch memory - see how much it eats up on the pickle import/export.
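
Easy enough to eyeball (linux-only sketch; the cache path below is made
up):

import cPickle

def rss_kb():
    # VmRSS from /proc/self/status, in kB
    for line in open('/proc/self/status'):
        if line.startswith('VmRSS:'):
            return int(line.split()[1])
    return 0

before = rss_kb()
sack = cPickle.load(open('/var/cache/yum/pkgsack.pickle', 'rb'))  # made-up path
print 'loading the pickle grew RSS by roughly %d kB' % (rss_kb() - before)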

> I've taken the time to address your feedback in detail because I still 
> think the idea has merit. If you really think it just won't work or 
> isn't worth the effort, then I'll drop it.

To be clear, I'm not against the idea - but I'm wondering if we're
optimizing earlier than we should be. That's really my only concern; I
just don't want to make the code uglier with early optimization.

Thanks!
-sv




