[Yum-devel] Still impossible to exit yum

seth vidal skvidal at linux.duke.edu
Wed Jul 4 13:20:44 UTC 2007


On Wed, 2007-07-04 at 09:31 +0300, Panu Matilainen wrote:
> On Tue, 3 Jul 2007, seth vidal wrote:

> Whether it caused rpmdb corruption or not, I dunno; it's entirely possible
> it triggered races in the locking. I don't trust the rpm locking that
> much. But more to the point, re-opening the db for each and every rpmdb
> access to avoid holding the db open isn't what Jeff means by "being
> careful" in 1) :) More like: open it when you have to, do your business,
> and close it. The less often you do that, the better.
> 
> Since the new depsolver, the situation would look roughly like this:
> 1) open+close db for checking distroverpkg
> 2) download metadata if necessary
> 3) open db, depsolve
> 4) if filelists needed in 3), close db, download and reopen, continue 3)
> 5) close the db
> 6) download packages
> 7) do the final transaction
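
Just to pin down the "open it when you have to, do your business, and
close it" pattern above, here's a minimal sketch with rpm's Python
bindings - nothing that's in yum today, and the helper name is made up:

import rpm

def with_rpmdb(work):
    # Get a transaction set, run some work against the rpmdb through
    # it, and make sure the db is closed again no matter what.
    ts = rpm.TransactionSet()
    try:
        return work(ts)
    finally:
        ts.closeDB()

def count_installed(ts):
    # dbMatch() with no arguments iterates over every installed header.
    return sum(1 for hdr in ts.dbMatch())

installed = with_rpmdb(count_installed)

The point is just that each rpmdb user opens late, closes early, and
never leaves the handle dangling.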

We need to have it open during 6), too - for sigchecking pkgs.
This is the section where we often get complaints, because it is
difficult to abort the process: ctrl-c gets grabbed, and there are all
the mirrors it skips through.
So either we open, check, close for every package, or we open and leave
it open for the entire downloadPkgs process. I'd worry that doing it
for each and every package would be too much for rpm's locking and
would get us back to where we were a few months ago.
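
Roughly the two choices, sketched out - downloadPkgs is the real yum
entry point, but the helper below is invented, and (as far as I can
tell) it's the signature verification in hdrFromFdno() needing the
imported keys that forces the db open at all:

import os
import rpm

def sigcheck_downloaded(paths):
    # Open the rpmdb once, verify every downloaded package, then
    # close it - rather than an open/check/close cycle per package.
    ts = rpm.TransactionSet()
    bad = []
    try:
        for path in paths:
            fd = os.open(path, os.O_RDONLY)
            try:
                try:
                    # hdrFromFdno() raises rpm.error if the digest or
                    # signature doesn't check out.
                    ts.hdrFromFdno(fd)
                except rpm.error:
                    bad.append(path)
            finally:
                os.close(fd)
    finally:
        ts.closeDB()
    return bad

Per-package would mean moving the TransactionSet()/closeDB() pair
inside the loop, which is the variant I'm worried will hammer rpm's
locking.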

Another option would be to do only sha1 integrity checking at download
time, wait until everything is downloaded, and THEN do the gpg checking
all in one shot. It would make the interface a little less attractive,
but not devastatingly so.
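
The download-time half would be plain checksumming - no rpmdb involved,
so nothing to hold open and ctrl-c stays easy. A sketch, with hashlib
(or the old sha module) and expected_hex coming from the repo metadata:

import hashlib

def sha1_matches(path, expected_hex, blocksize=65536):
    # Integrity-only check to run as each package lands on disk.
    digest = hashlib.sha1()
    f = open(path, 'rb')
    try:
        chunk = f.read(blocksize)
        while chunk:
            digest.update(chunk)
            chunk = f.read(blocksize)
    finally:
        f.close()
    return digest.hexdigest() == expected_hex

The gpg/signature pass then happens after everything is on disk, with a
single open and close of the rpmdb (e.g. the sigcheck sketch above).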


> The reopens in 4) are at max the number of enabled repositories, whereas
> earlier, in the similar situation, it was the number of packages in the
> transaction and then some. A *big* difference there. I think the previous
> time yum cached rpmdb header id's over those reopens, which is not really
> safe; if such tricks aren't done now, it should be just fine to do the
> above.

If we don't keep track via header ids then all the lookups take forever,
unfortunately. What I was thinking is: could we take a timestamp or
checksum of the current 'version' of the rpmdb? If that hasn't changed,
we can use the header ids we have cached. If it has changed, we
invalidate the header ids and get them again. My two questions are:

1. does that seem safe?
2. is there a db-version or journal or some other information in the
rpmdb we can use to know if it has been changed?
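
For 2. I don't know of anything rpm exposes directly; the crude version
of what I mean would look something like this - the Packages path and
the mtime+size heuristic are just placeholders, not an answer:

import os

RPMDB_PACKAGES = '/var/lib/rpm/Packages'   # placeholder path

def rpmdb_fingerprint(path=RPMDB_PACKAGES):
    # Crude 'version' of the rpmdb: mtime and size of the Packages file.
    st = os.stat(path)
    return (st.st_mtime, st.st_size)

class HeaderIdCache:
    def __init__(self):
        self.fingerprint = None
        self.ids = {}

    def get(self, name, lookup):
        # Throw away the cached header ids whenever the fingerprint moves.
        current = rpmdb_fingerprint()
        if current != self.fingerprint:
            self.ids.clear()
            self.fingerprint = current
        if name not in self.ids:
            self.ids[name] = lookup(name)
        return self.ids[name]

lookup() there stands in for however we resolve a header id today.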




> Then there's the extreme approach: open the rpmdb just once initially and 
> import the data you need into a sqlite db just like any other repodata and 
> then close it. With the new depsolver, you only need to open it again for 
> the actual transaction.

That seems like an extremely expensive option, doesn't it? The import
process would take a while, not to mention the file-lookup cost.
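
To put a shape on why it looks expensive: the import is a full walk of
every installed header, roughly the following (schema and names are
invented for the sketch, and filelists aren't even included):

import rpm
import sqlite3

def dump_rpmdb_to_sqlite(dbpath):
    # Copy the NEVRA of every installed package into a sqlite table.
    conn = sqlite3.connect(dbpath)
    conn.execute("""CREATE TABLE IF NOT EXISTS installed
                    (name TEXT, epoch TEXT, version TEXT,
                     release TEXT, arch TEXT)""")
    ts = rpm.TransactionSet()
    try:
        for hdr in ts.dbMatch():
            conn.execute("INSERT INTO installed VALUES (?, ?, ?, ?, ?)",
                         (hdr[rpm.RPMTAG_NAME], hdr[rpm.RPMTAG_EPOCH],
                          hdr[rpm.RPMTAG_VERSION], hdr[rpm.RPMTAG_RELEASE],
                          hdr[rpm.RPMTAG_ARCH]))
    finally:
        ts.closeDB()
    conn.commit()
    conn.close()

And that's before the file lists and requires/provides, which is where
the real time would go.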


> If it can be done in a sane way, yes. I'm not that familiar with the 
> Python C API (yet :) but I would assume it's possible to plant a 
> sys.excepthook from C when needed (rpmdb iterators open, basically) and 
> clean up things from there and then chain back to the original excepthook.
> We'll see...
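
For reference, the pure-Python shape of that chaining is below; the
real thing would presumably be planted from the C side while rpmdb
iterators are open, and cleanup() is a stand-in name:

import sys

def install_cleanup_hook(cleanup):
    # Run a cleanup step first, then hand the exception to whatever
    # excepthook was installed before us.
    original = sys.excepthook

    def hook(exc_type, exc_value, exc_tb):
        try:
            cleanup()
        finally:
            original(exc_type, exc_value, exc_tb)

    sys.excepthook = hook
    return original   # so it can be restored once the iterators close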

thanks for looking at this.

-sv




