[Yum-devel] [PATCH 3/3] Implement getPackageAsync() and getPackageDone()

Fri Jul 29 12:36:45 UTC 2011

> There is no way to get progress data out, AFAICS.

I can feed the progress through a pipe to yum and handle it
in getIdleProcess() (multiplexing progress with errorlevels).
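
As a rough sketch of that multiplexing (not the patch itself): the helper
below is a stand-in that writes records to its stdout pipe, and the
"P <done> <total>" / "E <errorlevel>" record format, HELPER, read_events()
and run_helper() are all made up here for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for a download helper: it writes "P <done> <total>"
# progress records and one final "E <errorlevel>" record to its stdout.
HELPER = r"""
import sys
total = 3
for done in range(1, total + 1):
    sys.stdout.write("P %d %d\n" % (done, total))
    sys.stdout.flush()
sys.stdout.write("E 0\n")
"""


def read_events(stream):
    """Yield ('progress', done, total) or ('exit', code) for each record."""
    for line in stream:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "P":
            yield ("progress", int(fields[1]), int(fields[2]))
        elif fields[0] == "E":
            yield ("exit", int(fields[1]))


def run_helper():
    """Spawn the stand-in helper and collect everything it reports."""
    proc = subprocess.Popen([sys.executable, "-c", HELPER],
                            stdout=subprocess.PIPE, universal_newlines=True)
    events = list(read_events(proc.stdout))
    proc.stdout.close()
    proc.wait()
    return events


if __name__ == "__main__":
    for event in run_helper():
        print(event)
```

Both kinds of record travel over the same pipe, so the reader needs only
one file descriptor per downloader.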

> The API both returns before what you've requested is downloaded _and_
> can block for an indeterminate amount of time.

Never thought of this as a problem.  The blocking is necessary to have
a simple 1-pass API and to limit the number of spawned downloaders.
The progress display and proper timeout handling in the download
helper should prevent being blocked for too long.
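
To illustrate why the call can block, here is a minimal sketch of a capped
pool.  DownloaderPool, start() and drain() are hypothetical names, not the
API from the patch, and for simplicity it waits for the oldest helper
rather than for whichever one finishes first.

```python
import subprocess
import sys


class DownloaderPool(object):
    """Hypothetical per-repo pool that caps the number of live helpers."""

    def __init__(self, limit):
        self.limit = limit
        self.active = []      # (name, Popen) pairs still running
        self.finished = []    # (name, exit status) of reaped helpers

    def _reap_oldest(self):
        # This wait() is where the caller can stall, which is why the
        # helper itself needs sane timeouts.
        name, proc = self.active.pop(0)
        self.finished.append((name, proc.wait()))

    def start(self, name, argv):
        """Start a helper; block first if the pool is already full."""
        while len(self.active) >= self.limit:
            self._reap_oldest()
        self.active.append((name, subprocess.Popen(argv)))

    def drain(self):
        """Wait for every helper that is still running."""
        while self.active:
            self._reap_oldest()


if __name__ == "__main__":
    pool = DownloaderPool(limit=2)
    for i in range(5):
        # "pass" stands in for a real download command.
        pool.start("pkg%d" % i, [sys.executable, "-c", "pass"])
    pool.drain()
    print(pool.finished)
```

The caller makes one pass over the package list; start() returns at once
while there is a free slot and blocks only when the limit is reached.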

> The downloaders are global, so one downloader might talk to
> rpmforge.org and redhat.com ...

They're not global, each repo has a separate pool of downloaders.
I think it makes sense to treat repositories independently, e.g.
repo A should not care how many processes download from repo B.
Since every downloader is started in a particular repo's package
directory and never chdirs, it could chroot later.

> also means keepalive is going to be interesting.

Keepalives and chroot should work fine with a proper download helper
that doesn't spawn a new process for each URL.

> This uses select directly instead of poll, is there some reasoning?
> asyncore next (might be the best yet ?:).

The select() loop only considers active downloaders for a particular
repo.  I guess the usual number of downloaders is going to be very small
(usually < 5), as a large number of processes ruins connection keep-alives.
So using more efficient (but more complex to set up) interfaces such as
poll() or epoll() is (imho) not necessary.
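
For a handful of pipes the select() version stays short.  A minimal
sketch, assuming a Unix system (select() on pipes) and using a stand-in
helper; HELPER, spawn() and collect() are illustrative names, not code
from the patch.

```python
import os
import select
import subprocess
import sys

# Hypothetical stand-in helper: prints "<name> <i>" lines to stdout.
HELPER = r"""
import sys
name, count = sys.argv[1], int(sys.argv[2])
for i in range(count):
    sys.stdout.write("%s %d\n" % (name, i))
    sys.stdout.flush()
"""


def spawn(name, count):
    """Start one stand-in helper with its stdout connected to a pipe."""
    return subprocess.Popen([sys.executable, "-c", HELPER, name, str(count)],
                            stdout=subprocess.PIPE)


def collect(helpers, timeout=5.0):
    """Read from all helpers of one repo until every pipe reaches EOF."""
    fds = dict((proc.stdout.fileno(), name) for name, proc in helpers.items())
    output = dict((name, b"") for name in helpers)
    while fds:
        readable, _, _ = select.select(list(fds), [], [], timeout)
        if not readable:
            break                       # nothing arrived in time; give up
        for fd in readable:
            data = os.read(fd, 4096)
            if data:
                output[fds[fd]] += data
            else:
                del fds[fd]             # EOF: the helper closed its end
    for proc in helpers.values():
        proc.stdout.close()
        proc.wait()
    return output


if __name__ == "__main__":
    result = collect({"a": spawn("a", 2), "b": spawn("b", 3)})
    for name in sorted(result):
        print(name, result[name].decode().splitlines())
```

Only the descriptors of still-active helpers are passed to select(), so
the set shrinks as downloads finish.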

> If you are stuck trying to solve "the big problem" all at once, and are
> sending out this is an update of where you are atm. ... I can
> understand, but you'll probably go crazy trying to do it that way (and
> maybe take me with you ;).

Hope not ;)

> My suggestion would be to solve a small part of the problem fully. Eg.
> get an "almost 100%" patch for urlgrabber.grab() that spawns a single
> process, does the download and returns progress info. Then when we've

That's doable, sure. We'd have downloads in a separate process, with 99%
compatible semantics.  But because of the blocking API, this can't be
easily parallelized later.

> got that, we can start from that base so it can run 2 procs. at once ...
> then ... eventually the pain needed to get it integrated into yum.

You mean, define a new API (on top of the old one), that allows parallel
downloading, and wrap it further up to the point when it's usable 
in rpms/drpm's download code, and in metadata download code?

I considered taking the route the other way round.  Anyway, all the stuff
in-between must be reimplemented, and that's what bothers me.  I can write
new, straightforward code, implementing only the necessary features.
Or, I could try to reuse and patch the old code, keeping all the seemingly
unused bits, and features-to-be.  That's to be discussed, I think.

--
Zdenek