[Yum-devel] curlmulti based parallel downloads
zpavlas at redhat.com
Fri Sep 30 10:45:39 UTC 2011
> These are all fine, although I'm not sure we want to move to using
> MultiFileMeter instead of our current progress meter.
How could we use the current single-file progress meter?
Show just one bar for the total size? Or play with ECMA-48
CSI sequences to switch rows?
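For the multi-row idea, a minimal sketch of what driving several bars with ECMA-48 CSI sequences could look like (hypothetical rendering code, not urlgrabber's meter API — the "cursor up" sequence rewinds over the rows so the next frame overwrites them in place):

```python
import sys

def render_bars(fractions, width=20):
    """Return one frame: a bar per download, followed by a CSI
    'cursor up N' sequence (ESC [ N A) so the next frame
    overwrites this one."""
    lines = []
    for i, frac in enumerate(fractions):
        filled = int(frac * width)
        lines.append("file%d [%s%s] %3d%%" % (
            i, "#" * filled, "-" * (width - filled), int(frac * 100)))
    frame = "\n".join(lines) + "\n"
    return frame + "\x1b[%dA" % len(fractions)

# draw two frames in place, then move past the bars
sys.stdout.write(render_bars([0.25, 0.75]))
sys.stdout.write(render_bars([1.0, 1.0]))
sys.stdout.write("\n" * 2)
```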
> Why do we want to use these two extra functions instead of just
> triggering off a parameter (like the errors one)?
I need some 'sync' call that blocks until all downloads are finished,
and adding the parallel_end() function is IMO better than adding a flag
to each request to signal end-of-batch. Pairing that with parallel_begin()
eliminates the need to pass a sync/async flag to each request.
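A rough sketch of those begin/end semantics, using threads for brevity (the actual patch drives a CurlMulti loop; the names and internals here are illustrative only):

```python
from concurrent.futures import ThreadPoolExecutor

_queue = []
_batching = False

def parallel_begin():
    global _batching
    _batching = True

def urlgrab(url, fetch=lambda u: u):
    # inside a batch: just queue the request; outside: run it now
    if _batching:
        _queue.append((url, fetch))
        return None
    return fetch(url)

def parallel_end():
    """The 'sync' call: run everything queued, block until done."""
    global _batching
    _batching = False           # disable batching first, so any
                                # urlgrab() from a checkfunc blocks
    with ThreadPoolExecutor(max_workers=5) as pool:
        results = list(pool.map(lambda uf: uf[1](uf[0]), _queue))
    _queue.clear()
    return results
```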
I can drop parallel_begin() and the global flag, and add an option
instead. Something like 'async = (key, limit)', which reads as:
"run in parallel with other requests, but keep the number of
connections with the same key below the given limit."
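The proposed per-key limit could be sketched with a semaphore per key, something like this (illustrative names, not urlgrabber's API; a CurlMulti implementation would cap handles per key instead of threads):

```python
import threading

_key_limits = {}
_lock = threading.Lock()

def _semaphore_for(key, limit):
    # one semaphore per key, created on first use
    with _lock:
        if key not in _key_limits:
            _key_limits[key] = threading.Semaphore(limit)
        return _key_limits[key]

def grab_async(url, fetch, async_opt):
    """Run fetch(url) in parallel, but keep at most `limit`
    requests with the same key in flight at once."""
    key, limit = async_opt
    sem = _semaphore_for(key, limit)
    def worker():
        with sem:               # blocks while `limit` peers are active
            fetch(url)
    t = threading.Thread(target=worker)
    t.start()
    return t
```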
> Even if we can't mix new/old downloads at "0.1" it
> seems bad to make sure we never can.
Actually, it kinda works now. parallel_end() disables parallel
downloads first, then processes the queue in parallel. So, if
there's urlgrab() in a checkfunc, it's blocking. I agree that an
explicit flag would make the behavior more predictable.
> objects and weird interactions with NSS etc. ... sticking it in an
> external process should get rid of all of those bugs, using CurlMulti
> is more likely to give us 666 more variants of those bugs.
I'm all for running downloads in an external process, but there
is more than one way to do it:
1) fork/exec a single curlMulti process for the whole batch, and mux
all requests and progress updates through a single pair of pipes
(that's what I was thinking of implementing). Should be very efficient.
2) fork/exec for each grab. Simple and most compatible, but quite
inefficient: no keepalives, large exec overhead for each file.
3) As above, but try to reuse downloaders. That reduces the exec
overhead, but while 2) keeps the number of processes low, 3) could
fork-bomb if it keeps an idle downloader process for each mirror it tries.
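Option 1 could be sketched like this: one forked downloader child, all requests multiplexed over a single pair of pipes, one line per message (purely illustrative; a real implementation would run CurlMulti in the child and also stream progress updates back):

```python
import os

def spawn_downloader(handle):
    """Fork a child that serves one request per line on req pipe
    and writes one reply per line on res pipe."""
    req_r, req_w = os.pipe()
    res_r, res_w = os.pipe()
    pid = os.fork()
    if pid == 0:                         # child: serve requests
        os.close(req_w); os.close(res_r)
        with os.fdopen(req_r) as reqs, os.fdopen(res_w, "w") as out:
            for line in reqs:            # one URL per line
                out.write(handle(line.strip()) + "\n")
                out.flush()
        os._exit(0)
    os.close(req_r); os.close(res_w)     # parent keeps the other ends
    return pid, os.fdopen(req_w, "w"), os.fdopen(res_r)

pid, reqs, results = spawn_downloader(lambda url: "OK " + url)
for url in ("http://a/1", "http://b/2"):
    reqs.write(url + "\n")
reqs.flush()
replies = [results.readline().strip() for _ in range(2)]
reqs.close()                             # child sees EOF and exits
os.waitpid(pid, 0)
```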
> 1. Get all the curl/NSS/etc. API usage out of the process, this
> should close all the weird/annoying bugs we've had with curl* and make sure
> we don't get any more. It should also fix DNS hang problems. This
> probably also fixes C-c as well.
Not sure what you mean. Using CurlMulti in an external process
keeps Curl/NSS away from yum as well. If there are known problems
with NSS & CurlMulti interaction, then that's a different thing.
> Full level this is SELinux+chroot+drop privs, and we can
> be a bit lax in implementing this.
I tried that concept and chroot() seems currently to be a no-no.
I assume that it only makes sense to chroot BEFORE connect(),
since most attacks target bugs in header parsing or SSL handshakes.
But: chroot() && connect() => Host not found.
I assume it's because the resolver can't read /etc/resolv.conf
after the chroot, hmm. Doing a dummy name lookup before
chroot helps, but I don't dare to say how reliable that is.
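A sketch of that workaround: resolve the hostname while /etc/resolv.conf is still visible, then connect by numeric address so no resolver is needed inside the jail (the chroot call needs root and is guarded here; jail path is illustrative):

```python
import os, socket

def resolve_then_connect(host, port, jail="/var/empty"):
    # 1. resolve while the resolver can still read its config
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    family, socktype, proto, _, addr = infos[0]
    # 2. drop into the jail (root only; attacks on header parsing
    #    or SSL handshakes happen after this point)
    if os.geteuid() == 0 and os.path.isdir(jail):
        os.chroot(jail)
        os.chdir("/")
    # 3. connect by numeric address: no DNS lookup needed any more
    sock = socket.socket(family, socktype, proto)
    sock.connect(addr)
    return sock, addr
```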