[Yum-devel] curlmulti based parallel downloads

Zdenek Pavlas zpavlas at redhat.com
Fri Sep 30 10:45:39 UTC 2011


>  These are all fine, although I'm not sure we want to move to using
> MultiFileMeter instead of our current progress meter.

How could we use the current single-file progress meter?
Show just one bar for the total size?  Or play with ECMA-48
CSI sequences to switch rows?
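
For reference, switching rows with CSI sequences would look roughly
like this; the MultiRowMeter name and layout below are made up for
illustration and have nothing to do with MultiFileMeter's actual API:

import sys

CSI = "\x1b["

class MultiRowMeter:
    def __init__(self, names):
        self.names = names
        for name in names:                      # reserve one row per file
            sys.stdout.write("%-20s [          ]\n" % name)
        sys.stdout.flush()

    def update(self, row, frac):
        up = len(self.names) - row              # rows between cursor and target
        bar = ("#" * int(frac * 10)).ljust(10)
        sys.stdout.write(CSI + "%dA" % up)      # CUU: cursor up to that row
        sys.stdout.write("\r" + CSI + "2K")     # column 0, EL: erase the line
        sys.stdout.write("%-20s [%s]" % (self.names[row], bar))
        sys.stdout.write(CSI + "%dB\r" % up)    # CUD: back below the meter
        sys.stdout.flush()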

>  Why do we want to use these two extra functions instead of just
> triggering off a parameter (like the errors one)?

I need some 'sync' call that blocks until all downloads are finished,
and adding the parallel_end() function is IMO better than adding a flag 
to each request to signal end-of-batch.  Pairing that with parallel_begin()
eliminates the need to pass a sync/async flag to each request.

I can drop parallel_begin() and the global flag, and add an option
instead.  Something like 'async = (key, limit)', which reads as:
"run in parallel with other requests, but keep the number of
connections with the same key below the given limit."
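
To make that concrete, a rough sketch of how the option could look
from the caller's side.  urlgrab() and parallel_end() are the names
from this thread; the exact signature, the example URLs, the mirror
key and the per-key limit of 3 are just placeholders:

from urlgrabber.grabber import urlgrab, parallel_end

packages = [('http://mirror.example/Packages/foo.rpm', '/tmp/foo.rpm'),
            ('http://mirror.example/Packages/bar.rpm', '/tmp/bar.rpm')]

for url, dest in packages:
    # Queued instead of downloaded immediately; at most 3 connections
    # to the same key ('mirror.example' here) are open at any time.
    urlgrab(url, dest, async=('mirror.example', 3))

parallel_end()   # the 'sync' call: blocks until the whole batch is done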

> Even if we can't mix new/old downloads at "0.1" it
> seems bad to make sure we never can.

Actually, it kinda works now.  parallel_end() disables parallel
downloads first, then processes the queue in parallel.  So, if
there's a urlgrab() in a checkfunc, it blocks.  I agree that an
explicit flag would make the behavior more predictable.
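
For what it's worth, the current behavior boils down to something
like this toy model (not the actual patch; all names and bodies
below are illustrative stubs):

_parallel = True        # what parallel_begin() turns on

_queue = []

def _blocking_download(url):
    print("blocking fetch of %s" % url)    # stand-in for the old code path

def urlgrab(url, checkfunc=None):
    if _parallel:
        _queue.append((url, checkfunc))    # just queued for the batch
        return
    _blocking_download(url)
    if checkfunc:
        checkfunc(url)

def parallel_end():
    global _parallel
    _parallel = False                      # 1. queueing disabled first...
    batch = list(_queue)
    del _queue[:]
    for url, checkfunc in batch:           # 2. ...then the queue is drained
        _blocking_download(url)            #    (in parallel in the real code)
        if checkfunc:
            checkfunc(url)                 # a urlgrab() from here blocks now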

> objects and weird interactions with NSS etc. ... sticking it in an
> external process should get rid of all of those bugs, using CurlMulti
> is more likely to give us 666 more variants of those bugs.

I'm all for running downloads in an external process, but there are
several ways to do it:

1) fork/exec a single curlMulti process for the whole batch, and mux
all requests and progress updates through a single pair of pipes
(that's what I was thinking of implementing; see the sketch after
this list).  Should be very efficient.

2) fork/exec for each grab.  Simple and most compatible, but quite
inefficient: no keepalives, large exec overhead for each file.

3) As above, but try to reuse downloaders.  That reduces the exec
overhead, but while 2) keeps the number of processes low, 3) could
fork-bomb if it keeps an idle downloader process for each mirror it
tries.
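
A very small sketch of what 1) could look like: one long-lived
downloader child for the whole batch, with requests and results
multiplexed over a single pipe pair.  For brevity the child below
fetches serially with one reused pycurl handle; the real helper
would drive a CurlMulti loop and also stream progress updates back.
The line-based JSON protocol is made up for illustration:

import os, json, pycurl

def spawn_downloader():
    req_r, req_w = os.pipe()
    res_r, res_w = os.pipe()
    if os.fork() == 0:                         # child: the downloader process
        os.close(req_w); os.close(res_r)
        requests = os.fdopen(req_r, 'r')
        results = os.fdopen(res_w, 'w')
        curl = pycurl.Curl()                   # reused handle => keepalive
        for line in requests:                  # one JSON request per line
            job = json.loads(line)
            curl.setopt(pycurl.URL, str(job['url']))
            try:
                with open(job['dest'], 'wb') as f:
                    curl.setopt(pycurl.WRITEDATA, f)
                    curl.perform()
                reply = {'url': job['url'], 'ok': True}
            except pycurl.error as e:
                reply = {'url': job['url'], 'ok': False, 'err': str(e)}
            results.write(json.dumps(reply) + '\n')
            results.flush()
        curl.close()
        os._exit(0)
    os.close(req_r); os.close(res_w)           # parent keeps the other ends
    return os.fdopen(req_w, 'w'), os.fdopen(res_r, 'r')

yum would then write one request line per file and read the
completion lines back, so it never touches curl or NSS itself.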

> 1. Get all the curl/NSS/etc. API usage out of the process, this
> should close all the weird/annoying bugs we've had with curl* and make sure
> we don't get any more. It should also fix DNS hang problems. This
> probably also fixes C-c as well.

Not sure what you mean.  Using CurlMulti in an external process
keeps Curl/NSS away from yum as well.  If there are known problems
with the NSS & CurlMulti interaction, then that's a different thing.

> Full level this is SELinux+chroot+drop privs, and we can
> be a bit lax in implementing this.

I tried that concept, and chroot() currently seems to be a no-no.
I assume it only makes sense to chroot BEFORE connect(),
since most attacks target bugs in header parsing or SSL handshakes.

But: chroot() && connect() => Host not found.

I assume that's because the resolver can't read /etc/resolv.conf
after the chroot.  Doing a dummy name lookup before the
chroot helps, but I don't dare to say how reliable that is.
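
The workaround looks roughly like this (needs root for chroot(); the
jail path and the nobody uid/gid are placeholders, and as said above
I can't tell how reliable the pre-chroot warm-up really is):

import os, socket

def jailed_connect(host, port, jail='/var/empty'):
    # Dummy lookup before the chroot: this appears to be what loads
    # /etc/resolv.conf and the NSS modules.  Skip it and the lookup
    # below fails with "Host not found".
    socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)

    os.chroot(jail)                  # lock down before any header/SSL parsing
    os.chdir('/')
    os.setgid(65534)                 # drop privileges (placeholder ids)
    os.setuid(65534)

    # Name resolution inside the jail now works off the state loaded above.
    return socket.create_connection((host, port))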

--
Zdenek

