[Yum-devel] Parallel downloader: wrap-up, benchmarks

Zdenek Pavlas zpavlas at redhat.com
Thu Dec 8 15:27:19 UTC 2011


> > Test setup 3
> > ============
> > 
> > Added a simple 'mirror sweep' to urlgrabber/mirror.py, that
> 
>  Does this take into account max-connections?

Yes, it does.  It's a hack in the code path where the (initial)
mirror is chosen and the request goes down from MG to UG.

--- a/urlgrabber/mirror.py
+++ b/urlgrabber/mirror.py
@@ -403,6 +403,8 @@ class MirrorGroup:
                 # async code iterates mirrors and calls failfunc
                 kwargs['mirror_group'] = self, gr, mirrorchoice
                 kwargs['failfunc'] = gr.kw.get('failfunc', _do_raise)
+                # increment master
+                self._next = (self._next + 1) % min(len(self.mirrors), 5)
             try:
                 return func_ref( *(fullurl,), **kwargs )
             except URLGrabError, e:

The URLGrabber module (and the parallel downloader) does not
'understand' MGs; to it, a MirrorGroup is only a failover method.  We now
have to assign the initial mirror to each request BEFORE downloading starts.

In most cases this is OK, but sometimes (a few files of very different
sizes) we may have a MG = [A, B], where A is idle and B has requests queued.

This could be addressed at runtime (detect when a host is under-utilized
and a request to another host in the same MG is queued), but I'm not sure
it's worth the effort.  Distributing requests 'statically' to a sufficient
number of mirrors and relying on statistics seems easier :)
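The 'static' distribution amounts to a round-robin over the first few
mirrors, done up front for every request.  A standalone sketch (the
function name and the cap of 5 are illustrative; the actual patch just
increments self._next inside MirrorGroup as shown in the diff above):

```python
def sweep_mirrors(requests, mirrors, limit=5):
    """Assign an initial mirror to each request before downloading
    starts, cycling through at most `limit` mirrors (illustrative
    stand-in for the mirror-sweep hack)."""
    n = min(len(mirrors), limit)
    return [(req, mirrors[i % n]) for i, req in enumerate(requests)]

# Three files, two mirrors: the third request wraps back to the first mirror.
print(sweep_mirrors(['a.rpm', 'b.rpm', 'c.rpm'], ['m1', 'm2']))
```

With enough requests, each of the first `limit` mirrors ends up with
roughly the same share, which is all the 'rely on statistics' argument
needs.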

> > DEFAULT_MAX_CONNECTIONS = 3
> 
>  So mirrorlist isn't the problem, really. The big thing is what
>  happens with baseurl ... I'd be tempted to make this 1 for ftp and
>  4 for http/https. We probably need a repo config. option for this though.

max_conn=1 for each mirrorlist entry, and opts.max_connections for
the baseurl host.

> what you want is "how many connections should I be doing at once" ... 
> this way you can say "do 4 connections at once", and by default it'll do 
> 4 to your local server ...
> but if that is down, it'll do 1 each to the next 4 or whatever
> (depending on max-connections for each host).

When there's a baseurl, we can use the config option.  When using
mirrorlist/metalink, we COULD use the same value, but then
(assuming max_connections=1 is the norm and unlikely to change)
it effectively means 'number of mirrors used', which is something
quite different.
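To make the distinction concrete, here is a toy model of the effective
parallelism in the two cases (the function and the sweep_limit parameter
are hypothetical, not urlgrabber API; max_conn=1 per mirrorlist host is
assumed as discussed above):

```python
def effective_parallelism(baseurl, mirrors, max_connections=3, sweep_limit=5):
    """Toy model: with a baseurl, parallelism is the per-host
    max_connections option; with a mirrorlist (1 connection per host),
    it is the number of mirrors the sweep spreads requests across."""
    if baseurl:
        return max_connections
    return min(len(mirrors), sweep_limit)

print(effective_parallelism("http://repo", []))                   # baseurl case
print(effective_parallelism(None, ["m%d" % i for i in range(8)])) # mirrorlist case
```

So tuning one knob changes per-host load, while the other changes how
many hosts get hit at all.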

>  Yeh, rm curlMulti ... Fixed! :)

Not a bug really, just incomplete documentation.  Doing select() before
perform() fixes the issue nicely.  They don't do that in the sample
downloader, though.
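For reference, the pattern is the standard pycurl multi loop: wait in
select() until some transfer has socket activity, then drive the handles
with perform().  A self-contained sketch (not the urlgrabber code; it
fetches a local file:// URL so it runs without network access):

```python
import os
import tempfile
from io import BytesIO

import pycurl

# Create a small local file so the example needs no network.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

buf = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, "file://" + path)
c.setopt(pycurl.WRITEDATA, buf)

m = pycurl.CurlMulti()
m.add_handle(c)

num_handles = 1
while num_handles:
    # select() first: block until a transfer is ready instead of
    # spinning on perform().  It may return -1 when libcurl has no
    # sockets to watch (e.g. file://); perform() still makes progress.
    m.select(1.0)
    while True:
        ret, num_handles = m.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break

m.remove_handle(c)
print(buf.getvalue())
os.unlink(path)
```

Dropping the select() call turns the outer loop into a busy-wait, which
is the CPU-burning behavior the fix above avoids.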