[Yum-devel] Fastest mirror selection plugin

Michael Stenner mstenner at linux.duke.edu
Sun Aug 14 01:36:54 UTC 2005

On Sat, Aug 13, 2005 at 08:40:10PM -0400, Luke Macken wrote:
> | Using select also has one other advantage: you currently wait the
> | timeout for all threads to return and then pick the fastest (well,
> | sort, really).  You could also simply stop waiting after you get the
> | first N responses.  So, for example, if your first mirror responds
> | after 0.2 seconds, you can just stop waiting for the rest.  Sure, you
> | can do that in threads, too, but it's harder.
> The current implementation waits for all of the threads to
> finish/timeout as well; but picking the single fastest mirror is
> probably not a very good idea.

Well, that's configurable in the system I describe.  What I describe
could exactly replicate your result when stop_after == infinity
(probably implemented as None or something).

> Correct me if I'm wrong, but assuming the worst-case scenario where every
> mirror is expected to time out, my algorithm takes a maximum of the
> socket timeout (2-3 seconds usually), whereas a select() implementation
> would take (socket_timeout * num_mirrors).  This is a pretty big
> performance hit to take just for a 'simpler' implementation, if my
> assumptions are correct.

Your assumptions are incorrect.  The way select works is that you give
it several file objects (or sockets) and it waits until ANY of them is
ready, or until some timeout passes.  In our case, if we were waiting
for more than one (which I agree is a good idea) then we would note
the elapsed time when one finished and then hit select again on the
remaining sockets.  Thus, it's still parallelized and would take the
same amount of time as the threaded version: at most one socket
timeout in the worst case, not (socket_timeout * num_mirrors).
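
To make the select loop above concrete, here is a minimal sketch in
Python.  It is not the plugin's code: the function name time_connects,
the (host, port) input format, and the stop_after parameter's exact
behaviour are assumptions for illustration.  It also assumes numeric IP
addresses (a hostname would block on DNS before select is ever
reached) and POSIX-style select semantics, where a finished connect
shows up as a writable socket.

```python
import errno
import select
import socket
import time


def time_connects(addresses, timeout=3.0, stop_after=None):
    """Return [(elapsed_seconds, address), ...] sorted fastest first.

    addresses  -- list of (ip, port) tuples (hypothetical input format)
    timeout    -- one overall deadline, in seconds, for the whole batch
    stop_after -- stop waiting once at least this many mirrors have
                  connected; None waits for all of them or the deadline
    """
    start = time.monotonic()
    pending = {}  # socket -> address, for connects still in progress
    results = []

    # Start every connect at once; none of these calls block.
    for addr in addresses:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)
        err = sock.connect_ex(addr)  # returns an errno instead of raising
        if err in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            pending[sock] = addr
        else:
            sock.close()

    try:
        while pending:
            if stop_after is not None and len(results) >= stop_after:
                break
            remaining = timeout - (time.monotonic() - start)
            if remaining <= 0:
                break
            # Wake up as soon as ANY connect finishes, or at the deadline.
            _, writable, _ = select.select([], list(pending), [], remaining)
            now = time.monotonic()
            for sock in writable:
                addr = pending.pop(sock)
                # SO_ERROR is 0 if the connect succeeded.
                if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0:
                    results.append((now - start, addr))
                sock.close()
    finally:
        # Close whatever is still pending (timed out or no longer needed).
        for sock in pending:
            sock.close()

    results.sort()
    return results
```

Because all the connects are started before the first select call and
the deadline is shared, the worst case is a single timeout no matter
how many mirrors are in the list.  With stop_after=None the loop waits
for every mirror or the deadline, which matches the threaded behaviour.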

> I totally agree, I think we should hash through as many ideas as
> possible before we put all of our efforts towards one.  I based my
> original implementation on speed over accuracy, since it only judges a
> mirror's 'speed' by the time it takes to connect, and not by its
> actual throughput.

I think that's a nice first step and it's quicker.  Guessing at the
throughput will

  a) take longer
  b) require downloading more
  c) require that we know something about the server so that we can
     download something of reasonable size.

However, if we write it well, subclassing is easy :)
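
As a rough illustration of the subclassing point, here is one way the
split could look in Python.  The class names MirrorRanker and
ConnectTimeRanker and the score/rank methods are hypothetical, not
part of yum or the plugin, and the base class probes mirrors one at a
time purely to keep the sketch short.

```python
import socket
import time


class MirrorRanker:
    """Rank mirrors by a score where lower is better (hypothetical)."""

    def score(self, address):
        # Subclasses decide what "fast" means.
        raise NotImplementedError

    def rank(self, addresses):
        scored = []
        for address in addresses:
            try:
                scored.append((self.score(address), address))
            except OSError:
                continue  # unreachable or timed-out mirrors are dropped
        scored.sort()
        return [address for _, address in scored]


class ConnectTimeRanker(MirrorRanker):
    """Score a mirror by how long a TCP connect takes."""

    def __init__(self, timeout=3.0):
        self.timeout = timeout

    def score(self, address):
        start = time.monotonic()
        with socket.create_connection(address, timeout=self.timeout):
            return time.monotonic() - start
```

A throughput-based ranker would then only need to override score, for
example by timing the download of a file of known size, and would
inherit the sorting and error handling unchanged.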

  Michael D. Stenner                            mstenner at ece.arizona.edu
  ECE Department, the University of Arizona                 520-626-1619
  1230 E. Speedway Blvd., Tucson, AZ 85721-0104                 ECE 524G
