[Yum-devel] URLGrabber Batch Mode

Michael Stenner mstenner at linux.duke.edu
Sun Feb 29 02:03:51 UTC 2004


On Sat, Feb 28, 2004 at 02:06:19PM -0500, Ryan Tomayko wrote:
> On Sat, 2004-02-28 at 12:30, Michael Stenner wrote:
> > That is also being implemented (hell, it might be done... Ryan?) in
> > the current urlgrabber.  
> 
> Range support is there, automatic reget is not. Today if you wanted
> reget, the calling application would have to determine the range by
> looking at the file and pass the appropriate range kwarg. For example,
> if we had 500 bytes in a file on disk, the calling application could do
> the following:
> 
> urlgrab(url, file, range=(500,)) 
> 
> I've never seen this request on the yum list myself. If this is
> something you guys want I could probably have it in CVS in 30 minutes.
> I'm thinking another kwarg possibly (reget=True or something) that would
> automatically determine the range based on the local file.

I think it should be there.  I'm not sure about the values of the
argument.  I remember that the implementation in asp-yum had a couple
of algorithms for choosing when/how to do it.  This could be handled
very simply by letting reget= take a string naming which algorithm to
use.  None or False could mean no reget, and True could mean "default"
or something.  I don't really care.  I'd just prefer we leave that
option open.
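
A rough sketch of how such a reget= value could be normalized follows.
The function name normalize_reget and the mode names "simple" and
"check_timestamp" are placeholders made up for illustration; they are
assumptions, not an existing urlgrabber API.

```python
# Hypothetical mode names; stand-ins for whatever algorithms get chosen.
REGET_MODES = ("simple", "check_timestamp")
DEFAULT_REGET_MODE = "simple"


def normalize_reget(reget):
    """Map a user-supplied reget= value to a mode name, or None for no reget."""
    # None and False both mean "do not attempt a reget".
    if reget is None or reget is False:
        return None
    # True means "use whatever the default algorithm is".
    if reget is True:
        return DEFAULT_REGET_MODE
    # A string selects a specific algorithm by name.
    if isinstance(reget, str) and reget in REGET_MODES:
        return reget
    raise ValueError("invalid reget value: %r" % (reget,))
```

Accepting strings from the start means new algorithms can be added
later without changing the meaning of True, False or None.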

Ryan: you and I should chat on IRC or in another thread about the use
of True/False - these are new things and we should decide how/when we
want to use them.

> What I don't like is that it is only possible for urlgrab where we
> have a local filename and not urlread/urlopen. So, a "reget" kwarg
> would not be generically useful like most other kwargs (exception
> being retry). Not a big deal, IMO, just moves you a little further
> from perfect.

I don't think this is bad.  This is not like having byteranges only
work for one protocol, which would leave the other broken.  Frankly,
it wouldn't even be tragic if reget didn't work for some protocols
(although it will).  reget=True should just be interpreted as
"optimize via reget if possible".  One reason for it not being
possible is that it's just plain nonsensical (as with urlread, which
has no local file to resume from).

> The more I think about it, reget and range would have to work
> _together_. So, for example, if yum requests the byte range of an rpm's
> header and it dies half way through, the next time yum is going to
> request the same range again. urlgrabber would need to offset the range
> provided by yum with the number of bytes on disk. Takeaway is that reget
> and range should work together if they're both provided by urlgrabber.

This is a good point that I hadn't considered.  However, it's pretty
easy.  I had already imagined that reget would be based on the
byterange stuff (since you already did all that hard work), so I
imagined the algorithm would look something like this (forgive the
pseudo-code):

file requested
check local size (answer = N bytes)
if N = 0: don't use byterange
else:
  set range to [N:]
get using byterange

Instead, it will be this:
  
file requested
check local size (answer = N bytes)
get explicit range request BR = [M1:M2] or None
if N = 0 and BR is None: don't use byterange
else:
  if BR is None: M1 = 0; M2 = ''
  set range to [N+M1:M2]
get using byterange
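
The second version could be sketched in Python roughly as below.  This
is only an illustration under a few assumptions: the helper name
reget_range is hypothetical, offsets are zero-based, an open-ended
range is written with None rather than '', and the local file is
assumed to hold only bytes from the start of the requested range.

```python
import os


def reget_range(filename, byte_range=None):
    """Return the (start, end) range to request, or None for a plain fetch.

    byte_range is the caller's explicit (M1, M2) request, or None.
    An end of None means "through the end of the file".
    """
    try:
        n = os.path.getsize(filename)  # N bytes already on disk
    except OSError:
        n = 0                          # no local file yet
    if n == 0 and byte_range is None:
        return None                    # nothing to resume, no range asked for
    if byte_range is None:
        m1, m2 = 0, None
    else:
        m1 = byte_range[0] or 0        # a missing start means offset 0
        # Allow one-element tuples such as (500,) for open-ended ranges.
        m2 = byte_range[1] if len(byte_range) > 1 else None
    return (n + m1, m2)                # shift the start past what we have
```

For example, with 1000 bytes already on disk and an explicit request of
(200, 5000), this would ask for (1200, 5000).  The sketch does not
handle the case where the local file already covers the whole range.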

					-Michael

-- 
  Michael D. Stenner                            mstenner at ece.arizona.edu
  ECE Department, the University of Arizona                 520-626-1619
  1230 E. Speedway Blvd., Tucson, AZ 85721-0104                 ECE 524G


