[Yum-devel] urlgrabber: timestamp check

Michael Stenner mstenner at linux.duke.edu
Wed Mar 9 17:28:26 UTC 2005


On Mon, Mar 07, 2005 at 12:44:45AM -0500, seth vidal wrote:
> > The dealio is that if you have a 1000 byte remote file and a 1000 byte
> > local file, urlgrabber will try and construct a zero-length byterange
> > request which then barfs.  We might, for reget only, consider
> > special-casing the zero-length request to simply have it do nothing.
> > Thoughts, Ryan?
> > 
> > I'm open to considering if_newer_than options or somesuch, but I'd
> > want to sort out some of the issues:  urlgrab only?  what if there's
> > no local file?, etc.
> 
> Okay,  I don't think I ever answered this. The short version of what I
> want is an easy item to add to the top of:
> yum/repos.Repository.getRepoXML() to see if we need to bother getting
> the file.

OK, I'm attaching both a patch to grabber.py and a little test program
to demonstrate how one might do this.  I'd really like input from you
and Ryan about whether this makes sense.  For the newly defined error
number (13, which is for HTTPError), I've included special "code" and
"exception" attributes, which seems a little kludgy.  However, I do
kinda like returning a bit more information.  Before, an HTTPError was
caught by the generic IOError handler, and the code/message were just
encoded in the error string.  That's OK for a user reading the message,
but it makes it hard for calling code to do things like this.
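
To illustrate why the attributes help, here is a minimal sketch.  The
names GrabError, wrap_http_error and is_not_modified are hypothetical
stand-ins, not part of urlgrabber; only the "errno 13 plus .code"
convention comes from the attached patch.

```python
class GrabError(Exception):
    """Hypothetical stand-in for URLGrabError (not the real class)."""

    def __init__(self, errno, strerror):
        Exception.__init__(self, errno, strerror)
        self.errno = errno
        self.strerror = strerror


def wrap_http_error(code, message):
    # Mirrors the idea in the patch: error number 13 carries the HTTP
    # status as an attribute instead of only inside the message string.
    err = GrabError(13, message)
    err.code = code
    return err


def is_not_modified(err):
    # The caller can test the status directly; no string parsing needed.
    return err.errno == 13 and getattr(err, 'code', None) == 304
```

With only the message string available, the caller would have to
pattern-match text such as "HTTP Error 304", which breaks as soon as
the wording changes.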

The only other way I thought of to solve the problem was to allow
callers to include custom urllib2 handlers.  That would really allow a
lot more flexibility, but I'd want to do it right, and that might take
a little thought.
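
For reference, a rough sketch of what such a custom handler could look
like.  It is written against urllib.request, the Python 3 name for
urllib2.  NotModifiedHandler and build_conditional_opener are
hypothetical names, and how urlgrabber would accept a handler from the
caller is exactly the part that still needs the thought mentioned
above.

```python
import urllib.request  # Python 3 name for the urllib2 module


class NotModifiedHandler(urllib.request.BaseHandler):
    """Hypothetical handler: return a 304 response instead of raising."""

    def http_error_304(self, req, fp, code, msg, hdrs):
        # Returning the response object ends the error chain, so the
        # default handler never turns the 304 into an HTTPError.
        return fp


def build_conditional_opener():
    # build_opener() installs the default handlers alongside ours.
    return urllib.request.build_opener(NotModifiedHandler())
```

A caller using this opener would check the status of the returned
response for 304 rather than catching an exception.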

Also note that this patch is currently causing some unit tests to
fail, which I haven't looked into yet.  I haven't committed it because
I want to get your opinions.

					-Michael

P.S.  To use the test program, just touch a file named reference-comp
and give it a new (today) timestamp or an old (say, 2000) timestamp.
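
If touch(1) is not handy, the same thing can be done from Python.  A
small sketch; the touch() helper and the OLD constant are illustrative
names, not part of the test program.

```python
import os
import time


def touch(path, mtime=None):
    """Create path if needed and set its modification time (default: now)."""
    with open(path, 'a'):
        pass  # append mode creates the file without truncating it
    if mtime is None:
        mtime = time.time()
    os.utime(path, (mtime, mtime))  # (access time, modification time)


# An "old" timestamp: 1 Jan 2000, midnight local time.
OLD = time.mktime((2000, 1, 1, 0, 0, 0, 0, 0, -1))
```

Calling touch('reference-comp') gives the file a new timestamp, and
touch('reference-comp', OLD) gives it an old one.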

-- 
  Michael D. Stenner                            mstenner at ece.arizona.edu
  ECE Department, the University of Arizona                 520-626-1619
  1230 E. Speedway Blvd., Tucson, AZ 85721-0104                 ECE 524G
-------------- next part --------------
Index: urlgrabber/grabber.py
===================================================================
RCS file: /home/groups/urlgrabber/cvs-root/urlgrabber/urlgrabber/grabber.py,v
retrieving revision 1.39
diff -u -r1.39 grabber.py
--- urlgrabber/grabber.py	3 Mar 2005 00:54:23 -0000	1.39
+++ urlgrabber/grabber.py	9 Mar 2005 17:14:34 -0000
@@ -372,6 +372,7 @@
         10   - Byte range requested, but range support unavailable
         11   - Illegal reget mode
         12   - Socket timeout.
+        13   - HTTPError (includes .code and .exception attributes)
 
       MirrorGroup error codes (256 -- 511)
         256  - No more mirrors left to try
@@ -887,7 +888,12 @@
         except ValueError, e:
             raise URLGrabError(1, _('Bad URL: %s') % (e, ))
         except RangeError, e:
-            raise URLGrabError(9, _('%s') % (e, ))
+            raise URLGrabError(9, str(e))
+        except urllib2.HTTPError, e:
+            new_e = URLGrabError(13, str(e))
+            new_e.code = e.code
+            new_e.exception = e
+            raise new_e
         except IOError, e:
             if hasattr(e, 'reason') and have_socket_timeout and \
                    isinstance(e.reason, TimeoutError):
@@ -897,7 +903,7 @@
         except OSError, e:
             raise URLGrabError(5, _('OSError: %s') % (e, ))
         except HTTPException, e:
-            raise URLGrabError(7, _('HTTP Error (%s): %s') % \
+            raise URLGrabError(7, _('HTTP Exception (%s): %s') % \
                             (e.__class__.__name__, e))
         else:
             return (fo, hdr)
-------------- next part --------------
#!/usr/bin/python
import os, time, stat

from urlgrabber.grabber import URLGrabber, URLGrabError

g = URLGrabber()
fn = 'reference-comp'
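try:
    s = os.stat(fn)
except OSError:
    # no local copy yet, so do an unconditional fetch
    http_headers=()
else:
    # ask the server to send the file only if it is newer than our copy
    mod_time = s[stat.ST_MTIME]
    # HTTP dates are always in GMT and end with the literal "GMT"
    ftime = time.strftime("%a, %d %b %Y %H:%M:%S GMT", time.gmtime(mod_time))
    http_headers=(('If-Modified-Since', ftime),)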

try:
    g.urlgrab('http://www.linux.duke.edu/projects/urlgrabber/test/reference',
              None, http_headers=http_headers)
    print 'got it!'
except URLGrabError, e:
    if not (e.errno == 13 and e.code == 304): raise
    print "didn't need to get it!"
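
The header construction in the test program can also be sketched with
email.utils.formatdate, which always emits English day and month names
regardless of locale (time.strftime follows the current locale).  The
helper name conditional_headers is hypothetical; it only reproduces
the header-building step and does no network access.

```python
import os
from email.utils import formatdate


def conditional_headers(path):
    """Return If-Modified-Since headers for path, or () if it is missing."""
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        # No local file: nothing to compare against, fetch unconditionally.
        return ()
    # usegmt=True produces an RFC 1123 style date ending in "GMT".
    return (('If-Modified-Since', formatdate(mtime, usegmt=True)),)
```

The result has the same shape as the http_headers tuple passed to
urlgrab in the test program.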

