[Yum-devel] [PATCH 2/2] Add new urlgrabber option 'csum_type'.

James Antill james at fedoraproject.org
Thu Jun 30 17:23:55 UTC 2011


On Thu, 2011-06-30 at 19:07 +0200, Zdeněk Pavlas wrote:
> When not 'None', urlgrabber checksums the downloaded files with specified
> algorithm.  This is cheaper than reopening the file later, esp. for large ones.
> hexdigest is then made available to the 'checkfunc'.

 Do you have any stats on this, i.e. how much cheaper it actually is?

> diff --git a/urlgrabber/grabber.py b/urlgrabber/grabber.py
> index 8e5ea3f..3f6069c 100644
> --- a/urlgrabber/grabber.py
> +++ b/urlgrabber/grabber.py
> @@ -259,6 +259,12 @@ GENERAL ARGUMENTS (kwargs)
>      What type of name to IP resolving to use, default is to do both IPV4 and
>      IPV6.
>  
> +  self.csum_type = None
> +
> +    What type of checksumming should be made on data being downloaded.
> +    If not null and supported, the callback object to the "checkfunc"
> +    function is augumented with an additional ".csum" attribute.
> +
>  
>  RETRY RELATED ARGUMENTS
>  
> @@ -872,6 +878,7 @@ class URLGrabberOptions:
>          self.size = None # if we know how big the thing we're getting is going
>                           # to be. this is ultimately a MAXIMUM size for the file
>          self.max_header_size = 2097152 #2mb seems reasonable for maximum header size
> +        self.csum_type = None # no checksum by default
>          
>      def __repr__(self):
>          return self.format()
> @@ -1018,6 +1025,9 @@ class URLGrabber(object):
>                      obj = CallbackObject()
>                      obj.filename = filename
>                      obj.url = url
> +                    # let the checkfunc() know the checksum
> +                    try: obj.csum = fo._hash.__self__.hexdigest()

 This:

 fo._hash.__self__.hexdigest()

...makes me twitch :).

> +                    except AttributeError: pass
>                      apply(cb_func, (obj, )+cb_args, cb_kwargs)
>              finally:
>                  fo.close()
> @@ -1105,6 +1115,11 @@ class PyCurlFileObject(object):
>          self._error = (None, None)
>          self.size = 0
>          self._hdr_ended = False
> +        # checksum while downloading
> +        self._hash = lambda buf: None
> +        if opts.csum_type:
> +            import hashlib

 We don't generally import outside the top level, and especially not in
places like this that will be called a lot.

> +            self._hash = hashlib.new(opts.csum_type).update
>          self._do_open()
>          
>  
> @@ -1132,6 +1147,7 @@ class PyCurlFileObject(object):
>  
>              self._amount_read += len(buf)
>              self.fo.write(buf)
> +            self._hash(buf)

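 Just keep the hash object itself instead of its bound update method,
so the checkfunc code can do:

    fo._hash.hexdigest()

...the constructor does:

        self._hash = None
        if opts.csum_type:
            self._hash = hashlib.new(opts.csum_type)

...and the write path does:

            if self._hash: self._hash.update(buf)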

...also, using hashlib directly sucks, not least because it doesn't
exist on RHEL-5 :).
 There's a giant pile of compat. stuff in yum/misc.py to deal with that,
but of course we can't import it here ... which sucks.

 My first instinct is that instead of doing it this way we could just
have a generic "csum" member ... and if it's not None, we call update on
it. Then callers can pass in a hashlib.new() or a yum.misc.Checksums()
etc.
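
 A rough sketch of that idea, to show the shape rather than a finished
patch. Download and fetch() are hypothetical stand-ins, not urlgrabber's
real classes; the only assumption is that the caller's object has an
update() method, the way hashlib objects do:

```python
import hashlib


class Download(object):
    """Hypothetical stand-in for the file object that receives the data."""

    def __init__(self, csum=None):
        # csum is any object with an update(buf) method, or None to
        # disable checksumming. The downloader never imports hashlib.
        self.csum = csum
        self.chunks = []

    def write(self, buf):
        self.chunks.append(buf)
        # Feed each chunk to the caller's checksum object as it arrives.
        if self.csum is not None:
            self.csum.update(buf)


def fetch(chunks, csum=None):
    """Hypothetical driver: pushes chunks through a Download."""
    dl = Download(csum)
    for chunk in chunks:
        dl.write(chunk)
    return dl


# The caller owns the hash object, so it reads the digest itself.
h = hashlib.new("sha256")
fetch([b"abc", b"def"], csum=h)
digest = h.hexdigest()
```

 Since the caller keeps its own reference to the checksum object, the
checkfunc doesn't need to dig it out of the file object at all.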


