[Yum-devel] [PATCH 2/2] Add new urlgrabber option 'csum_type'.
James Antill
james at fedoraproject.org
Thu Jun 30 17:23:55 UTC 2011
On Thu, 2011-06-30 at 19:07 +0200, Zdeněk Pavlas wrote:
> When not 'None', urlgrabber checksums the downloaded files with specified
> algorithm. This is cheaper than reopening the file later, esp. for large ones.
> hexdigest is then made available to the 'checkfunc'.
Do you have any stats on this?
> diff --git a/urlgrabber/grabber.py b/urlgrabber/grabber.py
> index 8e5ea3f..3f6069c 100644
> --- a/urlgrabber/grabber.py
> +++ b/urlgrabber/grabber.py
> @@ -259,6 +259,12 @@ GENERAL ARGUMENTS (kwargs)
> What type of name to IP resolving to use, default is to do both IPV4 and
> IPV6.
>
> + self.csum_type = None
> +
> + What type of checksumming should be made on data being downloaded.
> + If not null and supported, the callback object to the "checkfunc"
> function is augmented with an additional ".csum" attribute.
> +
>
> RETRY RELATED ARGUMENTS
>
> @@ -872,6 +878,7 @@ class URLGrabberOptions:
> self.size = None # if we know how big the thing we're getting is going
> # to be. this is ultimately a MAXIMUM size for the file
> self.max_header_size = 2097152 #2mb seems reasonable for maximum header size
> + self.csum_type = None # no checksum by default
>
> def __repr__(self):
> return self.format()
> @@ -1018,6 +1025,9 @@ class URLGrabber(object):
> obj = CallbackObject()
> obj.filename = filename
> obj.url = url
> + # let the checkfunc() know the checksum
> + try: obj.csum = fo._hash.__self__.hexdigest()
This:
    fo._hash.__self__.hexdigest()
...makes me twitch :).
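
For anyone following along, here is a rough sketch of why that expression works at all (illustrative only, not code from the patch). The patch stores the bound update method in _hash, and a bound method's __self__ is the object it is bound to, so .__self__ reaches back to the hash object. The no-op lambda default has no __self__, which appears to be what the "except AttributeError" relies on.

```python
import hashlib

# The patch stores the bound method, not the hash object itself.
h = hashlib.new("sha256")
update = h.update
update(b"hello ")
update(b"world")

# __self__ on a bound method is the object it is bound to.
digest_via_self = update.__self__.hexdigest()

# The default no-op lambda is a plain function with no __self__,
# so the same expression raises AttributeError for it.
noop = lambda buf: None
try:
    noop.__self__.hexdigest()
    noop_raised = False
except AttributeError:
    noop_raised = True
```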
> + except AttributeError: pass
> apply(cb_func, (obj, )+cb_args, cb_kwargs)
> finally:
> fo.close()
> @@ -1105,6 +1115,11 @@ class PyCurlFileObject(object):
> self._error = (None, None)
> self.size = 0
> self._hdr_ended = False
> + # checksum while downloading
> + self._hash = lambda buf: None
> + if opts.csum_type:
> + import hashlib
We don't generally import outside the top level, and esp. not places
like this that will be called a lot.
> + self._hash = hashlib.new(opts.csum_type).update
> self._do_open()
>
>
> @@ -1132,6 +1147,7 @@ class PyCurlFileObject(object):
>
> self._amount_read += len(buf)
> self.fo.write(buf)
> + self._hash(buf)
Just do:

    fo._hash.hexdigest()

    self._hash = None
    self._hash = hashlib.new(opts.csum_type)

    if self._hash: self._hash.update(buf)
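
Put together, that suggestion might look roughly like the sketch below. ChunkWriter is a hypothetical stand-in for the PyCurlFileObject write path, not the real class, and the hexdigest() helper is an assumption added for illustration; hashlib is imported at module level, per the comment above about imports.

```python
import hashlib
import io


class ChunkWriter(object):
    """Hypothetical stand-in for PyCurlFileObject's write path."""

    def __init__(self, fo, csum_type=None):
        self.fo = fo
        # Keep the hash object itself; None means "no checksumming".
        self._hash = None
        if csum_type:
            self._hash = hashlib.new(csum_type)

    def write(self, buf):
        self.fo.write(buf)
        if self._hash is not None:
            self._hash.update(buf)

    def hexdigest(self):
        # Returns None when checksumming was not requested.
        if self._hash is None:
            return None
        return self._hash.hexdigest()


w = ChunkWriter(io.BytesIO(), csum_type="sha256")
for chunk in (b"abc", b"def"):
    w.write(chunk)
```

With the object stored directly, the callback side can check for None instead of catching AttributeError.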
...also using hashlib directly sucks, not least because it doesn't
exist on RHEL-5 :).
There's a giant pile of compat. stuff in yum/misc.py to deal with it,
but of course we can't import that ... which sucks.
My first instinct is that instead of doing it this way we could just
have a generic "csum" member ... and if it's not None, we call update on
it. Then callers can pass in a hashlib.new() or a yum.misc.Checksums()
etc.
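
A minimal sketch of that idea, assuming the only contract is "the object has an update(buf) method". write_chunks and MultiChecksum are hypothetical names made up for illustration; MultiChecksum only stands in for a caller-supplied wrapper and does not mirror the actual yum.misc.Checksums API.

```python
import hashlib
import io


class MultiChecksum(object):
    """Hypothetical multi-algorithm wrapper; not the yum.misc.Checksums API."""

    def __init__(self, types):
        self._sums = dict((t, hashlib.new(t)) for t in types)

    def update(self, buf):
        for s in self._sums.values():
            s.update(buf)

    def hexdigests(self):
        return dict((t, s.hexdigest()) for t, s in self._sums.items())


def write_chunks(fo, chunks, csum=None):
    """Write chunks to fo, feeding each one to csum.update() if csum is given."""
    for buf in chunks:
        fo.write(buf)
        if csum is not None:
            csum.update(buf)


data = [b"abc", b"def"]

# Caller passes a plain hashlib object...
single = hashlib.new("sha256")
write_chunks(io.BytesIO(), data, csum=single)

# ...or anything else that exposes update().
multi = MultiChecksum(["sha256", "sha512"])
write_chunks(io.BytesIO(), data, csum=multi)
```

This way the download code never needs to import hashlib itself; the caller decides which implementation to hand in and reads the digest back from its own object.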