[Yum-devel] [PATCH 2/2] Add new urlgrabber option 'csum_type'.

Zdenek Pavlas zpavlas at redhat.com
Fri Jul 1 09:09:59 UTC 2011

> Do you have any stats. on this?

# for n in `seq 4`; do
#  sync
#  yum install thunderbird -y
#  yum clean packages
#  rpm -e thunderbird; done

CHECKSUM TIME: time in misc.checksum()
DOWNLOAD+VERIFY: time in po.repo.getPackage() - includes the above

Checksumming after the download:
CHECKSUM TIME 208.12702179 ms 
DOWNLOAD+VERIFY TIME 488.519906998 ms
CHECKSUM TIME 206.007003784 ms 
DOWNLOAD+VERIFY TIME 475.650072098 ms
CHECKSUM TIME 204.250097275 ms 
DOWNLOAD+VERIFY TIME 473.475933075 ms
CHECKSUM TIME 204.069137573 ms 
DOWNLOAD+VERIFY TIME 465.682983398 ms

Checksumming the 28MB rpm takes about 200ms, and that is constant
across runs. The download alone takes about 260-290ms (local gigabit LAN).

Single pass:
DOWNLOAD+VERIFY TIME 384.808063507 ms
DOWNLOAD+VERIFY TIME 375.88095665 ms
DOWNLOAD+VERIFY TIME 465.562820435 ms
DOWNLOAD+VERIFY TIME 372.400045395 ms

As I see it, about 100ms of the checksumming time (50%) is masked by
network latency.  On a slower network it would probably be close to 100%.
Using the best values in both cases (372ms vs 465ms), the net win is 20%.
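
The 20% figure can be reproduced from the timings listed above. A small
sketch; the variable names are made up for illustration, and "masked" here
is taken to mean the saved time as a fraction of the checksum time:

```python
# Timings in ms, copied from the runs above.
two_pass = [488.519906998, 475.650072098, 473.475933075, 465.682983398]
single_pass = [384.808063507, 375.88095665, 465.562820435, 372.400045395]
checksum = [208.12702179, 206.007003784, 204.250097275, 204.069137573]

# Compare the best run of each variant, as in the text.
saved_ms = min(two_pass) - min(single_pass)

# Net win relative to the two-pass download+verify time.
net_win = saved_ms / min(two_pass)

# Share of the checksum time that no longer shows up in the total.
masked = saved_ms / min(checksum)

print("saved: %.1f ms" % saved_ms)
print("net win: %.0f%%" % (net_win * 100))
print("checksum time masked: %.0f%%" % (masked * 100))
```

With these inputs the masked share comes out a little under half, which
matches the "about 100ms (50%)" estimate.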

> fo._hash.__self__.hexdigest()
> ...makes me twitch :).

Yep, not nice at all.  But replacing a member function with a dummy
callback is much easier than crafting a dummy object.
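
For readers wondering where the `__self__` comes from: a rough sketch of
the pattern, not the actual patch. `FakeFileObject` is a hypothetical
stand-in; only the `_hash` attribute name is taken from the quoted
expression. Storing the bound `update` method means the no-checksum case
needs only a do-nothing function, but the digest has to be reached through
the method's `__self__`:

```python
import hashlib


class FakeFileObject(object):
    """Hypothetical stand-in for the grabber's file object."""

    def __init__(self, csum_type=None):
        if csum_type is None:
            # No checksum requested: a dummy callback that ignores the data.
            self._hash = lambda data: None
        else:
            # Keep only the bound update() method of a fresh hash object.
            self._hash = hashlib.new(csum_type).update

    def write(self, data):
        # Called for every downloaded chunk.
        self._hash(data)


fo = FakeFileObject("sha256")
fo.write(b"hello ")
fo.write(b"world")

# The hash object itself is only reachable via the bound method.
digest = fo._hash.__self__.hexdigest()
print(digest)
```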

> My first instinct is that instead of doing it this way we could just
> have a generic "csum" member ... and if it's not None, we call update on
> it. Then callers can pass in a hashlib.new() or a yum.misc.Checksums()
> etc.

That's reasonable.  I'd probably rename 'opts.csum_type' to 'opts.csumfunc',
and whoever needs to checksum the downloaded data would put a callback there.
That would also need no changes on the 'checkfunc' side.
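
A minimal sketch of how such a callback could be used, assuming the
grabber simply calls it with each chunk as it is written. `fake_download`
is a hypothetical stand-in for the download loop, not urlgrabber code; the
`csumfunc` name follows the rename suggested above:

```python
import hashlib


def fake_download(chunks, csumfunc=None):
    """Hypothetical stand-in for the grabber's write loop."""
    received = []
    for chunk in chunks:
        received.append(chunk)
        if csumfunc is not None:
            # Feed every chunk to the caller's callback as it arrives.
            csumfunc(chunk)
    return b"".join(received)


chunks = [b"first chunk, ", b"second chunk, ", b"last chunk"]

# The caller owns the hash object and passes in only its update() method.
csum = hashlib.new("sha256")
data = fake_download(chunks, csumfunc=csum.update)

# Digest computed during the download, with no second pass over the data.
single_pass_digest = csum.hexdigest()

# For comparison: the digest a separate pass over the data would give.
second_pass_digest = hashlib.sha256(data).hexdigest()
print(single_pass_digest == second_pass_digest)
```

Because the caller keeps the hash object, it can call hexdigest() on it
directly, without going through `__self__`.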

What about 'reget's?  Should the already-stored data be fed to the callback
too?  Could that run into problems?
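
One possible answer, sketched under the assumption that the callback must
see the whole file for the digest to be usable. `resume_download` and its
parameters are hypothetical and do not reflect how urlgrabber handles
regets:

```python
import hashlib
import os
import tempfile


def resume_download(path, remaining_chunks, csumfunc, blocksize=65536):
    """Hypothetical reget: replay the stored part, then append the rest."""
    # Feed the part that is already on disk to the callback first.
    with open(path, "rb") as partial:
        while True:
            block = partial.read(blocksize)
            if not block:
                break
            csumfunc(block)
    # Then append the newly downloaded chunks, checksumming as they arrive.
    with open(path, "ab") as out:
        for chunk in remaining_chunks:
            out.write(chunk)
            csumfunc(chunk)


payload = b"0123456789" * 1000

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Simulate an interrupted download: only the first 4000 bytes are stored.
    with open(path, "wb") as f:
        f.write(payload[:4000])

    csum = hashlib.new("sha256")
    resume_download(path, [payload[4000:7000], payload[7000:]], csum.update)

    with open(path, "rb") as f:
        on_disk = f.read()
finally:
    os.remove(path)

resumed_digest = csum.hexdigest()
print(resumed_digest == hashlib.sha256(payload).hexdigest())
```

Replaying the stored part costs an extra read of that portion, so the
single-pass saving would apply only to the newly downloaded data.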

