[Yum-devel] [PATCH 2/2] Add new urlgrabber option 'csum_type'.

James Antill james at fedoraproject.org
Wed Jul 6 15:27:54 UTC 2011


On Fri, 2011-07-01 at 05:09 -0400, Zdenek Pavlas wrote:
> > Do you have any stats. on this?

> CHECKSUM TIME: time in misc.checksum()
> DOWNLOAD+VERIFY: time in po.repo.getPackage() - includes the above
[...]
> Checksumming after the download:
> CHECKSUM TIME 204.069137573 ms 
> DOWNLOAD+VERIFY TIME 465.682983398 ms
> 
> It takes about 200ms to checksum 28MB of the rpm, and that seems
> constant. The download alone is about 260-290ms (local gbit lan).

 I'm pretty sure that most people aren't going to have a network that is
that good :).

> Single pass:
[...]
> DOWNLOAD+VERIFY TIME 372.400045395 ms
> 
> As I see it, about 100ms of checksumming time (50%) are masked by 
> network latency.

 I see ~260ms for download, so VERIFY in single pass is ~112ms, which is
a saving of ~92ms or ~45%. That's not really "stop the world" in
absolute terms, and while it seems pretty good in relative terms, as the
download time increases this number will drop off _quickly_ (e.g. for
100mbit lan, it's only a 4.5% gain).
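
 As a rough sketch of the arithmetic above, using only the timings quoted
in this thread: the ~260ms download figure is my reading of the "260-290ms"
range, and the function names are made up for illustration. The second
function assumes the absolute saving stays fixed while the download gets
slower, which is only meant to show the trend.

```python
# Timings quoted in this thread, in milliseconds.
CHECKSUM_MS = 204.069137573           # misc.checksum() alone, after the download
SINGLE_PASS_TOTAL_MS = 372.400045395  # DOWNLOAD+VERIFY with single-pass checksumming
DOWNLOAD_MS = 260.0                   # assumption: low end of the 260-290ms range


def single_pass_saving(download_ms=DOWNLOAD_MS):
    """Return (visible verify ms, saved ms, saved fraction of checksum time)."""
    verify_ms = SINGLE_PASS_TOTAL_MS - download_ms
    saved_ms = CHECKSUM_MS - verify_ms
    return verify_ms, saved_ms, saved_ms / CHECKSUM_MS


def saving_share_of_total(saved_ms, download_ms, checksum_ms=CHECKSUM_MS):
    """Saving as a share of download plus checksum time.

    Assumption: saved_ms does not grow when the download gets slower.
    """
    return saved_ms / (download_ms + checksum_ms)


if __name__ == "__main__":
    verify_ms, saved_ms, fraction = single_pass_saving()
    print("verify %.1f ms, saved %.1f ms, %.0f%% of checksum time"
          % (verify_ms, saved_ms, fraction * 100))
    # The share of total time shrinks as the download time grows.
    for download_ms in (260.0, 2600.0):
        print(download_ms, saving_share_of_total(saved_ms, download_ms))
```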

 The other problem is that once the parallel downloading changes go in,
the IO will be happening in another process ... but I doubt that we can
do the checksum there (due to desired security boundaries).
 If we _can_ put the checksumming in the helpers, then it might be worth
the changes, because we can then parallelize the checksumming too.

> What about 'reget's?  Should already stored data be fed to the callback too?
> Could that potentially run into some problems?

 Yeah, it needs either to have the data already on disk fed to it
(better) ... or to be skipped entirely; otherwise the caller won't know
wth is going on.
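
 To make the "feed the data on disk to it" option concrete, here is a
minimal sketch using only hashlib. It is not urlgrabber code:
fetch_with_checksum() and its reget flag are hypothetical, "source" stands
in for whatever delivers the remaining bytes, and only the csum_type name
is borrowed from the patch subject.

```python
import hashlib
import io
import os
import tempfile

CHUNK = 64 * 1024


def fetch_with_checksum(source, dest_path, csum_type="sha256", reget=False):
    """Copy source (a file-like object) to dest_path, hashing in the same pass.

    Hypothetical helper. With reget=True, bytes already in dest_path are
    hashed first and source is assumed to start at that offset.
    Returns the hex digest of the complete file.
    """
    digest = hashlib.new(csum_type)
    mode = "wb"
    if reget and os.path.exists(dest_path):
        # Feed the already stored data to the checksum before resuming.
        with open(dest_path, "rb") as existing:
            for chunk in iter(lambda: existing.read(CHUNK), b""):
                digest.update(chunk)
        mode = "ab"
    with open(dest_path, mode) as out:
        # Single pass: each chunk is written and hashed as it arrives.
        for chunk in iter(lambda: source.read(CHUNK), b""):
            out.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    payload = b"x" * 200000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pkg.rpm")
        with open(path, "wb") as partial:
            partial.write(payload[:50000])  # pretend the download was interrupted
        resumed = fetch_with_checksum(io.BytesIO(payload[50000:]), path, reget=True)
        print(resumed == hashlib.sha256(payload).hexdigest())
```

 If the stored prefix were skipped instead, the digest would only cover
the resumed part, so the caller couldn't compare it against the
package's checksum.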


