[Rpm-metadata] Request-For-Ideas: requires statistics and thoughts on making our metadata smaller

Jeff Johnson n3npq at mac.com
Fri Oct 30 21:28:54 UTC 2009


On Oct 30, 2009, at 5:00 PM, Seth Vidal wrote:

> I'm not entirely sure why I started looking at this but I started  
> looking at the number of requires in rawhide (i686) and then at  
> what provided those requires most of the time.
>
> Summary version:
> 211011 Requires in rawhide
> 71359 are provided by glibc.
>

You miss what is being tracked by looking at (and providing) only
counting statistics. Sure, there are lots of provides that are glibc
related; no argument (and no counting statistics) is needed.

> 8165 packages provide all the requirements for all 23823 pkgs in the  
> distro.
>

That's about right. The figure used to be about 33%, now apparently
~25%, largely because most added packages are closer to the "leaves"
rather than the core of the

> The top 20 requirements and the top 20 providing packages are here:
> http://skvidal.fedorapeople.org/misc/top-20-requires-and-providers.txt
>

A quick browse shows what you miss. glibc has versioned symbols. The
cost of tracking versioned symbols into package dependencies is high.
I'll leave it to you to discover why you can't just collapse all that
"bloat" to something simpler like
	Requires: glibc
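
One way to see the problem is a toy model. The sketch below is only an illustration, not how rpm or yum resolve dependencies: the capability names are simplified, both provider sets are made up, and `unmet` is a hypothetical helper. It treats a dependency as satisfied when the required string appears among the provider's capabilities, and compares versioned-symbol requires with the collapsed form against an older glibc.

```python
# Toy model: capabilities are plain strings, and a requirement is met
# when the same string appears in the provider's set.
# Both provider sets are invented for illustration; a real glibc
# package provides far more capabilities than this.
OLD_GLIBC = {"glibc", "libc.so.6", "libc.so.6(GLIBC_2.0)", "libc.so.6(GLIBC_2.4)"}
NEW_GLIBC = OLD_GLIBC | {"libc.so.6(GLIBC_2.10)"}


def unmet(requires, provides):
    """Return the requirements the provider does not satisfy (hypothetical helper)."""
    return sorted(set(requires) - set(provides))


# What a binary built against the newer glibc might require.
fine_grained = ["libc.so.6", "libc.so.6(GLIBC_2.4)", "libc.so.6(GLIBC_2.10)"]
# The same package with its glibc requires collapsed to one line.
collapsed = ["glibc"]

print(unmet(fine_grained, OLD_GLIBC))  # the GLIBC_2.10 requirement is reported
print(unmet(collapsed, OLD_GLIBC))     # nothing is reported
print(unmet(fine_grained, NEW_GLIBC))  # nothing is reported
```

In this model the collapsed form accepts the older glibc even though the symbol version the binary needs is absent, so the breakage would only show up at run time.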

> For any pkg which has a Requires that is provided by glibc, on  
> average that package has 7 more Requires that are provided by glibc.
>
> What this means for our metadata is that if we can find a way to  
> reduce how many duplicate glibc requirements we store in either the  
> pkgs and/or in the repodata that we can trim down our repodata size  
> by a fairly good amount.
>

Good luck!

>
> Run this script to see for yourself:
>
> http://skvidal.fedorapeople.org/misc/requires-frequency.py
>
> I think there are some reasonable assumptions we can make in our  
> repodata which might help out the size of the metadata for xfer  
> purposes and how many items we have to traverse.
>

Hint: incremental delivery of just what has changed, rather
than, as currently, downloading everything all over again, again
and again, is far likelier to achieve larger bandwidth
savings than any amount of fiddling and filtering.
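
A minimal sketch of that idea, assuming the metadata can be viewed as a mapping from a package identifier to its record. The functions `make_delta` and `apply_delta` and the sample data are hypothetical, not part of any existing repodata format or tool. Only removed keys and changed or added records travel to the client.

```python
def make_delta(old, new):
    """Describe how to turn snapshot `old` into snapshot `new`."""
    removed = sorted(key for key in old if key not in new)
    # Records that are new or differ from the old snapshot.
    changed = {key: rec for key, rec in new.items()
               if key not in old or old[key] != rec}
    return {"removed": removed, "changed": changed}


def apply_delta(old, delta):
    """Rebuild the new snapshot on the client from `old` plus a delta."""
    gone = set(delta["removed"])
    result = {key: rec for key, rec in old.items() if key not in gone}
    result.update(delta["changed"])
    return result


# Invented sample snapshots: package id -> list of requires.
yesterday = {
    "foo-1.0-1": ["libc.so.6"],
    "bar-2.1-3": ["libc.so.6", "libfoo.so.1"],
    "baz-0.9-2": ["libc.so.6"],
}
today = {
    "foo-1.0-2": ["libc.so.6"],                 # foo was rebuilt
    "bar-2.1-3": ["libc.so.6", "libfoo.so.1"],  # unchanged
    "baz-0.9-2": ["libc.so.6", "libm.so.6"],    # gained a requirement
}

delta = make_delta(yesterday, today)
print(delta)
print(apply_delta(yesterday, delta) == today)
```

The unchanged `bar-2.1-3` record never appears in the delta, which is where the savings would come from when most of a repository is the same from one day to the next.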

Surely you knew that when you decided to rip rpmdb-fedora out.
Now you, not me, get to solve the distribution problem.

> I'm curious what folks think as to how we might be able to make this  
> better.
>
> It is worth noting that the major rpm-based distros all use the same  
> naming for their 'glibc' pkg
>

Not gonna work. But go for it! Have fun!

73 de Jeff


