[Rpm-metadata] Request-For-Ideas: requires statistics and thoughts on making our metadata smaller

Duncan Mac-Vicar Prett dmacvicar at suse.de
Mon Nov 2 14:50:49 UTC 2009


On Friday 30 October 2009 22:00:36 Seth Vidal wrote:
> I'm not entirely sure why I started looking at this but I started looking
> at the number of requires in rawhide (i686) and then at what provided
> those requires most of the time.
> 
> Summary version:
> 211011 Requires in rawhide
> 71359 are provided by glibc.
> 
> 8165 packages provide all the requirements for all 23823 pkgs in the
> distro.
> 
> The top 20 requirements and the top 20 providing packages are here:
> http://skvidal.fedorapeople.org/misc/top-20-requires-and-providers.txt
> 
> For any pkg which has a Requires that is provided by glibc, on average
> that package has 7 more Requires that are provided by glibc.

This string repetition is the reason the satsolver uses a hashed string
pool, which is built from the metadata (this is very fast). The results are
the .solv files.
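
To illustrate the idea of a string pool, here is a minimal Python sketch. It
is not the satsolver code: the StringPool class, its method names and the
example package names are made up for illustration. Each distinct string is
stored once and every later occurrence is replaced by a small integer id.

```python
class StringPool:
    """Toy string pool: each distinct string is stored once and gets an id."""

    def __init__(self):
        self._ids = {}      # string -> id (a dict is a hash table)
        self._strings = []  # id -> string

    def intern(self, s):
        """Return the id for s, adding it to the pool the first time it is seen."""
        sid = self._ids.get(s)
        if sid is None:
            sid = len(self._strings)
            self._ids[s] = sid
            self._strings.append(s)
        return sid

    def lookup(self, sid):
        """Return the string that belongs to an id."""
        return self._strings[sid]


pool = StringPool()

# Hypothetical Requires lists; package names are examples only.
requires = {
    "pkg-a": ["libc.so.6", "libm.so.6", "rtld(GNU_HASH)"],
    "pkg-b": ["libc.so.6", "rtld(GNU_HASH)"],
}

# Replace every dependency string by its id. Repeated strings such as
# "libc.so.6" are stored once in the pool, however many packages need them.
encoded = {name: [pool.intern(r) for r in reqs] for name, reqs in requires.items()}
```

With 71359 of 211011 Requires pointing at glibc, most of those strings are
repeats, so storing ids instead of text removes the bulk of the duplication.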

In memory, the library operates on these unique ids. We were thinking about
generating the .solv files on the server side (the same approach yum takes
with sqlite) and downloading them, but we lack a versioning scheme for that.

For example, the packman 3rd party repository is about 1.8M of compressed
rpm-md. While the uncompressed metadata is 11M, the solv file is about 1.9M.
The advantage is that it can be loaded into memory as is and the solver can
operate directly on the solvable data. The attribute data is loaded on
demand, keeping descriptions and any other non-solvable attributes out of
memory unless they are needed.
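
The on-demand loading can be sketched roughly as follows. This is only an
illustration in Python under simplifying assumptions, not the real .solv
layout: the LazyRepo class, the offset table and the sample data are all
hypothetical. The dependency data lives in memory from the start, while a
description is read from the file only when somebody asks for it.

```python
import io


class LazyRepo:
    """Toy repository: dependency ids in memory, descriptions read on demand."""

    def __init__(self, deps, attr_file, offsets):
        self.deps = deps              # name -> list of dependency ids (solver data)
        self._attr_file = attr_file   # seekable file object holding bulky attributes
        self._offsets = offsets       # name -> (offset, length) inside attr_file
        self._cache = {}              # descriptions that have already been read

    def description(self, name):
        """Read the description from the file on first access, then cache it."""
        if name not in self._cache:
            offset, length = self._offsets[name]
            self._attr_file.seek(offset)
            self._cache[name] = self._attr_file.read(length).decode("utf-8")
        return self._cache[name]


# Sample data; an in-memory buffer stands in for the file on disk.
blob = b"First package.Second package."
repo = LazyRepo(
    deps={"pkg-a": [0, 1], "pkg-b": [0]},
    attr_file=io.BytesIO(blob),
    offsets={"pkg-a": (0, 14), "pkg-b": (14, 15)},
)

# A solver would only touch repo.deps; nothing is read from the attribute
# data until description() is called.
text = repo.description("pkg-b")
```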

The solv file from /var/lib/rpm is 3.9M, while /var/lib/rpm itself is 131M.

-- 
Duncan Mac-Vicar P. - Engineering Manager, YaST
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg)


