[Rpm-metadata] Request-For-Ideas: requires statistics and thoughts on making our metadata smaller
Jeff Johnson
n3npq at mac.com
Mon Nov 2 15:09:46 UTC 2009
On Nov 2, 2009, at 9:50 AM, Duncan Mac-Vicar Prett wrote:
>
> This string repetition is the reason why the satsolver uses a hashed
> string pool which is created from the metadata (very fast). The result
> are the solv files.
>
Memoization (as in the satsolver) is an important reduction.
The problem with using memoization to remove data redundancy is that
one cannot do memoization (which forces a dictionary to uniquify all
strings) and simultaneously use a "standard" markup like XML.
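To make the dictionary idea concrete, here is a minimal sketch of a string pool in Python. It is not the satsolver's actual implementation; the class name StringPool, its methods, and the package data are all made up for illustration. Each distinct string is stored once and referred to everywhere else by a small integer id.

```python
class StringPool:
    """Hypothetical string pool: each distinct string gets one integer id."""

    def __init__(self):
        self._ids = {}      # string -> id
        self._strings = []  # id -> string

    def intern(self, s):
        # Return the existing id for s, or assign the next free one.
        sid = self._ids.get(s)
        if sid is None:
            sid = len(self._strings)
            self._ids[s] = sid
            self._strings.append(s)
        return sid

    def lookup(self, sid):
        # Every read of the encoded data has to come back through here.
        return self._strings[sid]


pool = StringPool()

# Made-up dependency data: package name -> list of requires.
requires = {
    "foo": ["libc.so.6", "/bin/sh"],
    "bar": ["libc.so.6", "libm.so.6"],
    "baz": ["/bin/sh", "libc.so.6"],
}

# Encoded form: only integer ids, no repeated strings.
encoded = {
    pool.intern(pkg): [pool.intern(r) for r in reqs]
    for pkg, reqs in requires.items()
}

# Decoding goes through the pool for every single string.
decoded = {
    pool.lookup(p): [pool.lookup(r) for r in reqs]
    for p, reqs in encoded.items()
}
```

The encoded form only makes sense together with the pool, which is why it does not sit comfortably inside a self-describing markup like XML.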
If anything, there are more redundant strings in the XML markup than
in the dependency content itself in rpm-metadata.
But that flaw is usually dismissed with
Compress! Compress! Compress!
And sure, one can use a database like sqlite as well, but that assumes
that you have normalized data in the schema, which is mostly not the
case for rpm-metadata stored in a sqlite3 database.
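As a rough illustration of what "normalized" would mean here, the sketch below uses Python's sqlite3 module with a made-up two-table schema. It is not the schema yum uses; the table names, column names, and sample rows are assumptions for the example. Each string is stored once in a strings table and the requires table holds only integer ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalized schema: strings are stored once, relations by id.
conn.executescript("""
    CREATE TABLE strings (
        id    INTEGER PRIMARY KEY,
        value TEXT UNIQUE NOT NULL
    );
    CREATE TABLE requires (
        pkg_id  INTEGER NOT NULL REFERENCES strings(id),
        name_id INTEGER NOT NULL REFERENCES strings(id)
    );
""")


def intern(value):
    # Insert the string only if it is not already present, then return its id.
    conn.execute("INSERT OR IGNORE INTO strings (value) VALUES (?)", (value,))
    row = conn.execute(
        "SELECT id FROM strings WHERE value = ?", (value,)
    ).fetchone()
    return row[0]


# Made-up sample data: (package, required name).
sample = [("foo", "libc.so.6"), ("bar", "libc.so.6"), ("bar", "libm.so.6")]
for pkg, name in sample:
    conn.execute("INSERT INTO requires VALUES (?, ?)", (intern(pkg), intern(name)))

# Reading the data back means joining through the strings table twice.
rows = conn.execute("""
    SELECT p.value, n.value
    FROM requires r
    JOIN strings p ON p.id = r.pkg_id
    JOIN strings n ON n.id = r.name_id
    ORDER BY p.value, n.value
""").fetchall()

string_count = conn.execute("SELECT COUNT(*) FROM strings").fetchone()[0]
```

In this sketch "libc.so.6" is stored once no matter how many packages require it, and the price is the pair of joins on every read.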
Both memoization (as used in *.solv) and a database (as used by yum)
also force all lookups to go through the "dictionary" to be decoded.
While that clearly "works" for vendor-specific applications like
zypp and Fedora-specific implementations like yum, there's no clear
"better" yet.
73 de Jeff