[Yum] metadata compression

Mon Apr 20 05:46:03 UTC 2009

Ville Skyttä <ville.skytta at iki.fi> writes:

> On Sunday 19 April 2009, James Antill wrote:
>
> [sqlite bz2 vs lzma/xz]
>> ...which implies somewhere in the 25-35% savings range, but I doubt
>> that's enough (on its own) given the CPU/code requirements.
>
> Regarding CPU requirements, xz/lzma should be much better on metadata consumer 
> boxes than bzip2, and somewhat more memory intensive but I doubt this would 
> matter much if any at all as long as lzma compression levels are kept at sane 
> values.

 The 25-35% savings were on .sqlite at -9 ... what do you mean by
"sane values" here? I'm not outright opposed to doing something like
this, which is why I spent some of my weekend looking at what the result
might be (I'm likely to spend a lot of effort for 10x resource
efficiency, much less so for 1.2x).
 If someone can provide real stats that show a big difference, feel
free to post them.
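
 For anyone who wants to gather such stats, here is a minimal sketch of
one way to do it. It assumes a modern Python 3 (the bz2 and lzma modules
from the standard library); the function name compare() and the default
xz preset of 6 are illustrative choices, not anything yum does. It only
measures compressed size and decompression time on the consumer side.

```python
import bz2
import lzma
import time


def compare(data, xz_preset=6):
    """Return {name: (compressed_size, decompress_seconds)} for bz2 and xz.

    Hypothetical helper for illustration; xz_preset is an assumed default.
    """
    codecs = {
        # bzip2 at its highest level, matching the -9 used in the thread.
        "bz2": (lambda d: bz2.compress(d, 9), bz2.decompress),
        # xz container with the chosen preset (0-9).
        "xz": (lambda d: lzma.compress(d, preset=xz_preset), lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        blob = compress(data)
        start = time.perf_counter()
        restored = decompress(blob)
        elapsed = time.perf_counter() - start
        # Sanity check: the round trip must be lossless.
        assert restored == data
        results[name] = (len(blob), elapsed)
    return results
```

 Feed it the raw bytes of an uncompressed .sqlite file and compare the
two entries; a single run on one file is only a rough indication, so
repeat it over several repos before drawing conclusions.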

>  It is however quite a bit heavier on the metadata producer boxes, 
> both CPU and memory wise: http://tukaani.org/lzma/benchmarks .  Whether that's 
> a problem depends on the scenario but I'm sure people wouldn't mind being 
> given the choice;

 Sure, people like choice in lots of things, but those choices have to
be paid for. For instance some people like to choose to access rawhide
from apt, or a random RHEL-5 version of yum.
 So do we now keep N versions of all the .sqlite files, for each
compression flavor and allow people to choose how many N versions
(forwards and backwards) to generate? -- Content-Encoding didn't work
so well with that much choice.

> e.g. even if the CPU/memory requirements would be a problem 
> for boxes composing something large like Fedora Rawhide all the time, at least 
> for immutable final release repos it should be doable, ditto for many 
> scenarios between these extremes.

 Exactly the opposite, IMNSHO. I download rawhide metadata a couple of
times a week ... I download "fedora" metadata somewhere between 0 and
1 times. I'd be happy with no compression at all there, I think.

> Regarding code requirements, if yum devs don't feel like implementing it, I'm 
> sure the code will just magically appear somewhere if there's a clear green 
> light given by the yum devs and when xz and its python bindings reach a 
> stable release.

 It's not like we know what the code will look like, although we can
imagine. For instance if you think it's adding an import or two and
doing some code in yum like:

if url.endswith(".lz"): uncompress_lzip()
if url.endswith(".bz2"): uncompress_bzip2()
if url.endswith(".gz"): uncompress_gzip()

...then it's unlikely I'd commit it, because that's just the tip of
the iceberg.
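
 To make the point concrete, a runnable version of that naive dispatch
might look roughly like the sketch below. It assumes a modern Python 3
standard library (bz2, gzip, lzma), uses ".xz" in place of ".lz" because
the standard library has no lzip support, and the name
decompress_by_suffix() is hypothetical. Note what it leaves out: it says
nothing about which flavors a repo generates, or what older clients do
when they meet a suffix they don't know.

```python
import bz2
import gzip
import lzma

# Suffix -> decompression function. The set of suffixes is an assumption
# for illustration only.
_DECOMPRESSORS = {
    ".xz": lzma.decompress,
    ".bz2": bz2.decompress,
    ".gz": gzip.decompress,
}


def decompress_by_suffix(url, payload):
    """Decompress payload based purely on the URL's file suffix.

    Returns payload unchanged when no known suffix matches.
    """
    for suffix, func in _DECOMPRESSORS.items():
        if url.endswith(suffix):
            return func(payload)
    return payload
```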

-- 
James Antill -- james at and.org