[Yum] metadata compression

Joshua Bahnsen archrival at gmail.com
Sun Apr 19 21:22:50 UTC 2009


I am creating repository data based on ALL RPMs available to a specific Red
Hat channel (6000 or so per channel):

rhel-i386-as-3
rhel-i386-es-3
rhel-i386-ws-3
rhel-i386-as-4
rhel-i386-es-4
rhel-i386-ws-4
rhel-i386-client-5
rhel-i386-server-5
rhel-x86_64-as-3
rhel-x86_64-es-3
rhel-x86_64-ws-3
rhel-x86_64-as-4
rhel-x86_64-es-4
rhel-x86_64-ws-4
rhel-x86_64-client-5
rhel-x86_64-server-5

With rhel-i386-as-4, other.xml is nearly 300 MB uncompressed; with gzip it
is 66 MB, and with lzma at maximum compression it is 2.4 MB.
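
For anyone who wants to reproduce this kind of comparison, here is a minimal
sketch using Python's standard gzip, bz2 and lzma modules. It is not what I
actually ran (I used 7z); the function name compressed_sizes is made up for
illustration, and Python's lzma module writes the .xz container by default,
so the numbers will not match 7z output exactly.

```python
import bz2
import gzip
import lzma
import sys


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of `data` raw and under each compressor.

    Every compressor is run at its highest standard level. The whole input
    is held in memory, which is fine for a sketch but heavy for a 300 MB file.
    """
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bz2": len(bz2.compress(data, compresslevel=9)),
        # PRESET_EXTREME trades extra CPU time for a slightly smaller result.
        "lzma": len(lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python sizes.py other.xml
    with open(sys.argv[1], "rb") as fh:
        sizes = compressed_sizes(fh.read())
    for name, size in sizes.items():
        ratio = sizes["raw"] / size
        print(f"{name:5s} {size / 1e6:10.1f} MB  ({ratio:.1f}x)")
```

Going by the figures above, gzip gives roughly 4.5x on other.xml and lzma
roughly 125x.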

I'm personally not even concerned with storing the data in sqlite; I'm
trying to limit network bandwidth. If yum had the capability to read in lzma
compressed metadata, it would accomplish this. Is the compression type of the
metadata directly tied to the compression of the sqlite DB?

I should note that I have been using 7z for the compression rather than lzma
from the SDK; 7z gives much better results.

If it isn't doable or doesn't make much sense, I have alternate ways to
accomplish this outside of yum land.

On Sun, Apr 19, 2009 at 1:51 PM, James Antill <james-yum at and.org> wrote:

> Joshua Bahnsen <archrival at gmail.com> writes:
>
> > I am keeping track of 16 RHEL channels, using createrepo with the
> > standard gzip I am totaling 1.4 GB of metadata.
>
>  How many arches is that for?
>
> > Compressing those same XML documents
> > with LZMA yields a total of 140 MB. That's 10x savings overall, I think
> > that's worth a look.
>
>  Well, again, it'd depend on what it did _for the .sqlite_ files. As
> shipping the .xml files to the client machines is suboptimal in many
> ways.
>
>  Doing some quick tests:
>
>  CentOS-5
>  ---------
>  primary.xml          = 5.3M
>  primary.xml.gz       = 888K
>  primary.xml.bz2      = 584K
>  primary.xml.lz       = 540K
>
> ...so I'm not sure how you get 10x. Although for the .sqlite data it
> seems to do a little better:
>
>  Fedora-rawhide
>  --------------
>  primary.sqlite       = 37M
>  primary.sqlite.gz    = 12M
>  primary.sqlite.bz2   = 8.5M
>  primary.sqlite.lz    = 6.8M
>
>  filelists.sqlite     = 66M
>  filelists.sqlite.gz  = 15M
>  filelists.sqlite.bz2 = 13M
>  filelists.sqlite.lz  = 11M
>
>  other.sqlite         = 19M
>  other.sqlite.gz      = 6.5M
>  other.sqlite.bz2     = 4.6M
>  other.sqlite.lz      = 2.8M
>
> ...which implies somewhere in the 25-35% savings range, but I doubt
> that's enough (on its own) given the CPU/code requirements.
>
> --
> James Antill -- james at and.org
> _______________________________________________
> Yum mailing list
> Yum at lists.baseurl.org
> http://lists.baseurl.org/mailman/listinfo/yum
>
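
One way to check the savings figures for the .sqlite files is to work them
out directly from the quoted sizes. The sketch below does only that
arithmetic; the names SIZES_MB, savings and savings_table are made up for
illustration, and the 15M filelists figure is assumed to be the gzip result.

```python
# Sizes in MB as quoted in the Fedora-rawhide table above.
# Assumption: the 15M filelists entry is the gzip-compressed file.
SIZES_MB = {
    "primary.sqlite": {"gz": 12, "bz2": 8.5, "lz": 6.8},
    "filelists.sqlite": {"gz": 15, "bz2": 13, "lz": 11},
    "other.sqlite": {"gz": 6.5, "bz2": 4.6, "lz": 2.8},
}


def savings(baseline_mb: float, candidate_mb: float) -> float:
    """Percentage by which candidate_mb is smaller than baseline_mb."""
    return 100.0 * (1.0 - candidate_mb / baseline_mb)


def savings_table(sizes: dict = SIZES_MB) -> dict:
    """Savings of the .lz file relative to the .gz and .bz2 files, per database."""
    return {
        name: {base: savings(row[base], row["lz"]) for base in ("gz", "bz2")}
        for name, row in sizes.items()
    }


if __name__ == "__main__":
    for name, row in savings_table().items():
        print(f"{name:18s} lz vs gz: {row['gz']:5.1f}%   lz vs bz2: {row['bz2']:5.1f}%")
```

The result depends on which baseline is used (gz or bz2), so it is worth
stating the baseline alongside any percentage.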