[Rpm-metadata] adding different compression types to createrepo

James Antill james at fedoraproject.org
Wed Aug 4 14:42:44 UTC 2010


On Tue, 2010-08-03 at 22:09 +0200, Anders F Björklund wrote:
> James Antill wrote:
> 
> >> openSUSE actually uses LZMA, not XZ. Both for RPMS and for repodata.
> >>
> >> Using liblzma would handle both, but feeding xz to lzma doesn't work.
> >
> >  Does that matter? The rpm payload is inside rpm, so matters less, and
> > can't be changed now anyway. But for "normal" files the convention  
> > is to use .xz now ... no?
> 
> I thought the "convention" was .gz, since that's the only choice given.

 For repomd, yes, .gz/.bz2 are the only standard targets at the moment.
I meant that, as a general answer, it's common to use .xz but not .lzma;
e.g. ftp://ftp.gnu.org/gnu/coreutils/ now has .gz and .xz for the latest
releases.
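To make the .xz versus .lzma point concrete, here is a minimal Python sketch (not createrepo or yum code) of a client that picks a decompressor by sniffing the file header instead of trusting the suffix. The gzip, bzip2 and xz magic numbers are the standard ones. Legacy .lzma ("alone") files have no real magic number, so they are tried last with liblzma's legacy decoder. The function name `decompress_metadata` is hypothetical.

```python
import bz2
import gzip
import lzma

# Standard magic bytes at the start of each container format.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"
XZ_MAGIC = b"\xfd7zXZ\x00"


def decompress_metadata(blob: bytes) -> bytes:
    """Decompress a repodata file by sniffing its header, not its suffix."""
    if blob.startswith(GZIP_MAGIC):
        return gzip.decompress(blob)
    if blob.startswith(BZIP2_MAGIC):
        return bz2.decompress(blob)
    if blob.startswith(XZ_MAGIC):
        return lzma.decompress(blob, format=lzma.FORMAT_XZ)
    # Legacy .lzma has no magic number: hand it to liblzma's "alone"
    # decoder, which raises lzma.LZMAError if the data is not .lzma either.
    return lzma.decompress(blob, format=lzma.FORMAT_ALONE)


# Round-trip the same document through every format discussed in the thread.
xml = b"<metadata packages='0'/>"
for blob in (gzip.compress(xml), bz2.compress(xml),
             lzma.compress(xml, format=lzma.FORMAT_XZ),
             lzma.compress(xml, format=lzma.FORMAT_ALONE)):
    assert decompress_metadata(blob) == xml
```

Python's `lzma` module wraps liblzma, so one library covers both formats. As noted above, though, feeding an .xz file to the legacy .lzma decoder fails (it raises `lzma.LZMAError`), which is why the xz check comes before the fallback.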

> Either way it's not as much of a gain as for yum, which can use the
> .sqlite files directly. Both the .xml and .sqlite need converting
> into the internal format. And so far changing hasn't been worth it.

 Yes, we understand yum gets a bigger speedup than zypper will because
we use the .sqlite directly. I'm even willing to concede that a custom
DB can be faster than sqlite (although I doubt it is _significant_), for
the data it is designed for.
 However, it seems like a very worthwhile goal, to me, to have only one
set of metadata (MD), and for that set of MD not to require worthless
conversions on each client. Of the formats in upstream createrepo,
primary_db is currently the only one that meets those needs.
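A rough illustration of why using primary_db directly avoids a per-client conversion: the client just runs SQL against the file it downloaded. This Python sketch uses an in-memory database with a hypothetical, heavily trimmed `packages` table (the real primary.sqlite schema has many more columns and tables) and made-up package rows. The helper name `lookup` is also hypothetical.

```python
import sqlite3

# Hypothetical, trimmed stand-in for primary.sqlite: the real createrepo
# schema has many more columns (and more tables) than shown here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, name TEXT,"
    " arch TEXT, epoch TEXT, version TEXT, release TEXT)"
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO packages (name, arch, epoch, version, release)"
    " VALUES (?, ?, ?, ?, ?)",
    [("foo", "noarch", "0", "1.0", "1"), ("bar", "x86_64", "0", "2.3", "4")],
)


def lookup(conn: sqlite3.Connection, name: str):
    """Answer a query straight from the downloaded DB: no XML parsing and
    no conversion into a client-side format first."""
    return conn.execute(
        "SELECT name, arch, version, release FROM packages WHERE name = ?",
        (name,),
    ).fetchall()


print(lookup(conn, "foo"))  # [('foo', 'noarch', '1.0', '1')]
```

A client that receives primary.xml instead has to parse the whole file and load it into its own store before it can answer the same query; that load is the per-client conversion step.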

 So, while we aren't going to remove "primary" generation support
tomorrow, it is very much a second-class type already, IMO.

> >  .sqlite is about as extensible as XML, and primary/etc. have never
> > changed (and the coming changes are just as likely to be done by
> > adding new files).
> 
> It's easier to add new attributes and tags, than new columns and tables.

 I would disagree: you can do a single call in sqlite to see if a table
or column exists, and if it does it will be there for every row ... XML
is much less uniform. At worst I'd say it's the same.
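The "single call" check can be sketched with Python's `sqlite3` module and SQLite's `PRAGMA table_info`, which returns one row per column of a table and no rows at all when the table does not exist. The helper name `has_column` is hypothetical. The table name is quoted as an SQL identifier because it cannot be passed as a bound parameter in a PRAGMA statement.

```python
import sqlite3


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if `table` exists and has a column named `column`."""
    # Quote the table name as an SQL identifier (doubling embedded quotes),
    # since PRAGMA arguments cannot be bound as parameters.
    quoted = '"' + table.replace('"', '""') + '"'
    # Each row is (cid, name, type, notnull, dflt_value, pk); a missing
    # table simply yields no rows, so one query answers both questions.
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    return any(row[1] == column for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, name TEXT)")
print(has_column(conn, "packages", "name"))       # True
print(has_column(conn, "packages", "summary"))    # False
print(has_column(conn, "no_such_table", "name"))  # False
```

Once the check passes, the column is present for every row in the table, whereas an optional XML attribute or tag has to be tested for on each element.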

> >  However there is a cost to having 4-8 different versions of
> > primary/filelists/etc. ... both in createrepo time, in hosting disk
> > space, in maintenance of all the weird code paths and in repomd.xml
> > size (most repos. aren't using metalink, so repomd.xml is downloaded a
> > lot).
> 
> I'm not sure where this 4-8 number came from. We were talking about 2,
> or 3 if you want to include the .sqlite files too (which are different).
> 
> repomd.xml
> primary.xml.gz (or primary.xml)
> primary.xml.xz (or primary.xml.lzma)
> primary.sqlite.bz2

 Supporting everything, we'd have:

primary.gz
primary.lzma
primary.xz
primary_db.bz2
primary_db.xz
primary_solv.xz
[...]

...my goal (and Seth's, I think) is to have something like:

mini_primary.xz
[...]

...but, obviously that is "in the future", as no code has been written
for any of the proposed repodata formats.
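The repomd.xml cost mentioned earlier can be seen by listing its `<data>` entries. Every extra variant adds one more entry that clients fetch each time they refresh repomd.xml. This is a minimal Python sketch using the standard library's ElementTree. The sample document is hypothetical and trimmed (real entries also carry checksums, timestamps and sizes), though the namespace is the one repomd.xml uses. The names `data_entries` and `SAMPLE_REPOMD` are illustrative.

```python
import xml.etree.ElementTree as ET

REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Hypothetical, trimmed repomd.xml: real <data> entries carry more children.
SAMPLE_REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary"><location href="repodata/primary.xml.gz"/></data>
  <data type="primary_db"><location href="repodata/primary.sqlite.bz2"/></data>
  <data type="filelists"><location href="repodata/filelists.xml.gz"/></data>
</repomd>"""


def data_entries(repomd_xml):
    """Return (type, href) for every <data> entry in a repomd.xml string."""
    root = ET.fromstring(repomd_xml)
    entries = []
    for data in root.findall("repo:data", REPO_NS):
        location = data.find("repo:location", REPO_NS)
        entries.append((data.get("type"), location.get("href")))
    return entries


for kind, href in data_entries(SAMPLE_REPOMD):
    print(kind, href)
```

Adding, say, an .xz copy of primary and of primary_db would add two more entries here, which every client downloading repomd.xml pays for.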

-- 
James Antill - james at fedoraproject.org
http://yum.baseurl.org/wiki/whatsnew/3.2.28
http://yum.baseurl.org/wiki/YumBenchmarks
http://yum.baseurl.org/wiki/YumHistory

