[Rpm-metadata] adding different compression types to createrepo

Anders F Björklund afb at algonet.se
Thu Aug 5 09:15:59 UTC 2010


James Antill wrote:
>>>> openSUSE actually uses LZMA, not XZ. Both for RPMS and for repodata.
>>>>
>>>> Using liblzma would handle both, but feeding xz to lzma doesn't work.
>>>
>>>  Does that matter? The rpm payload is inside rpm, so matters less, and
>>> can't be changed now anyway. But for "normal" files the convention
>>> is to use .xz now ... no?
>>
>> I thought the "convention" was .gz, since that's the only choice given.
>
>  For repomd, yes .gz/.bz2 is the only std. target atm. I meant as a
> general answer it's common to use .xz but not .lzma, Eg.
> ftp://ftp.gnu.org/gnu/coreutils/ now has .gz and .xz for the latest
> releases.

Sure, but it said "in openSUSE" above, and not "in general". :-)

It doesn't matter, as when I said "LZMA" I meant either lzma/xz,
as in either of the (legacy) LZMA_Alone or the XZ file formats.

unxz and pyliblzma handle both (xzdec only handled .xz - my bad)
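
To illustrate "handles both", here is a minimal sketch. It assumes Python's
standard `lzma` module as a stand-in (it is not the pyliblzma binding
mentioned above), and the payload is made up for the example. The idea is
only that a decoder which auto-detects the container does not care whether
it was handed LZMA_Alone or XZ data.

```python
import lzma

# Made-up payload standing in for a piece of repodata.
payload = b'<metadata packages="0"/>\n' * 100

# Legacy LZMA_Alone container (what a ".lzma" file normally holds).
alone = lzma.compress(payload, format=lzma.FORMAT_ALONE)
# Newer XZ container (what a ".xz" file holds).
xz = lzma.compress(payload, format=lzma.FORMAT_XZ)


def decompress_any(data: bytes) -> bytes:
    """Decompress either container; FORMAT_AUTO sniffs the header."""
    return lzma.decompress(data, format=lzma.FORMAT_AUTO)
```

Calling `decompress_any(alone)` and `decompress_any(xz)` should both give
back the original payload, so the client side only needs one code path.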

>> Either way it's not as much of a gain as for yum which can use the
>> .sqlite files directly. Both the .xml and .sqlite need converting,
>> into the internal format. And so far changing hasn't been worth it.
>
>  Yes, we understand yum gets a bigger speedup than zypper will because
> we use the .sqlite directly. I'm even willing to concede that a custom
> DB can be faster than sqlite (although I doubt it is _significant_), for
> the data it is designed for.

I don't know much about zypper, so will let duncanmv answer that.

Was talking about Smart, which reads everything into the "cache".

>  However it seems like a very worthwhile goal, to me, to only have one
> set of MD. And for that set of MD to not require worthless conversions
> on each client. There is currently only primary_db in upstream
> createrepo, which meets those needs.

I thought that the .xml was transparently converted to .sqlite
on the client with the use of the "yum-metadata-parser" module ?

Having the .sqlite in the repodata is more like a pre-compute,
especially if you are not going to use the database afterwards.

>  So, while we aren't going to remove "primary" generation support
> tomorrow, it is very much a second class type already IMO.
>
>>>  .sqlite is about as extensible as XML, and primary/etc. have never
>>> changed (and the coming changes are just as likely to be done by
>>> adding new files).
>>
>> It's easier to add new attributes and tags, than new columns and tables.
>
>  I would disagree, you can do a single call in sqlite to see if a table
> or column exists and if so it will be there for every value ... XML is
> much less conforming. At worst I'd say it's the same.

OK. I'll try to add the "Requires(hint):" to the sqlite as well.

It should only be an extra column of "hint BOOLEAN DEFAULT FALSE"
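
A rough sketch of what that could look like, using Python's `sqlite3`
module. The two-column "requires" table below is a simplified stand-in and
not the real createrepo schema, and `has_column` is a hypothetical helper
name. It also shows the "single call to see if a column exists" check from
the quoted text, done here with `PRAGMA table_info`.

```python
import sqlite3


def has_column(conn, table, column):
    """Return True if `table` has a column named `column`."""
    # PRAGMA table_info yields one row per column; index 1 is the name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


conn = sqlite3.connect(":memory:")
# Simplified stand-in table (assumption, not the actual primary_db layout).
conn.execute("CREATE TABLE requires (name TEXT, pkgKey INTEGER)")
conn.execute("INSERT INTO requires (name, pkgKey) VALUES ('bash', 1)")

# Add the new column only if it is missing, so the step can be re-run.
if not has_column(conn, "requires", "hint"):
    conn.execute("ALTER TABLE requires ADD COLUMN hint BOOLEAN DEFAULT FALSE")
```

Existing rows pick up the default, and readers that don't know about
"hint" can keep selecting the old columns unchanged.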

>>>  However there is a cost to having 4-8 different versions of
>>> primary/filelists/etc. ... both in createrepo time, in hosting disk
>>> space, in maintenance of all the weird code paths and in repomd.xml
>>> size (most repos. aren't using metalink, so repomd.xml is downloaded
>>> a lot).
>>
>> I'm not sure where this 4-8 number came from. We were talking about 2,
>> or 3 if you want to include the .sqlite files too (which are different).
>>
>> repomd.xml
>> primary.xml.gz (or primary.xml)
>> primary.xml.xz (or primary.xml.lzma)
>> primary.sqlite.bz2
>
>  Supporting everything, we'd have:
>
> primary.gz
> primary.lzma
> primary.xz
> primary_db.bz2
> primary_db.xz
> primary_solv.xz
> [...]

Everything ? No, the patch was to add just "primary_lzma".
(with the addition of "filelists_lzma" and "other_lzma")
There is no need to have both of .lzma and .xz, and the
.sqlite and .solv are mostly useful for yum and zypper...
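
For illustration, a client could pick between the "compat" type and the new
one along these lines. This is only a sketch: the repomd.xml snippet and
its hrefs are made up, and `pick_location` is a hypothetical helper, not
code from the patch.

```python
import xml.etree.ElementTree as ET

NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Made-up repomd.xml with the existing type plus the proposed one.
REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <location href="repodata/primary.xml.gz"/>
  </data>
  <data type="primary_lzma">
    <location href="repodata/primary.xml.lzma"/>
  </data>
</repomd>
"""


def pick_location(repomd_xml, preferred):
    """Return the href of the first type in `preferred` that the repo has."""
    root = ET.fromstring(repomd_xml)
    available = {}
    for data in root.findall("repo:data", NS):
        loc = data.find("repo:location", NS)
        if loc is not None:
            available[data.get("type")] = loc.get("href")
    for mdtype in preferred:
        if mdtype in available:
            return available[mdtype]
    return None
```

A client that understands the new type would ask for
`["primary_lzma", "primary"]`; an older one only looks at "primary" and
never notices the extra entry.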

The only other addition I made was to add an ".index"
file, so that one could seek to a specific pkgid quickly.
(it's just a text file with "$pkgid\t$offset\n" lines; the
offsets point into the uncompressed stream, so only one index is needed)
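
Something like the following would produce such an index. It is a sketch
and not the sample program mentioned here: it assumes the pkgid is the
checksum element flagged pkgid="YES" as in primary.xml, it uses regular
expressions instead of a real XML parser, and the sample data and function
names are made up.

```python
import re

# Start of each <package ...> element in the uncompressed XML.
PACKAGE_RE = re.compile(rb"<package[\s>]")
# Assumption: the pkgid is the checksum flagged pkgid="YES".
PKGID_RE = re.compile(rb'<checksum[^>]*pkgid="YES"[^>]*>([^<]+)</checksum>')

# Made-up, heavily trimmed stand-in for an uncompressed primary.xml.
SAMPLE = (
    b'<metadata packages="2">\n'
    b'<package type="rpm"><name>a</name>'
    b'<checksum type="sha" pkgid="YES">aaa111</checksum></package>\n'
    b'<package type="rpm"><name>b</name>'
    b'<checksum type="sha" pkgid="YES">bbb222</checksum></package>\n'
    b'</metadata>\n'
)


def build_index(xml_bytes):
    """Return index text made of b"pkgid\\toffset\\n" lines."""
    starts = [m.start() for m in PACKAGE_RE.finditer(xml_bytes)]
    lines = []
    for i, start in enumerate(starts):
        # Only search for the pkgid inside this package's own byte range.
        end = starts[i + 1] if i + 1 < len(starts) else len(xml_bytes)
        m = PKGID_RE.search(xml_bytes, start, end)
        if m:
            lines.append(b"%s\t%d\n" % (m.group(1), start))
    return b"".join(lines)


def lookup(index_bytes, pkgid):
    """Return the byte offset recorded for `pkgid`, or None."""
    for line in index_bytes.splitlines():
        key, offset = line.split(b"\t")
        if key == pkgid:
            return int(offset)
    return None
```

With `index = build_index(SAMPLE)`, `lookup(index, b"bbb222")` gives the
byte offset where that package element starts in the uncompressed stream,
whichever compressed variant it was read from.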

But that index file is also easy to compute afterwards.
Sample program was like 50 lines of python or something.
So for a generic repo there would only be *two* files,
the "compat" .xml.gz and either of .xml.xz / .sqlite.bz2

primary.xml.gz
primary.xml.xz

> ...my goal (and Seth's, I think) is to have something like:
>
> mini_primary.xz
> [...]
>
> ...but, obviously that is "in the future", as no code has been written
> for any of the proposed repodata formats.


Sounds like a totally different discussion, as per thread ?

The "primary_lzma" type addition was definitely here-and-now.

--anders


