[Rpm-metadata] metadata layout problems and some history

Anders F Björklund afb at algonet.se
Sun Aug 15 13:45:30 UTC 2010


Panu Matilainen wrote:

> AFAICT smart, zypp and apt would all just want a raw dump of the
> relevant data in a format that's lightweight to read in once to
> generate their own native data, and, at least in the case of smart and
> apt, can be indexed (e.g. jump to file offset X to access the details
> that aren't included in the package cache). Yes, déjà vu on
> yum-metadata-parser, which was the yum equivalent of "internal cache
> creation", except that the sqlite format it creates got added (as an
> extension) to repomd.

For Smart, the "easiest" approach is to use shelve, which is pickle + anydbm.
It is indexed and loadable, but it is also specific to Python...
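A minimal sketch of such a shelve-backed cache, assuming plain dict records keyed by a hypothetical "pkgid" field (Smart's real cache objects are richer). It shows why shelve counts as "indexed and loadable": each value is pickled separately under its key, so a lookup only unpickles the one record asked for.

```python
import os
import shelve
import tempfile

# Hypothetical package records; Smart's real cache objects are richer.
def build_cache(path, packages):
    """Write package records to a shelve (pickle + dbm), keyed by pkgid."""
    with shelve.open(path, flag="n") as db:
        for pkg in packages:
            db[pkg["pkgid"]] = pkg  # value is pickled; key must be a str

def lookup(path, pkgid):
    """Fetch one record by key; only that value is unpickled."""
    with shelve.open(path, flag="r") as db:
        return db.get(pkgid)

if __name__ == "__main__":
    cache = os.path.join(tempfile.mkdtemp(), "pkgcache")
    build_cache(cache, [
        {"pkgid": "abc123", "name": "bash", "version": "4.1", "release": "1"},
        {"pkgid": "def456", "name": "zsh", "version": "4.3.10", "release": "2"},
    ])
    print(lookup(cache, "def456")["name"])  # prints "zsh"
```

The catch is the one mentioned above: the pickled values are Python-specific, so nothing outside Python can read this cache.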

So the repodata is parsed (whether XML or SQL doesn't matter all that much)
and an internal cache is created, ultimately made of Python objects.
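As a rough illustration of the "parse repodata into Python objects" step, here is a streaming reader for a yum primary.xml fragment. The Package class and the selected fields are assumptions for the sketch; only the XML namespace and element names follow the repomd primary format.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass

COMMON_NS = "{http://linux.duke.edu/metadata/common}"

@dataclass
class Package:  # hypothetical in-memory record, not Smart's real class
    pkgid: str
    name: str
    arch: str
    epoch: str
    version: str
    release: str

def parse_primary(stream):
    """Stream <package> elements from primary.xml into Package objects."""
    packages = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != COMMON_NS + "package":
            continue
        ver = elem.find(COMMON_NS + "version")
        packages.append(Package(
            pkgid=elem.findtext(COMMON_NS + "checksum"),
            name=elem.findtext(COMMON_NS + "name"),
            arch=elem.findtext(COMMON_NS + "arch"),
            epoch=ver.get("epoch", "0"),
            version=ver.get("ver"),
            release=ver.get("rel"),
        ))
        elem.clear()  # drop the parsed subtree to keep memory flat
    return packages

SAMPLE = b"""<?xml version="1.0"?>
<metadata xmlns="http://linux.duke.edu/metadata/common" packages="1">
  <package type="rpm">
    <name>bash</name>
    <arch>x86_64</arch>
    <version epoch="0" ver="4.1" rel="1"/>
    <checksum type="sha256" pkgid="YES">abc123</checksum>
  </package>
</metadata>"""

if __name__ == "__main__":
    for pkg in parse_primary(io.BytesIO(SAMPLE)):
        print(pkg.name, pkg.version, pkg.release, pkg.pkgid)  # bash 4.1 1 abc123
```

Feeding it from an SQL backend instead would only change how the rows are read; the resulting objects stay the same.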

Indexed rpm metadata isn't needed for package dependency solving,
but it is needed later when asking for package info, path lists or changelog data.
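One way to express that split, as a sketch: keep the solver-facing fields in memory and fetch the rest on first access. The `loader` callable is a hypothetical stand-in for whatever index or database holds the non-primary data.

```python
from functools import cached_property

class LazyPackage:
    """Solver fields are eager; info/files/changelog load on first use.

    `loader` is a hypothetical callable (pkgid -> dict) standing in for
    whatever index or database backs the non-primary metadata.
    """

    def __init__(self, pkgid, name, provides, requires, loader):
        self.pkgid = pkgid
        self.name = name
        self.provides = provides  # needed for dependency solving
        self.requires = requires  # needed for dependency solving
        self._loader = loader

    @cached_property
    def details(self):
        # Fetched once from the backing store, then cached on the instance.
        return self._loader(self.pkgid)

    @property
    def files(self):
        return self.details.get("files", [])

    @property
    def changelog(self):
        return self.details.get("changelog", [])

if __name__ == "__main__":
    calls = []

    def fake_loader(pkgid):
        calls.append(pkgid)
        return {"files": ["/bin/bash"], "changelog": []}

    pkg = LazyPackage("abc123", "bash", ["bash"], ["libc.so.6"], fake_loader)
    print(pkg.files)      # ['/bin/bash'], triggers the one load
    print(pkg.changelog)  # [], served from the cached details
    print(calls)          # ['abc123']
```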

Both indexing the XML and querying the SQL "worked", in that they avoided
making yet another copy of the repodata. (Smart doesn't create repos.)
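A rough sketch of the "index the XML" approach for filelists.xml: scan once for byte offsets of each package record, then seek straight to one record and parse only that fragment. The regex assumes pkgid is the first attribute on `<package>`, as in typical createrepo output; it is a shortcut, not a general XML tokenizer.

```python
import os
import re
import tempfile
import xml.etree.ElementTree as ET

# Assumes pkgid is the first attribute of <package>; a sketch only.
PKG_RE = re.compile(rb'<package pkgid="([^"]+)"[^>]*>.*?</package>', re.S)

def build_offset_index(path):
    """Map pkgid -> (start, end) byte offsets of its <package> element."""
    with open(path, "rb") as f:
        data = f.read()
    return {m.group(1).decode(): (m.start(), m.end())
            for m in PKG_RE.finditer(data)}

def load_files(path, index, pkgid):
    """Seek to one package's record and parse just that fragment."""
    start, end = index[pkgid]
    with open(path, "rb") as f:
        f.seek(start)
        fragment = f.read(end - start)
    # The fragment carries no xmlns declaration, so tags parse unqualified.
    elem = ET.fromstring(fragment)
    return [e.text for e in elem.findall("file")]

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="2">
<package pkgid="aaa" name="bash" arch="x86_64">
  <version epoch="0" ver="4.1" rel="1"/>
  <file>/bin/bash</file>
  <file>/etc/skel/.bashrc</file>
</package>
<package pkgid="bbb" name="zsh" arch="x86_64">
  <version epoch="0" ver="4.3" rel="2"/>
  <file>/bin/zsh</file>
</package>
</filelists>
"""

if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "wb") as f:
        f.write(SAMPLE)
    index = build_offset_index(path)
    print(load_files(path, index, "bbb"))  # ['/bin/zsh']
```

Only the small offset table needs to be kept; the repodata file itself is reused as-is rather than copied.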

> XML works fine for the raw dump part, except that it's a hideously
> bloated format for what it's used for and expensive to parse
> (déjà vu yum-metadata-parser again), and isn't good for indexing or
> searching.
> Heck, a dumb tagged plain-text file would be much better suited for
> apt (and I assume smart). I've vague memories of Suse actually
> having such a format at some point before switching to repodata.

Yes, for Smart the deb tag files work better than rpm metadata,
and the rpm header lists (as apt used) were "better" for indexing etc.
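To show how little work a deb-style tag file needs, here is a minimal parser for the stanza format: "Field: value" lines, continuation lines starting with whitespace, and blank lines between packages. It deliberately skips details such as the " ." blank-line convention in descriptions.

```python
def parse_tagfile(text):
    """Split a deb-style tag file into one dict per stanza."""
    stanzas, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # Blank line ends the current stanza.
            if current:
                stanzas.append(current)
                current, last_key = {}, None
            continue
        if line[0] in " \t" and last_key:
            # Continuation line: append to the previous field.
            current[last_key] += "\n" + line[1:]
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas

SAMPLE = """Package: bash
Version: 4.1-3
Depends: base-files (>= 2.1.12), debianutils (>= 2.15)
Description: The GNU Bourne Again SHell
 Bash is an sh-compatible command language interpreter.

Package: zsh
Version: 4.3.10-14
Description: shell with lots of features
"""

if __name__ == "__main__":
    stanzas = parse_tagfile(SAMPLE)
    print(len(stanzas), stanzas[1]["Package"])  # 2 zsh
```

Because stanzas are delimited by blank lines, recording the byte offset where each one starts is an equally small step, which is what makes this format easy to index.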

I think urpmi currently uses a mix of RPM *and* XML for its media_info,
but I don't have all the details, as it hasn't been merged into Smart yet:

MD5SUM
hdlist.cz
synthesis.hdlist.cz
info.xml.lzma
files.xml.lzma
changelog.xml.lzma

Except for the difference in the key used (yum "pkgid" vs. urpmi "fn"),
I think the non-primary metadata is rather similar. The primary metadata differs.
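If the key really is the main difference, one lookup routine can serve both, parameterized by the key attribute. This is a sketch under assumptions: the urpmi-style sample below is a simplified, assumed shape of info.xml, and treating "fn" as name-version-release.arch is likewise an assumption.

```python
import xml.etree.ElementTree as ET

def index_records(xml_bytes, key_attr):
    """Map each top-level record's key attribute to its element.

    key_attr is "pkgid" for yum-style files and "fn" for urpmi-style
    ones; nothing else in the lookup needs to know which.
    """
    root = ET.fromstring(xml_bytes)
    return {child.get(key_attr): child for child in root if child.get(key_attr)}

def package_key(pkg, key_attr):
    # yum keys on the package checksum; for urpmi we ASSUME "fn" is N-V-R.A
    if key_attr == "pkgid":
        return pkg["pkgid"]
    return "{name}-{version}-{release}.{arch}".format(**pkg)

YUM_OTHER = b"""<otherdata xmlns="http://linux.duke.edu/metadata/other" packages="1">
<package pkgid="abc123" name="bash" arch="x86_64">
  <version epoch="0" ver="4.1" rel="1"/>
  <changelog author="packager" date="1262304000">- initial build</changelog>
</package>
</otherdata>"""

# Assumed, simplified urpmi-style record (not a verified sample).
URPMI_INFO = b"""<media_info><info fn="bash-4.1-1.x86_64" license="GPLv3+">The GNU Bourne Again SHell</info></media_info>"""

if __name__ == "__main__":
    pkg = {"pkgid": "abc123", "name": "bash", "version": "4.1",
           "release": "1", "arch": "x86_64"}
    yum = index_records(YUM_OTHER, "pkgid")
    urpmi = index_records(URPMI_INFO, "fn")
    print(yum[package_key(pkg, "pkgid")].get("name"))  # bash
    print(urpmi[package_key(pkg, "fn")].text)  # The GNU Bourne Again SHell
```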

But for Smart it all starts with the packages, except when talking
about add-on data such as comps/tasks or update_info/descriptions.

--anders


