[Rpm-metadata] metadata script and sample

Jeff Licquia licquia at progeny.com
Fri Oct 3 23:13:47 UTC 2003


On Fri, 2003-10-03 at 17:00, Jeff Johnson wrote:
> Jeff Licquia wrote:
> > - Miscellaneous RPMisms.
> >
> ><vendor>, <group>, <BuildHost> (lowercase?), <header-range>, <color>. 
> >Also, <size rpm="foo"> should be <size package="foo">.  If these are
> >optional, cool.  It would be nice if they were in a namespaced <rpm>
> >area, like Debian-isms will have to be in a namespaced <deb> area, but
> >that's more a self-esteem thing than anything else.
> 
> Again I'd like to see a unified representation, as agreeing to disagree
> and hiding essential data in private name spaces will only delay the
> harder semantic interpretation issues of package metadata.

I don't think it's possible to escape the conclusion that RPM and Debian
packages store and require different metadata, and I don't think it's
good to insist on shoehorning metadata into an artificial common set.  I
also like the flexibility of namespaces.

But if there's really a lot of opposition to namespaces, then I'd be
content for RPM-specific and Debian-specific metadata to just be
optional.

> > - Versioning and dependencies.
> >
> >The versioning and dependency syntax is a little weird.  Its only real
> >deficiency is that it doesn't seem to handle or-relationships, but it
> >seems to me that it could be done better overall.
> 
> Yes, rpm packages lack a way to specify a logical or for dependency
> relations, perhaps always will.
> 
> That does not prevent designing a metadata representation that can
> express a logical or-relation. A sensible heuristic like
>     Most likely (i.e. "important") relation first, please.
> would probably permit a common interoperable representation of
> dependencies.

That's actually Debian standard practice (may even be policy, but I
can't remember offhand).
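
A minimal sketch of how a consumer could use that ordering, in Python. The
function name and the package names are hypothetical, and this is not how
any particular package manager resolves alternatives; it only shows that
"most likely first" makes an or-relation resolvable by a simple ordered
scan.

```python
def pick_alternative(alternatives, available):
    """Return the first alternative, in listed order, that is available.

    alternatives: names in preference order (most important first).
    available: set of names that can actually be installed.
    Returns None when no alternative can be satisfied.
    """
    for name in alternatives:
        if name in available:
            return name
    return None


# Hypothetical or-relation "mta-a | mta-b | mta-c", preferred one first.
alternatives = ["mta-a", "mta-b", "mta-c"]
choice = pick_alternative(alternatives, {"mta-b", "mta-c"})
print(choice)  # mta-b
```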

> >Versions aren't three things; they're three parts of one thing.  So, it
> >seems to me that they should be expressed in a single entity, with the
> >components as entity attributes.  "epoch" and "release" should also be
> >optional.  So: 
> 
> A rule that a missing epoch means epoch 0 is going to be needed, along
> with the extension that a missing release means release 0.

Well, maybe.  Specific package formats could (probably will have to)
have more stringent requirements than the file format.  So, RPMs might
require epochs and releases in their versions, and Debian may forbid the
use of RPM capability, file, or symbol dependencies.
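
As a rough illustration of that layering, here is a Python sketch in which
the file format itself stays permissive and each package format adds its
own checks. The function name, the "rpm"/"deb" keys, and the rules
themselves are assumptions taken from the "might" and "may" above, not
settled requirements.

```python
def check_for_format(version_attrs, dep_kinds, fmt):
    """Return a list of problems for one package under format-specific rules.

    version_attrs: dict of attributes found on the version, e.g. {"main": "4.2"}.
    dep_kinds: kinds of dependencies used, e.g. ["package", "file"].
    fmt: "rpm" or "deb" (hypothetical identifiers for this sketch).
    """
    problems = []
    if fmt == "rpm":
        # Assumed rule: RPM insists on explicit epoch and release.
        for field in ("epoch", "release"):
            if field not in version_attrs:
                problems.append("rpm requires %s" % field)
    elif fmt == "deb":
        # Assumed rule: Debian allows only package dependencies.
        for kind in dep_kinds:
            if kind != "package":
                problems.append("deb forbids %s dependencies" % kind)
    return problems


print(check_for_format({"main": "4.2"}, ["package"], "rpm"))
print(check_for_format({"main": "4.2"}, ["package", "file"], "deb"))
```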

> Actually, I'd claim that dependencies are a 4-tuple, not a 3-tuple. The
> 3-tuple identifies a point on a single dimension, while the logical
> comparison specifies an inequality.
> 
> One might want to include the name as well, making a dependency a 5-tuple of
> {Name,Epoch,Version,Release,Flags} where Flags determines the inequality.

Strictly speaking, I'm talking only about versions, not dependencies.

Versions show up in package definitions as well as in dependencies. 
Right now, we have a fundamental disconnect between how we declare the
version of a package and how we declare the version in a package
dependency:

<epoch>0</epoch>
<version>4.2</version>
<release>3</release>

vs.

<entry name="foo" flags="LE" epoch="0" version="4.2" release="3"/>

We're treating the exact same data as attributes in one place, and as
entities in another.  That, I think, is a poor way of representing data.

In my proposal, version data is expressed exactly the same way
everywhere:

<version epoch="0" main="4.2" release="3"/>

vs.

<entry>
  <packagedep name="foo">
    <version epoch="0" main="4.2" release="3"/>
    <comparison type="less-than"/>
  </packagedep>
</entry>

See how the <version> entities are exactly the same?  That's the kind of
thing that makes parsing simpler.  In implementing this, I might create
a Version object, complete with comparison functions.  The same could be
done with the former, but it would take a lot more work.

You do have a good point regarding the rationalization of comparison and
name in dependency expressions, though.

> Is there really a need for differentiating file/package/other
> dependencies like this? File dependencies are easy, they always start
> with '/' anyways. I can possibly see a need to mark a package
> dependency differently so that it was easier for dpkg to distinguish
> during xml parse; otherwise dependencies are pretty much strings.
> 
> Well there are foo.so soname dependencies, and perl(yadda)
> dependencies, but all are just strings from a rpm POV.

It's not really necessary.  I put them in as one idea of how to handle
different types.  From a Debian POV, it doesn't matter, since we only
have package dependencies.

I do believe that more explicitness is better than less.  Inferring the
dependency type from the context seems dangerous to me.  But you could,
of course, make it an entity attribute instead of a separate entity.
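
To make the risk concrete, here is a small Python sketch comparing the two
approaches. The `type` attribute, the type names, and the package name
`foo.so-tools` are all hypothetical; the guessing rules are just the
heuristics mentioned above (leading '/', `.so`, `perl(yadda)` style).

```python
import xml.etree.ElementTree as ET


def guess_type(name):
    """Infer a dependency type from the string alone (the fragile approach)."""
    if name.startswith("/"):
        return "file"
    if ".so" in name:
        return "soname"
    if "(" in name and name.endswith(")"):
        return "capability"
    return "package"


def dep_type(entry):
    """Prefer an explicit type attribute; guess only when it is absent."""
    return entry.get("type") or guess_type(entry.get("name"))


# A package whose name happens to contain ".so" fools the heuristic,
# while the explicit attribute leaves no room for doubt.
implicit = ET.fromstring('<entry name="foo.so-tools"/>')
explicit = ET.fromstring('<entry type="package" name="foo.so-tools"/>')
print(dep_type(implicit))  # soname
print(dep_type(explicit))  # package
```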

> > - <pkgid> should be <checksum type="md5">
> >
> >This would allow us to add as many other checksum algorithms as we
> >wanted.
> 
> What is missing here is the semantic interpretation for "pkgid", i.e.
> what blob is the digest calculated from. That can be dealt with
> outside of XML.
> 
> In fact, I would claim that the pkgid should not have any qualifiers
> whatsoever other than bit/byte length and a promise of being
> sufficiently unique.
> 
> Building in a checksum type prevents other identifiers that are
> sufficiently unique. Perhaps a single type identifier might be needed
> for extensibility, but it should not directly reference the type of
> digest, but rather the type of the identifier.
> 
> OTOH, renaming to filechecksum rather than pkgid is perfectly adequate for
> what Seth has proposed, and then a type might be perfectly reasonable.
> 
> I cannot disambiguate the two interpretations without a hint of what is
> intended, so I am groping blindly with my interpretation of "pkgid".

OK.  I'm probably just confused about what pkgid means in an RPM
context, then.

Whole-file checksums are important, though, as well as algorithmic
flexibility for those checksums.
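
A short Python sketch of what a typed whole-file checksum buys. The
function names are hypothetical, and the element layout simply follows the
<checksum type="..."> form suggested earlier. The algorithm is a
parameter, so md5 or anything else hashlib supports is handled the same
way.

```python
import hashlib
import io


def file_checksum(stream, algorithm, chunk_size=65536):
    """Hash an entire binary stream with the named algorithm."""
    digest = hashlib.new(algorithm)
    # Read in chunks so large package files need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def checksum_element(stream, algorithm):
    """Render the digest as a <checksum> element carrying its type."""
    return '<checksum type="%s">%s</checksum>' % (
        algorithm, file_checksum(stream, algorithm))


# Stand-in bytes instead of a real package file.
data = b"example package contents"
for algorithm in ("sha1", "sha256"):
    print(checksum_element(io.BytesIO(data), algorithm))
```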
