[Rpm-metadata] metadata script and sample

Jeff Johnson jbj at redhat.com
Sat Oct 4 00:01:01 UTC 2003

On Fri, Oct 03, 2003 at 06:14:55PM -0500, Jeff Licquia wrote:
> On Fri, 2003-10-03 at 17:00, Jeff Johnson wrote:
> > Jeff Licquia wrote:
> > > - Miscellaneous RPMisms.
> > >
> > ><vendor>, <group>, <BuildHost> (lowercase?), <header-range>, <color>. 
> > >Also, <size rpm="foo"> should be <size package="foo">.  If these are
> > >optional, cool.  It would be nice if they were in a namespaced <rpm>
> > >area, like Debian-isms will have to be in a namespaced <deb> area, but
> > >that's more a self-esteem thing than anything else.
> > 
> > Again I'd like to see a unified representation, as agreeing to disagree
> > and hiding essential data in private name spaces will only delay the
> > harder semantic interpretation issues of package metadata.
> I don't think it's possible to escape the conclusion that RPM and Debian
> packages store and require different metadata, and I don't think it's
> good to insist on shoehorning metadata into an artificial common set.  I
> also like the flexibility of namespaces.
> But if there's really a lot of opposition to namespaces, then I'd be
> content for RPM-specific and Debian-specific metadata to just be
> optional.

I'm not opposed to name spaces at all. I am reluctant to see duplicate
(or nearly so) copies of important data like dependencies get split
into private name spaces unnecessarily.

> > > - Versioning and dependencies.
> > >
> > >The versioning and dependency syntax is a little weird.  Its only real
> > >deficiency is that it doesn't seem to handle or-relationships, but it
> > >seems to me that it could be done better overall.
> > 
> > Yes, rpm packages lack a way to specify a logical or for dependency
> > relations, and perhaps always will.
> > 
> > That does not prevent designing a metadata representation that can
> > express a logical or-relation. A sensible heuristic like
> >     Most likely (i.e. "important") relation first, please.
> > would probably permit a common interoperable representation of dependencies.
> That's actually Debian standard practice (may even be policy, but I
> can't remember offhand).

Great! If rpm has made it this far w/o logical or, I suspect that the
heuristic of ignoring other alternatives will simply "work" for rpm
based common metadata. I do think it's important to have the ability
to express alternative dependencies in common metadata XML even if not
used Right Now.
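A minimal sketch of how an or-relation could sit in common metadata and still be consumed by rpm under the "most likely relation first" heuristic. It assumes each requirement is an ordered list of alternative names (an "or-group"); the helper names and the package names in the example are hypothetical, chosen only for illustration.

```python
from typing import List

# Hypothetical representation: one or-group per requirement, ordered with
# the most likely ("important") alternative first.
OrGroup = List[str]

def rpm_view(requires: List[OrGroup]) -> List[str]:
    """rpm has no logical or, so keep only the head of each or-group."""
    return [group[0] for group in requires if group]

def debian_view(requires: List[OrGroup]) -> str:
    """Render the same or-groups in Debian Depends syntax: "a | b, c"."""
    return ", ".join(" | ".join(group) for group in requires if group)

requires = [["postfix", "sendmail"], ["libc6"]]
print(rpm_view(requires))     # ['postfix', 'libc6']
print(debian_view(requires))  # postfix | sendmail, libc6
```

The rpm consumer silently drops the alternatives, which is exactly the "ignore other alternatives" behaviour described above; nothing in the shared representation has to change for Debian to use all of them.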

> > >Versions aren't three things; they're three parts of one thing.  So, it
> > >seems to me that they should be expressed in a single entity, with the
> > >components as entity attributes.  "epoch" and "release" should also be
> > >optional.  So: 
> > 
> > There's a rule that missing epoch == epoch 0 is going to be needed, and
> > the extension that missing release == release 0 would be needed.
> Well, maybe.  Specific package formats could (probably will have to)
> have more stringent requirements than the file format.  So, RPMs might
> require epochs and releases in their versions, and Debian may forbid the
> use of RPM capability, file, or symbol dependencies.

Nah, I'm just pointing out that rpm has a strict comparison order, and so needs
defined values in order to perform the comparison. If you wish Epoch
and Release to be optional (which is OK), then there needs to be a well
defined default value so that comparison is possible. Just skipping
the test won't work any more.
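To make the "well defined default value" point concrete, here is a rough sketch that fills in missing fields before comparing. It assumes the defaults proposed earlier in the thread (missing epoch == 0, missing release == "0"), and `vercmp` is a simplified rpmvercmp-style segment comparison written for illustration, not rpm's actual implementation (it ignores later additions such as `~`).

```python
import re
from itertools import zip_longest
from typing import Optional, Tuple

# Hypothetical defaults, following the rule proposed in the thread.
DEFAULT_EPOCH = 0
DEFAULT_RELEASE = "0"

def _segments(s: str):
    # Runs of digits or of letters; anything else (".", "-", "_") separates.
    return re.findall(r"[0-9]+|[A-Za-z]+", s)

def vercmp(a: str, b: str) -> int:
    """Simplified rpmvercmp-style comparison returning -1, 0 or 1."""
    for x, y in zip_longest(_segments(a), _segments(b)):
        if x is None:          # a ran out of segments first: a is older
            return -1
        if y is None:
            return 1
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:    # numeric segments compare as integers
            xi, yi = int(x), int(y)
            if xi != yi:
                return -1 if xi < yi else 1
        elif x_num != y_num:   # a numeric segment is newer than an alpha one
            return 1 if x_num else -1
        elif x != y:           # alpha segments compare as strings
            return -1 if x < y else 1
    return 0

def normalize(epoch: Optional[int], version: str,
              release: Optional[str]) -> Tuple[int, str, str]:
    """Fill in the defaults so every EVR is fully defined."""
    return (DEFAULT_EPOCH if epoch is None else epoch,
            version,
            DEFAULT_RELEASE if release is None else release)

def evrcmp(a, b) -> int:
    """Compare (epoch, version, release) triples; None means 'missing'."""
    ea, va, ra = normalize(*a)
    eb, vb, rb = normalize(*b)
    if ea != eb:
        return -1 if ea < eb else 1
    return vercmp(va, vb) or vercmp(ra, rb)
```

With defaults in place, `evrcmp((None, "4.2", None), (0, "4.2", "0"))` is a real comparison that yields 0, rather than a test that has to be skipped.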

> > Actually, I'd claim that dependencies are a 4-tuple, not a 3-tuple. The
> > 3-tuple identifies a point on a single dimension, while the logical
> > comparison specifies an inequality.
> > 
> > One might want to include the name as well, making a dependency a 5-tuple of
> > {Name,Epoch,Version,Release,Flags} where Flags determines the inequality.
> Strictly speaking, I'm talking only about versions, not dependencies.
> Versions show up in package definitions as well as in dependencies. 
> Right now, we have a fundamental disconnect between how we declare the
> version of a package and how we declare the version in a package
> dependency:
> <epoch>0</epoch>
> <version>4.2</version>
> <release>3</release>
> vs.
> <entry name="foo" flags="LE" epoch="0" version="4.2" release="3"/>
> We're treating the exact same data as attributes in one place, and as
> entities in another.  That, I think, is a poor way of representing data.

If you are suggesting that package NEVR (N == name, E == epoch, etc)
might be expressed in the same manner as other data in dependencies, you
are correct. I see no problem, basically because rpm does not use
the package NEVR for anything; each package has a Provides: of
its own NEVR, which is essentially what you are (might be?) suggesting.
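A small sketch of the idea that a package's own NEVR is just another dependency: the package emits a self-Provides using the same {N,E,V,R,F} shape as any other relation. The class and function names are hypothetical; the flag strings follow the `flags="LE"` attribute style shown above, and `flags_accept` assumes the EVR comparison result (-1, 0 or 1) has already been computed elsewhere.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Dep:
    """A dependency as the 5-tuple {Name, Epoch, Version, Release, Flags}."""
    name: str
    epoch: Optional[int] = None
    version: Optional[str] = None
    release: Optional[str] = None
    flags: Optional[str] = None   # "LT", "LE", "EQ", "GE" or "GT"

@dataclass(frozen=True)
class Package:
    name: str
    epoch: Optional[int]
    version: str
    release: str

    def self_provide(self) -> Dep:
        # The package's own NEVR, written exactly like any other dependency.
        return Dep(self.name, self.epoch, self.version, self.release, "EQ")

def flags_accept(flags: str, cmp: int) -> bool:
    """Does an EVR comparing as `cmp` (-1, 0 or 1) against the required
    EVR satisfy the inequality named by `flags`?"""
    return {"LT": cmp < 0, "LE": cmp <= 0, "EQ": cmp == 0,
            "GE": cmp >= 0, "GT": cmp > 0}[flags]
```

For instance, `Package("foo", 0, "4.2", "3").self_provide()` gives `Dep("foo", 0, "4.2", "3", "EQ")`, and a requirement with `flags="LE"` accepts any provider whose EVR compares as 0 or -1 against the required one.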

> In my proposal, version data is expressed exactly the same way
> everywhere:
> <version epoch="0" main="4.2" release="3"/>
> vs.
> <entry>
>   <packagedep name="foo">
>     <version epoch="0" main="4.2" release="3"/>
>     <comparison type="less-than"/>
>   </packagedep>
> </entry>
> See how the <version> entities are exactly the same?  That's the kind of
> thing that makes parsing simpler.  In implementing this, I might create
> a Version object, complete with comparison functions.  The same could be
> done with the former, but it would take a lot more work.
> You do have a good point regarding the rationalization of comparison and
> name in dependency expressions, though.

Presumably either a 4-tuple or 5-tuple construct of {N,E,V,R,F} is what
could be used most generally?
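A sketch showing that both XML shapes quoted above can be read into the same {N,E,V,R,F} tuple, so the choice between them is about parsing convenience rather than information content. Only `"less-than"` appears in the proposal; the other `COMPARISON_TO_FLAGS` keys are hypothetical names added for completeness, and the example uses `flags="LT"` so the two forms describe the same relation.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

class Dep(NamedTuple):
    name: str
    epoch: Optional[int]
    version: Optional[str]
    release: Optional[str]
    flags: Optional[str]

# Hypothetical mapping from <comparison type="..."> to rpm-style flags.
COMPARISON_TO_FLAGS = {
    "less-than": "LT", "less-or-equal": "LE", "equal": "EQ",
    "greater-or-equal": "GE", "greater-than": "GT",
}

def _int_or_none(s: Optional[str]) -> Optional[int]:
    return int(s) if s is not None else None

def from_packagedep(elem: ET.Element) -> Dep:
    """Nested form: <packagedep><version .../><comparison .../></packagedep>."""
    v = elem.find("version")
    c = elem.find("comparison")
    return Dep(
        name=elem.get("name"),
        epoch=_int_or_none(v.get("epoch")) if v is not None else None,
        version=v.get("main") if v is not None else None,
        release=v.get("release") if v is not None else None,
        flags=COMPARISON_TO_FLAGS[c.get("type")] if c is not None else None,
    )

def from_entry(elem: ET.Element) -> Dep:
    """Flat form: <entry name=... flags=... epoch=... version=... release=.../>."""
    return Dep(elem.get("name"), _int_or_none(elem.get("epoch")),
               elem.get("version"), elem.get("release"), elem.get("flags"))

nested = ET.fromstring(
    '<entry><packagedep name="foo">'
    '<version epoch="0" main="4.2" release="3"/>'
    '<comparison type="less-than"/>'
    '</packagedep></entry>')
flat = ET.fromstring(
    '<entry name="foo" flags="LT" epoch="0" version="4.2" release="3"/>')
```

Both `from_packagedep(nested.find("packagedep"))` and `from_entry(flat)` yield `Dep("foo", 0, "4.2", "3", "LT")`.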

> > Is there really a need for differentiating file/package/other
> > dependencies like this? File dependencies are easy; they always start
> > with '/' anyway. I can possibly see a need to mark a package dependency
> > differently so that it was easier for dpkg to distinguish during xml
> > parse; otherwise dependencies are pretty much strings.
> > 
> > Well, there are foo.so soname dependencies, and perl(yadda) dependencies,
> > but all are just strings from a rpm POV.
> It's not really necessary.  I put them in as one idea of how to handle
> different types.  From a Debian POV, it doesn't matter, since we only
> have package dependencies.
> I do believe that more explicitness is better than less.  Inferring the
> dependency type from the context seems dangerous to me.  But you could,
> of course, make it an entity attribute instead of a separate entity.

In rpm there is also the context in which a dependency is used, as in "this
dependency expresses a need of, say, a post-install script", which might be
added to the "more explicit" set. Dunno, there's been very little need to
expose the dependency context externally up to now.

If you do want/need more explicit markers, then some rules for converting
dependency name strings into their types are going to be needed. For example,
all dependency tokens that start with '/' are <filedeps>. Trickier will be
sonames, but a first crack might be to apply the pattern *.so.*.

The name space dependencies are pretty easy; they always look like "foo(bar)".
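The conversion rules above could be tried as a simple heuristic classifier. This is a first-crack sketch under stated assumptions: the type labels and `dep_type` are hypothetical, the bare `.so` suffix check is added to cover the "foo.so" case mentioned earlier, and the soname rule is deliberately checked before the "foo(bar)" rule so that a versioned soname such as `libc.so.6(GLIBC_2.0)` is not misread as a name space dependency.

```python
import fnmatch
import re

# "foo(bar)": a name immediately followed by a parenthesized argument.
NAMESPACE_RE = re.compile(r"[^()\s]+\(.*\)")

def dep_type(token: str) -> str:
    """Heuristic typing of an rpm dependency string."""
    if token.startswith("/"):
        return "file"
    # Soname check runs first so "libc.so.6(GLIBC_2.0)" stays a soname.
    if fnmatch.fnmatchcase(token, "*.so.*") or token.endswith(".so"):
        return "soname"
    if NAMESPACE_RE.fullmatch(token):
        return "namespace"
    return "package"
```

Anything that matches no rule falls through to "package", which is also the only type Debian needs.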

Alternatively, but somewhat more radically, I'm (possibly) in favor of
collapsing all dependencies in common metadata to package dependencies.

This would require an explicit "contains" relation for mapping rpm tokens
into package names, possibly with logical or to handle multiple provides.
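A rough sketch of what collapsing to package dependencies via a "contains" relation might look like. It assumes a hypothetical `contains` index from every rpm token (file path, soname, foo(bar), or plain name) to the set of packages that provide it; multiple providers become a logical-or group. The alternatives are sorted only for determinism; this sketch does not attempt "most likely first" ordering.

```python
from typing import Dict, List, Set

def collapse_to_packages(requires: List[str],
                         contains: Dict[str, Set[str]]) -> List[List[str]]:
    """Rewrite each rpm dependency token as an or-group of package names."""
    groups = []
    for token in requires:
        providers = sorted(contains.get(token, ()))
        if not providers:
            # Unresolvable token: no package in the index provides it.
            raise LookupError(f"nothing provides {token!r}")
        groups.append(providers)
    return groups

# Hypothetical index and requirements, for illustration only.
contains = {
    "/bin/sh": {"bash"},
    "libc.so.6": {"glibc"},
    "smtpdaemon": {"postfix", "sendmail"},
}
groups = collapse_to_packages(["/bin/sh", "smtpdaemon", "libc.so.6"], contains)
```

Here `groups` becomes `[["bash"], ["postfix", "sendmail"], ["glibc"]]`; the hard part the sketch glosses over is building and vetting `contains` itself.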

I'm pretty sure that would "work", and there is a vast simplification possible,
but the "contains" relation would have to be vetted carefully by rpm types,
prolly outside the scope of this mailing list.

I'm quite sure this would break a lot of rpm based code.

> > > - <pkgid> should be <checksum type="md5">
> > >
> > >This would allow us to add as many other checksum algorithms as we
> > >wanted.
> > 
> > What is missing here is the semantic interpretation for "pkgid", i.e.
> > what blob the digest is calculated from. That can be dealt with outside
> > of XML.
> > 
> > In fact, I would claim that the pkgid should not have any qualifiers
> > whatsoever other than bit/byte length and a promise of being sufficiently
> > unique.
> > 
> > Building in a checksum type prevents other identifiers that are
> > sufficiently unique. Perhaps a single type identifier might be needed for
> > extensibility, but it should not directly reference the type of digest,
> > but rather the type of the identifier.
> > 
> > OTOH, renaming to filechecksum rather than pkgid is perfectly adequate for
> > what Seth has proposed, and then a type might be perfectly reasonable.
> > 
> > I cannot disambiguate the two interpretations without a hint of what is
> > intended, and am groping blindly with my interpretation of "pkgid".
> OK.  I'm probably just confusing what pkgid means in an RPM context,
> then.

pkgid is a unique identifier for all rpm packages. It happens to be
the MD5 digest of the header+payload, and so is 16 bytes long in a "pkgid"
context. The use is what is important. pkgid is a unique identifier, not
an integrity verification means, in this context. Subtle, but important.
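As a sketch of the definition just given: the pkgid is the 16-byte MD5 digest computed over header+payload. The `header` and `payload` arguments here are placeholder byte strings; extracting those sections from a real .rpm file is not shown and is left as an assumption.

```python
import hashlib

def pkgid(header: bytes, payload: bytes) -> bytes:
    """MD5 over header+payload: a 16-byte identifier, not an integrity check."""
    h = hashlib.md5()
    h.update(header)
    h.update(payload)
    return h.digest()
```

Feeding the two sections through `update()` in order gives the same digest as hashing their concatenation, so large payloads need not be joined in memory.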

> Whole-file checksums are important, though, as well as algorithmic
> flexibility for those checksums.

Checksums are important, but whole-file checksums on rpm packages can/will fail
if/when packages are signed or resigned. No matter what, the goal of a
checksum or signature should be validation of integrity, but an alternative
means, like using the MD5 digest of header+payload may prove to be a more
easily managed invariant than whole file checksums.
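A simplified model of why re-signing breaks whole-file checksums but not a header+payload digest. It assumes a package file is just signature + header + payload (the real rpm format also has a lead, ignored here), and all names are hypothetical. The whole-file checksum is returned as an (algorithm, digest) pair, which is the information a `<checksum type="...">` tag would carry.

```python
import hashlib
from typing import NamedTuple, Tuple

class PackageFile(NamedTuple):
    """Simplified package file: signature, then header, then payload."""
    signature: bytes
    header: bytes
    payload: bytes

def whole_file_checksum(pkg: PackageFile, algo: str = "sha256") -> Tuple[str, str]:
    """(type, hexdigest) over every byte of the file, signature included."""
    data = pkg.signature + pkg.header + pkg.payload
    return algo, hashlib.new(algo, data).hexdigest()

def header_payload_md5(pkg: PackageFile) -> str:
    """Digest that ignores the signature section entirely."""
    return hashlib.md5(pkg.header + pkg.payload).hexdigest()

def resign(pkg: PackageFile, new_signature: bytes) -> PackageFile:
    # Re-signing replaces only the signature; header and payload are untouched.
    return pkg._replace(signature=new_signature)

original = PackageFile(b"sig-by-key-A", b"header", b"payload")
resigned = resign(original, b"sig-by-key-B")
```

After re-signing, `whole_file_checksum` differs between `original` and `resigned`, while `header_payload_md5` is unchanged: the invariant described above.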

JMHO, whole-file checksums can be lived with; just don't call the tag "pkgid".

73 de Jeff

Jeff Johnson	ARS N3NPQ
jbj at redhat.com (jbj at jbj.org)
Chapel Hill, NC
