[Yum] createrepo

Robert G. Brown rgb at phy.duke.edu
Mon Sep 19 14:51:45 UTC 2005


An odd question, then some crack.  When setting up a yum-based repo, the
accepted filesystem structure appears to be some combination of
releasever and basearch, e.g. for FC:

   myrepo/3/i386
   myrepo/3/x86_64
   myrepo/4/i386
   myrepo/4/x86_64
   ....

Examining locally available repos, it appears that the customary place
to run createrepo is on this directory, where it creates
./repodata/*.xml[.gz] as a crossreference.
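
A minimal sketch of that convention, assuming the example tree above: one createrepo run per releasever/basearch leaf directory. The helper name is made up for illustration, and the commands are only printed, not executed.

```python
from itertools import product
from pathlib import PurePosixPath


def createrepo_commands(root, releasevers, basearches):
    """Return one 'createrepo <dir>' argv list per releasever/basearch leaf."""
    commands = []
    for releasever, basearch in product(releasevers, basearches):
        leaf = PurePosixPath(root) / str(releasever) / basearch
        commands.append(["createrepo", str(leaf)])
    return commands


if __name__ == "__main__":
    for argv in createrepo_commands("myrepo", [3, 4], ["i386", "x86_64"]):
        # Printed only; hand argv to subprocess.run() to actually build metadata.
        print(" ".join(argv))
```

Each run would leave its own ./repodata directory inside the leaf it was pointed at.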

However, examining the metadata it creates in e.g. primary.xml.gz, it
appears that it includes arch, so that basearch is in some sense
redundant.  It also does NOT include metadata tags for either
distribution or release version, so the only way to organize this is via
filesystem path.
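
One way to check the arch observation is to count the per-package arch entries in primary.xml.gz. The following is a sketch only: the sample document is hand-written and much smaller than real createrepo output, and the namespace string is the one primary.xml uses as far as I know. For a real repo one would read the bytes of repodata/primary.xml.gz instead of the sample.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the elements in primary.xml (assumed from createrepo output).
COMMON_NS = "{http://linux.duke.edu/metadata/common}"


def count_arches(primary_gz_bytes):
    """Count packages per arch in a gzip-compressed primary.xml payload."""
    root = ET.fromstring(gzip.decompress(primary_gz_bytes))
    counts = Counter()
    for package in root.iter(COMMON_NS + "package"):
        counts[package.findtext(COMMON_NS + "arch")] += 1
    return counts


# Hand-written stand-in for a real repodata/primary.xml.gz.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://linux.duke.edu/metadata/common" packages="3">
  <package type="rpm"><name>foo</name><arch>i386</arch></package>
  <package type="rpm"><name>foo</name><arch>x86_64</arch></package>
  <package type="rpm"><name>bar</name><arch>noarch</arch></package>
</metadata>
"""

if __name__ == "__main__":
    print(dict(count_arches(gzip.compress(SAMPLE))))
```

Nothing comparable can be counted for distribution or release version, since the metadata carries no such tags.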

The question:  If one runs createrepo at the releasever level -- on
myrepo/4 in the example above -- can one use a single repo URL for both
i386 and x86_64 basearches, and will yum untangle the right default arch
to install at its end?  Secondary question is:  Even if it is possible
and should "work", is it a bad idea to do so?




The crack: As I manipulate repositories with many distributions, release
numbers, "names" (levels, purposes) and architectures represented, this
seems to me to be very clumsily implemented.  To put it another way,
although it is very sensible to create a filesystem layout that is a
kind of tree, it bothers me a bit that the layout itself becomes the
explicit means used to dereference the contents.  This is not the Sufi
way...

The whole point of createrepo seems to be in large part to transform the
physical layout of a repository into an abstraction layer encoded in
metadata, and deal with the fact that in some cases the actual rpms will
be in ./RPMS and in others in ./. and in still others in two or three
subdirectories organized some way that makes sense to the repo creator.

I will say that in time it gets to be a bit of a PITA, when building
rpms and packing them into a repo tree, to have to try to automate the
filesystem based abstraction, and sometimes there is both a temptation
to and a freedom to do things "differently" and e.g. use
distro/basearch/releasever or basearch/distro/releasever (both of which
would work perfectly well -- basically any permutation that follows
distro by releasever will work).

What would truly be lovely, as a possible future feature request, is to
add some upper level tags to the supported metadata to replace
(transparently, if possible) the "organization" currently being managed
de facto by filesystem layout and hence differently at the whim of the
repo manager. Probably "distribution" and "releasever" in addition to
"basearch" and maybe "level" where level might be e.g. base, updates,
extras, or anything else one likes and is currently using as a directory
name in a filesystem layout.  createrepo might then run recursively in
such a way that a single toplevel repo descriptor might work for an
entire distribution, or even a single URL could work for ALL
distributions supported thereupon.  yum could then match up the correct
paths to the releasever, basearch, level(s) (still set in the repo
descriptor file but in a list of desired levels in a single file) and do
its thing.
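
To make the idea concrete, here is a sketch of the matching step, assuming a toplevel descriptor that lists each directory with its tags. Nothing like this exists in yum or createrepo today; the TaggedDir record, the select_paths helper, and the sample paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedDir:
    """One directory entry in the imagined toplevel repo descriptor."""
    path: str
    distribution: str
    releasever: str
    basearch: str
    level: str


def select_paths(entries, distribution, releasever, basearch, levels):
    """Return paths whose tags match the client, ordered by requested level."""
    selected = []
    for level in levels:
        for entry in entries:
            tags = (entry.distribution, entry.releasever, entry.basearch, entry.level)
            if tags == (distribution, releasever, basearch, level):
                selected.append(entry.path)
    return selected


# Hypothetical server-side entries; the paths could be laid out any way at all.
ENTRIES = [
    TaggedDir("Fedora/base/4/i386", "Fedora", "4", "i386", "base"),
    TaggedDir("Fedora/base/4/x86_64", "Fedora", "4", "x86_64", "base"),
    TaggedDir("Fedora/updates/4/x86_64", "Fedora", "4", "x86_64", "updates"),
    TaggedDir("Fedora/extras/3/x86_64", "Fedora", "3", "x86_64", "extras"),
]

if __name__ == "__main__":
    print(select_paths(ENTRIES, "Fedora", "4", "x86_64", ["base", "updates", "extras"]))
```

The point of the sketch is that the client only ever compares tags; the path strings are opaque and need not follow any particular ordering of distro, releasever, and basearch.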

Note that perhaps this isn't quite the crack that it might sound, which
is why I'm throwing it out there -- often the best organization for a
data structure isn't apparent the first two or three times one takes a
stab at creating one.  I've seen our repos at Duke go through a couple
of permutations of the above already at the filesystem level.  Yum has
gone through a few major makeovers where it does much more with metadata
and xmlish tags than it originally did.  It might be reasonable to think
about what to do in the next one, so I offer this up.

As an example, consider the following as a possible template for a
SINGLE distribution client-side repo:

[Fedora]
name=Linux at DUKE - Fedora
baseurl=http://install.linux.duke.edu/pub/linux/Fedora
# uncomment the last only if duke.edu
levels=base updates extras distrib # duke
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Linux-at-DUKE file:///etc/pki/rpm-gpg/RPM-GPG-KEY

No basearch or releasever as the system knows what they are from the
original installation (in fact, they are variables in the repo file set
from that information).  createrepo is ALSO aware of them on the build
host (or via a command line override) and has tagged them appropriately
at the repo metadata level on the server side.  createrepo has also set a
"level" tag, by default a toplevel directory name (e.g. Fedora/base,
Fedora/updates...).
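
For comparison, the variable mechanism mentioned above can be sketched as plain string substitution. This is an illustration of the idea, not yum's actual code; the expand_repo_url helper and the example URL are made up.

```python
from string import Template


def expand_repo_url(baseurl, releasever, basearch):
    """Fill in $releasever and $basearch, leaving unknown variables untouched."""
    return Template(baseurl).safe_substitute(
        releasever=releasever, basearch=basearch
    )


if __name__ == "__main__":
    # Hypothetical URL following the releasever/basearch layout.
    url = "http://repo.example.invalid/myrepo/$releasever/$basearch"
    print(expand_repo_url(url, "4", "x86_64"))
```

With tagged metadata the same two values would be used for matching rather than for building a path.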

This would simplify repo management enormously because typically one
would need one or at most two repo descriptors per repo SERVER.  Even
the distro name could be a tag, so one could avoid explicitly putting in
RHEL, Centos, Fedora.  I don't think that any information is lost or
that this change makes anything enormously incompatible, and of course
all of the tag entries could be overridden in the config file or at the
yum command line, but it basically gets computers doing the automatic
part of the work so that humans don't have to hand edit and set things
millions of times (literally!).  It could ALMOST be done by just
changing the way repo files are parsed, at least if there was path
matching metadata available from serverside.

A last reason to consider encapsulating this sort of data in xml instead
of filesystem layout is that (strictly hierarchical) xml tags are
relatively easy to build automated or GUI tools to manage, where of
course the current organize-directories-as-you-like schema makes GUI
management tools nearly impossible to build (at least without creating
and enforcing a de facto standard in repository layout, something that
XML does by its very nature).  By making all of this tagged metadata in
e.g. a .yumrepo.xml file in each subdirectory processed by createrepo
(recursively applied) the filesystem layout is completely divorced from
a repo builder application -- any tree can be graphically traversed and
its contents tagged according to the desires or reasoning of the repo
maintainer where the TAG hierarchy need not reflect the FILESYSTEM
hierarchy but is rather precisely what yum needs to efficiently resolve
the paths to the desired rpms.
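
A per-directory tag file of this kind might be generated along the following lines. The element names and the build_dir_tags helper are purely hypothetical; no .yumrepo.xml format is defined anywhere, so this is only a sketch of what a strictly hierarchical tag record could look like.

```python
import xml.etree.ElementTree as ET


def build_dir_tags(distribution, releasever, basearch, level):
    """Serialize one directory's tags as a small XML document (invented schema)."""
    root = ET.Element("yumrepo")
    for tag, value in (
        ("distribution", distribution),
        ("releasever", releasever),
        ("basearch", basearch),
        ("level", level),
    ):
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_dir_tags("Fedora", "4", "i386", "updates"))
```

A GUI or batch tool would only need to read and write records like this one, whatever the directory it sits in happens to be called.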

Just a thought.

   rgb