[Yum-devel] Breaking yum repository metadata
Hedayat Vatankhah
hedayat at grad.com
Tue Oct 20 09:31:42 UTC 2009
Hi all,
I'd like to create a prototype of a new repository metadata format, and
it would be nice if you could point out any drawbacks you see in the
proposal (as you did for the previous one, with its security concerns):
In the current implementation, the repository's primary database
contains a considerable amount of information about each package, and
many users will never use most of it. This wastes bandwidth, and the
waste grows with the size of the repository. For example, the current
Fedora repository primary database is about 12MB compressed and 47MB
uncompressed. There are still many users for whom downloading 12MB of
data is not fast, and since yum currently doesn't resume interrupted
metadata downloads, this can be really frustrating for users with a poor
internet connection.
IMHO, it would be better if users downloaded only what they really need,
not the complete repository data. So I think the repository metadata
should be split by package, not by the kind of information about
packages (like the current separation of primary and file lists
databases). As an example (and the first thing that I want to work on),
consider package requirements. Currently, package requirements are
stored in the primary database, but it seems that you need a package's
requirements only when you want to install that package. Removing the
requires table from the Fedora repository's primary database shrinks it
from 47MB to 28MB (and in compressed form from 12MB to 6.7MB).
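To put those figures in relative terms, here is a quick calculation.
Only the four sizes come from the measurements above; the helper name
saving_percent is made up for illustration.

```python
def saving_percent(before_mb: float, after_mb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    return round((before_mb - after_mb) / before_mb * 100, 1)


# Sizes of the Fedora primary database quoted above, in MB.
uncompressed = saving_percent(47, 28)
compressed = saving_percent(12, 6.7)

print(f"uncompressed: {uncompressed}% smaller")
print(f"compressed:   {compressed}% smaller")
```

So dropping the requires table saves roughly 40% of the uncompressed
database and roughly 44% of the compressed download.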
My initial proposal is to store each package's requirements in a
separate signed file (e.g. mypackage-0.0.1.fc10.i386.rpm_requirements),
so that yum downloads such a file only when it needs it. What do you
think about this? Is it worth implementing?
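A minimal sketch of the on-demand lookup described above. This is not
existing yum code: the class LazyRequirements, the fetch callable, and
the one-requirement-per-line file format are all assumptions made for
illustration, and signature verification is left out entirely.

```python
from typing import Callable, Dict, List


def requirements_filename(rpm_filename: str) -> str:
    """Name of the per-package requirements file, following the
    '<rpm file name>_requirements' pattern from the example above."""
    return rpm_filename + "_requirements"


class LazyRequirements:
    """Fetch a package's requirements only when they are first asked for.

    'fetch' is a hypothetical callable that downloads a file from the
    repository by name and returns its bytes. A real client would also
    have to verify the file's signature before trusting it.
    """

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, List[str]] = {}

    def get(self, rpm_filename: str) -> List[str]:
        if rpm_filename not in self._cache:
            raw = self._fetch(requirements_filename(rpm_filename))
            # Assumed format: one requirement per line, blank lines ignored.
            lines = raw.decode("utf-8").splitlines()
            self._cache[rpm_filename] = [l.strip() for l in lines if l.strip()]
        return self._cache[rpm_filename]
```

With this shape, a transaction that installs three packages downloads
three small files instead of the requires table for the whole
repository, and repeated lookups for the same package cost nothing.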
To go further in splitting, it might be nice to store package
descriptions in separate files too. I also thought a little about
splitting package provides. That should be done based on the provides
themselves, but creating a separate file for each provide might be
overkill. Instead, the provides could be split into separate small
databases based on the first few characters of their hash code (e.g.
the first 2 characters).
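The hash-prefix split could look roughly like the sketch below. The
choice of SHA-256, the hex encoding, and the function names
provides_shard and build_shards are assumptions for illustration; the
proposal only says "the first 2 characters of their hash code".

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def provides_shard(provide_name: str, prefix_len: int = 2) -> str:
    """Return the shard key for a provide: the first 'prefix_len' hex
    characters of its hash. SHA-256 is an arbitrary choice here."""
    digest = hashlib.sha256(provide_name.encode("utf-8")).hexdigest()
    return digest[:prefix_len]


def build_shards(
    provides: Iterable[Tuple[str, str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (provide name, package name) pairs by shard key.

    Each group would become one small database in the repository.
    """
    shards: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for provide_name, package_name in provides:
        shards[provides_shard(provide_name)].append((provide_name, package_name))
    return dict(shards)
```

With 2 hex characters there are at most 256 shards. To resolve a
provide, the client computes the same prefix locally and downloads only
the one shard that can contain it.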
The file lists could also be split, using the same method as for
requirements or provides (maybe even both!), depending on their most
important use case (which I'm not sure about).
There could be a compatibility period in which both the current style
and the new style (after implementing all desired functionality) of
repository metadata are created, which would add only a very small
space overhead for mirrors.
I'd like to hear your opinions.
Thanks a lot,
Hedayat