[Yum-devel] Breaking yum repository metadata

James Antill james at fedoraproject.org
Tue Oct 20 14:50:13 UTC 2009


On Tue, 2009-10-20 at 13:01 +0330, Hedayat Vatankhah wrote:
> Hi all,
> 
> I'd like to create a prototype of a new repository metadata, and it 
> would be nice if you let me know about any negative points you see in 
> the proposal (like the previous one with security concerns):
> 
> In the current implementation, the repository's primary database 
> contains a considerable amount of information about each package, and 
> most of such information won't be used by many users. It wastes 
> bandwidth, which gets worse as the size of the repository grows. For 
> example, the current Fedora repository primary database is about 12MB in 
> compressed form and 47MB in normal form. There are still many users for 
> which downloading 12MB of data is not fast, and as currently yum doesn't 
> resume downloading metadata files, it could be really frustrating for 
> users with poor internet connection.

 This is misleading: updates (the only part of Fedora's release repos
that changes) is currently (for F11) 22MB uncompressed and 5.5MB
compressed. That is still not "tiny", but it's much smaller than updating
everything.

> IMHO, it would be nice if users download only what they really need, not 
> the complete repository data. So, I think it is nice to split the 
> repository based on packages, not based on the information about 
> packages (like the current separation of primary and file lists 
> databases). As an example (and the first thing that I want to work on), 
> consider package requirements. Currently, package requirements are 
> stored in the primary database, but it seems that you need a package's 
> requirements only when you want to install that package. By removing the 
> requires table from Fedora repository's primary database, its size 
> shrinks from 47MB to 28MB (and in compressed form from 12MB to 6.7MB). 
> My initial proposal is to store each package's requirements in a 
> separate signed file (e.g. mypackage-0.0.1.fc10.i386.rpm_requirements). 
> So, yum will download such files when it needs them.  Now, what do you 
> think about this? Is it worth implementing?

 The problem here is that we need the requirements lookups to be fast,
and a single .sqlite DB is going to be much faster than N .xml files.
 Also, things like "repoquery --whatrequires" would become horrible,
because answering them would mean fetching every package's file.
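
 To illustrate why a reverse lookup is cheap when all requirements live in
one database, here is a minimal sketch. The two-table schema, the sample
rows, and the function names are simplified assumptions for illustration,
not yum's actual primary.sqlite layout or code.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory DB with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE requires (name TEXT, pkgKey INTEGER);
        CREATE INDEX requires_name ON requires (name);
        """
    )
    # Made-up sample data, only there to exercise the query.
    conn.executemany(
        "INSERT INTO packages VALUES (?, ?)",
        [(1, "yum"), (2, "rpm"), (3, "createrepo")],
    )
    conn.executemany(
        "INSERT INTO requires VALUES (?, ?)",
        [("rpm", 1), ("python", 1), ("python", 3), ("yum", 3)],
    )
    return conn


def what_requires(conn, capability):
    """Return the sorted names of packages that require `capability`."""
    rows = conn.execute(
        "SELECT DISTINCT p.name FROM requires r "
        "JOIN packages p ON p.pkgKey = r.pkgKey "
        "WHERE r.name = ? ORDER BY p.name",
        (capability,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(what_requires(build_demo_db(), "python"))
```

 With one file per package, the same question can only be answered by
downloading and reading the requirements file of every package in the
repository.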

 That said, my suspicion is that requirements don't change that much,
so if we could split them cleverly it's possible we could reuse them a
lot.

 Feel free to investigate; I just don't think we can promise to accept
anything.

> To go farther in splitting, it might be nice to store package 
> descriptions in separate files too.

 One of the things on the TODO list is to remove summary and
description from primary, and have them in locale-specific files. This
should solve a number of problems, and we'd be more than happy to have
some extra hands to make this happen sooner.

>  Also, I thought a little about 
> splitting package provides too. It should be done based on the provides 
> themselves, but creating a separate file for each provides might be 
> overkill. But it might be nice to split the provides based on some 
> initial characters of their hash code (e.g. based on the first 2 
> characters of their hash code) into separate small databases.

 I doubt this would be a win.
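
 For reference, the bucketing scheme in the quoted proposal could be
sketched roughly as below. The proposal doesn't say which hash to use, so
SHA-256 is an assumption here, as are the function names.

```python
import hashlib
from collections import defaultdict


def bucket_for(provide):
    """Return the first two hex characters of the provide's hash.

    SHA-256 is an assumed choice; the proposal only says "hash code".
    """
    return hashlib.sha256(provide.encode("utf-8")).hexdigest()[:2]


def shard_provides(provides):
    """Group provide names into at most 256 buckets, keyed by prefix."""
    shards = defaultdict(list)
    for provide in provides:
        shards[bucket_for(provide)].append(provide)
    return dict(shards)


if __name__ == "__main__":
    for prefix, names in sorted(shard_provides(["python", "rpm", "yum"]).items()):
        print(prefix, names)
```

 Two hex characters give 256 possible shards, so a client resolving N
distinct provides may need to fetch up to min(N, 256) separate files.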

> The file lists could be also split, using the same method as 
> requirements or provides (maybe even both!), based on their most 
> important use case (I'm not sure of).

 What we'd really like to do, long term, is remove file requirements
completely. But that requires a lot of work, mostly non-technical.

-- 
James Antill <james at fedoraproject.org>
Fedora

