[Yum-devel] Requesting feedback on some plans for yum

Hedayat Vatankhah hedayat at grad.com
Sat Jun 5 20:29:17 UTC 2010



Seth Vidal <skvidal at fedoraproject.org> wrote on 06/04/2010 7:11:26 
PM +0450:
>
>
> On Fri, 4 Jun 2010, Hedayat Vatankhah wrote:
>
>> Hi all,
>> As you might have noticed, a while ago I wrote a blog post[1] about 
>> some of my ideas for yum's metadata (which I'd like to start working 
>> on as soon as I find a little free time, hopefully in 1 or 
>> 2 months). I've talked about some of them very briefly on this list, 
>> but that brevity led to many problems. So I decided to write more 
>> complete proposals, and the blog post is the result. I've also 
>> considered all the feedback I received on the mailing list, and I 
>> think this post addresses those problems. However, there are 
>> probably other design problems too, so I would appreciate it if you 
>> could take some time to read the post and let me know your 
>> suggestions. I'd like to hear about any problems in the plan, 
>> possible solutions, and enhancements.
>> The blog post will have 1 or 2 follow-ups in which I'll describe plans 
>> for other parts of the metadata.
>>
>
> Hedayat,
>  let me first say - thanks for wanting to look into this!
>
> A few thoughts for figuring out how best to redesign the repodata:
>
> 1. start with a blank yum cache and go through a normal set of things 
> you (or any user) might do with yum and look at where yum fetches 
> metadata, what metadata it is fetching and WHY it is fetching it. That 
> will help determine the things that are actually needed.
Almost done, but I'll take a more focused look, as you suggested.

>
>
> 2. Look at what the repodata is lacking - what info would 
> be valuable if it were possible to provide it. That will help 
> determine the places you will want to separate out.
Hmm... this needs some thought about what information would help users. 
One example would be metadata about the package type (GUI application, 
console application, etc. for normal users; -devel packages for 
developers; other use cases). IMHO people rarely look for library 
packages (excluding -devel packages): they usually need applications, 
or (e.g. developers) are interested in library development packages.
This is an important area to think about in its own right.
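One way to picture the package-type idea is a minimal Python sketch, assuming the repo metadata carried an explicit per-package type field. The `PkgType` enum, the `pkgtype` field and the `search` helper are all hypothetical; nothing like them exists in current repodata. The sketch shows how a client could hide plain library packages from searches by default while still showing applications and -devel packages.

```python
from dataclasses import dataclass
from enum import Enum


class PkgType(Enum):
    """Hypothetical package categories (not part of any existing repodata)."""
    GUI_APP = "gui"
    CONSOLE_APP = "console"
    LIBRARY = "library"
    DEVEL = "devel"


@dataclass(frozen=True)
class PkgEntry:
    name: str
    summary: str
    pkgtype: PkgType  # hypothetical field that the repo metadata would supply


def search(entries, term, include_libraries=False):
    """Return entries whose name or summary contains `term` (case-insensitive).

    Plain runtime library packages are skipped unless explicitly requested,
    since users usually want applications or -devel packages.
    """
    term = term.lower()
    hits = []
    for e in entries:
        if e.pkgtype is PkgType.LIBRARY and not include_libraries:
            continue
        if term in e.name.lower() or term in e.summary.lower():
            hits.append(e)
    return hits


if __name__ == "__main__":
    repo = [
        PkgEntry("gimp", "GNU Image Manipulation Program", PkgType.GUI_APP),
        PkgEntry("libpng", "PNG library", PkgType.LIBRARY),
        PkgEntry("libpng-devel", "PNG development files", PkgType.DEVEL),
    ]
    print([e.name for e in search(repo, "png")])  # prints ['libpng-devel']
```

The point of storing the type explicitly, rather than guessing it from names such as `lib*`, is that the packager knows the answer and the client then only has to filter.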

>
>
> 3. don't optimize too soon.
>
> 4. don't get too focused on standards - fix YOUR problem first, don't 
> try to be all things to all people.
:) OK!

>
>
> Now - with those out of the way let me suggest a few specific tasks:
>
> 1. change the filelist metadata - break the dirs up by paths so if we 
> know the file is in /usr/lib - we don't have to download all of 
> /usr/share to get it. You can do some good statistical analysis of all 
> the files in fedora rawhide, for example, and figure out the best way 
> to break up the filelists into smaller chunks
Yes, it certainly needs some statistical analysis. I was thinking that 
using the first few characters (e.g. the first two) of the hash of each 
file path might give better results and distribute the files more evenly 
among the chunks.
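A minimal Python sketch of the two chunking strategies discussed above (by directory, as Seth suggests, or by a prefix of the path's hash), which could drive the kind of statistical comparison over a real file list. The choice of SHA-256, the two-hex-digit default prefix and the helper names are assumptions for illustration, not part of any existing repodata format.

```python
import hashlib
import posixpath
from collections import Counter


def hash_bucket(path: str, prefix_len: int = 2) -> str:
    """Chunk id = first `prefix_len` hex digits of the path's SHA-256 digest.

    With prefix_len=2 there are 16**2 = 256 possible chunks.
    """
    return hashlib.sha256(path.encode("utf-8")).hexdigest()[:prefix_len]


def dir_bucket(path: str, depth: int = 2) -> str:
    """Chunk id = first `depth` components of the file's directory.

    '/usr/lib/libfoo.so.1' -> '/usr/lib', '/etc/yum.conf' -> '/etc'.
    """
    parts = [p for p in posixpath.dirname(path).split("/") if p]
    return "/" + "/".join(parts[:depth])


def chunk_stats(paths, key):
    """Return (number of chunks, smallest chunk size, largest chunk size).

    `paths` must be non-empty; `key` maps a path to its chunk id.
    """
    counts = Counter(key(p) for p in paths)
    return len(counts), min(counts.values()), max(counts.values())


if __name__ == "__main__":
    sample = ["/usr/bin/yum", "/usr/lib/libfoo.so.1",
              "/usr/share/doc/foo/README", "/etc/yum.conf"]
    print(chunk_stats(sample, hash_bucket))
    print(chunk_stats(sample, dir_bucket))
```

The trade-off the statistics should expose: hash buckets tend to come out close to equal in size, but they only help exact-path lookups, so a query for "anything under /usr/lib" would still need every chunk. Directory buckets keep related paths together at the cost of very uneven chunk sizes.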

>
> 2. provide a way to get all the repodata per-pkg in a discrete blob. 
> so that if I just need the filelist or changelog for pkgfoo - I don't 
> have to download ALL of the filelists and changelogs to get it.
That's already covered in the blog post.
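A minimal sketch of such a per-package blob, assuming one gzip-compressed JSON document per package under a hypothetical `pkgs/` directory. The file naming, the JSON encoding and the field names are illustrative assumptions; the format described in the blog post may differ.

```python
import gzip
import json


def blob_relpath(name: str, version: str, release: str, arch: str) -> str:
    """Hypothetical location of one package's complete metadata blob."""
    return f"pkgs/{name}-{version}-{release}.{arch}.json.gz"


def encode_blob(pkg: dict) -> bytes:
    """Server side: serialize all per-package metadata (filelist,
    changelog, ...) into a single compressed document."""
    return gzip.compress(json.dumps(pkg, sort_keys=True).encode("utf-8"))


def decode_blob(data: bytes) -> dict:
    """Client side: recover the metadata from the downloaded bytes."""
    return json.loads(gzip.decompress(data).decode("utf-8"))


if __name__ == "__main__":
    pkg = {
        "name": "foo", "epoch": "0", "version": "1.0", "release": "1",
        "arch": "noarch",
        "files": ["/usr/bin/foo", "/usr/share/doc/foo/README"],
        "changelog": ["- initial build"],
    }
    # To show foo's changelog, a client would fetch only this one file.
    print(blob_relpath("foo", "1.0", "1", "noarch"))
    print(decode_blob(encode_blob(pkg))["changelog"])
```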

>
> 3. Translations of certain metadata are going to be more important - 
> the package summary and description are the ones that matter most - 
> keep this in mind for formatting things.
As mentioned in the blog post, I intend to put the summaries and 
descriptions for each locale in a separate directory.
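A sketch of how a client might pick the per-locale info file, assuming the repo advertises which locale directories it provides (listing them in repomd.xml is an assumption) and that `en` is always present as the fallback. The function names are hypothetical.

```python
def locale_candidates(locale: str, default: str = "en") -> list:
    """Most to least specific: 'fr_FR.UTF-8' -> ['fr_FR', 'fr', 'en']."""
    # Drop the encoding ('.UTF-8') and modifier ('@euro') parts.
    base = locale.split(".", 1)[0].split("@", 1)[0]
    candidates = []
    if base and base not in ("C", "POSIX"):
        candidates.append(base)
        lang = base.split("_", 1)[0]
        if lang != base:
            candidates.append(lang)
    if default not in candidates:
        candidates.append(default)
    return candidates


def info_relpath(pkg_full_name: str, locale: str, available_locales) -> str:
    """Return the path of the first info file the repo actually provides."""
    for loc in locale_candidates(locale):
        if loc in available_locales:
            return f"info/{loc}/{pkg_full_name}.info"
    raise LookupError(f"no info for {pkg_full_name} in any fallback locale")


if __name__ == "__main__":
    print(info_relpath("foo-1.0-1.noarch", "fr_FR.UTF-8", {"en", "fr"}))
```

Falling back from `fr_FR` to `fr` to `en` means a translator only has to provide the language directory, and untranslated packages still show English text.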

>
> 4. Remember - anything you work on will need to be generated in a 
> REASONABLE amount of time on the server/repo side - making the rel-eng 
> folks cry is not nice.
Good point to consider. It was not on my design criteria list :P

>
> And some more specific ideas:
>
> repodata/
>         repomd.xml <-- same as before - the index for everything else
>         packagelist.sqlite <-- nevra + summary + description + checksums
>                                + locations + basic pkg info +
>                                location of per-pkg complete metadata
>         provides.sqlite <-- provides
>         requires.sqlite <-- requires
>         conflicts_obsoletes.sqlite <-- conflicts and obsoletes
>         filelists.xml <-- index file to point to the files-by-path
What I have in mind is more like this (this is only the server-side 
layout, as mentioned in the blog post):
    repodata/
         repomd.xml
         packagelist.sqlite <-- nevra + required checksums + other essential data
         info/ <-- package summaries and descriptions
             en/
                 package1_full_name.info <-- info for one package, or for a small number n of packages
                 ... <other packages>
             fr/
             ... <other locales>
         provides.sqlite <-- provides (might be split into smaller parts, like the file lists, if it grows too large)
         requires/ <-- (might also contain conflicts/obsoletes if they are usually needed at the same time)
             package_1_full_name.requires
             ...
         conflicts_obsoletes/ <-- if not merged into the requires files
             package_1_full_name.confobs
             ...
         filelists.xml <-- index file to point to the files-by-path
         filelists/
                 ?! <-- depends on the way of splitting file lists, TBD
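Both layouts keep a small packagelist.sqlite as the entry point. A minimal sketch using Python's standard `sqlite3` module, assuming a hypothetical schema and assuming that a package's "full name" means name-version-release.arch (epoch left out of file names). It shows how a client could go from the package list to the per-package files it would download next.

```python
import sqlite3

# Hypothetical schema: only what is needed to identify, verify and locate a
# package. Summaries/descriptions live in info/<locale>/, deps in requires/.
SCHEMA = """
CREATE TABLE packages (
    pkgkey        INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    epoch         TEXT NOT NULL DEFAULT '0',
    version       TEXT NOT NULL,
    release       TEXT NOT NULL,
    arch          TEXT NOT NULL,
    checksum_type TEXT NOT NULL,
    checksum      TEXT NOT NULL,
    location      TEXT NOT NULL
);
CREATE INDEX packages_name ON packages (name);
"""


def full_name(name, version, release, arch):
    """Assumed naming scheme for per-package files."""
    return f"{name}-{version}-{release}.{arch}"


def build_packagelist(conn, pkgs):
    """Server side: create the table and load (name, epoch, version,
    release, arch, checksum_type, checksum, location) tuples."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO packages (name, epoch, version, release, arch,"
        " checksum_type, checksum, location) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        pkgs,
    )
    conn.commit()


def files_to_fetch(conn, name):
    """Client side: for each package called `name`, list the per-package
    files it would download next (requires/ and the English info/ entry)."""
    rows = conn.execute(
        "SELECT name, version, release, arch FROM packages"
        " WHERE name = ? ORDER BY pkgkey",
        (name,),
    ).fetchall()
    out = []
    for row in rows:
        fn = full_name(*row)
        out.append((f"requires/{fn}.requires", f"info/en/{fn}.info"))
    return out


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_packagelist(conn, [("yum", "0", "3.2.27", "4", "noarch", "sha256",
                              "abc123", "Packages/yum-3.2.27-4.noarch.rpm")])
    print(files_to_fetch(conn, "yum"))
```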


Thanks for your attention,
Hedayat

>
>
> -sv
>
> _______________________________________________
> Yum-devel mailing list
> Yum-devel at lists.baseurl.org
> http://lists.baseurl.org/mailman/listinfo/yum-devel