[Yum-devel] Requesting feedback on some plans for yum

Mon Jun 7 19:48:21 UTC 2010


Seth Vidal <skvidal at fedoraproject.org> wrote on 06/07/2010 7:38:27 PM +0450:
>
>
> On Sun, 6 Jun 2010, Hedayat Vatankhah wrote:
>>
>> Hmm... this needs some thought about what information would help users.
>> One example would be metadata about the package type (GUI application,
>> console application, etc. for normal users; -devel packages for
>> developers; other use cases). IMHO people rarely look for library
>> packages (excluding -devel packages): they usually want applications,
>> or, like developers, are interested in library development packages.
>> This is an important area to think about in its own right.
>
> I would encourage you to not just think about users when you think 
> about the metadata - a lot of applications use this metadata so 
> working on the basis that only users will need to know about certain 
> things is going to be overly limiting.
An interesting point. So we might be able to provide a lot of semantic
data to be used by other tools. Is there a list of suggestions/ideas for
such metadata?


>
>>
>>       Now - with those out of the way let me suggest a few specific tasks:
>>
>>       1. change the filelist metadata - break the dirs up by paths so if
>>       we know the file is in /usr/lib - we don't have to download all of
>>       /usr/share to get it. You can do some good statistical analysis of
>>       all the files in fedora rawhide, for example, and figure out the
>>       best way to break up the filelists into smaller chunks
>>
>> Yes, it certainly needs some statistical analysis. I was thinking that
>> the first letters (e.g. the first two letters) of the hash of each file
>> path might give better results and distribute the files more evenly
>> among the chunks.
>
> you'll end up with a massive amount of files in /usr since that is 
> where, I'd bet, 90% of the files are.
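The concern about /usr is easy to see with a small experiment: bucketing paths by their leading directories puts almost everything into a single /usr chunk unless the split goes at least one level deeper. A minimal Python sketch, assuming a plain list of absolute paths; `dir_bucket`, `bucket_sizes` and the `depth` parameter are hypothetical names, and the sample paths are made up rather than taken from rawhide.

```python
from collections import Counter
from pathlib import PurePosixPath


def dir_bucket(path, depth=2):
    """Hypothetical helper: keep only the first `depth` directory components
    of an absolute path, e.g. '/usr/share/doc/foo/README' -> '/usr/share'."""
    parts = PurePosixPath(path).parts  # ('/', 'usr', 'share', 'doc', ...)
    dirs = parts[1:-1]                 # drop the root and the file name
    return "/" + "/".join(dirs[:depth])


def bucket_sizes(paths, depth=2):
    """Count how many paths would land in each directory-based chunk."""
    return Counter(dir_bucket(p, depth) for p in paths)


# Illustrative paths only, not real rawhide data.
sample = [
    "/usr/bin/vim",
    "/usr/bin/gedit",
    "/usr/share/doc/vim/README",
    "/usr/lib/libfoo.so.1",
    "/etc/vimrc",
]
by_top_dir = bucket_sizes(sample, depth=1)   # everything but /etc is under /usr
by_two_dirs = bucket_sizes(sample, depth=2)  # /usr splits into bin, share, lib
```

Running `bucket_sizes` with increasing `depth` over a real file dump would show how deep the split has to go before the /usr chunks become reasonably sized.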
I don't think I explained my idea clearly. I'm suggesting a hash table
for storing file lists, so that files are distributed more evenly among
the hash table cells. For example, we might use the first two characters
of the hash of each file path as the index into the table. That way, two
files in the /usr directory will likely end up in different cells:

/usr/bin/vim   ---MD5 SUM--->  7f351f6e834a9a0230e050d765f238c1  ---file list chunk--->  7f.filelist.db
/usr/bin/gedit ---MD5 SUM--->  a4407c2d7bb56df94bcc2b7294de0dfb  ---file list chunk--->  a4.filelist.db
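The hash-table idea can be sketched in a few lines of Python. This assumes MD5 over the full path and a two-hex-digit prefix, as in the vim/gedit example above; `chunk_id`, `chunk_filename` and `split_filelist` are hypothetical names, not anything that exists in yum or createrepo.

```python
import hashlib


def chunk_id(path, prefix_len=2):
    """Hypothetical scheme: the first `prefix_len` hex digits of the MD5 of
    the full file path select the chunk (2 digits -> 256 possible chunks)."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()[:prefix_len]


def chunk_filename(path, prefix_len=2):
    """Name of the chunk file holding `path`, in the form 'xx.filelist.db'."""
    return f"{chunk_id(path, prefix_len)}.filelist.db"


def split_filelist(paths, prefix_len=2):
    """Server side: group every path under the chunk file it hashes to."""
    chunks = {}
    for p in paths:
        chunks.setdefault(chunk_filename(p, prefix_len), []).append(p)
    return chunks


# Client side: to answer "which package owns /usr/bin/vim?" only this one
# chunk has to be downloaded, whatever directory the file lives in.
needed = chunk_filename("/usr/bin/vim")
```

The trade-off is locality: a query by directory (every file under /usr/bin, say) no longer maps to one chunk, because neighbouring paths hash to unrelated buckets, so such lookups would have to fetch every chunk.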


>
>> This is more like what I'm thinking about (this is only the server side
>> layout, as mentioned in the blog post):
>>    repodata/
>>         repomd.xml
>>         packagelist.sqlite <-- nevra + required checksums + some other really needed data
>>         info/ <-- package summaries and descriptions
>
> You have to provide some sort of index file for these so we can provide
> a checksum for that index file in repomd.xml. That way we can verify and
> rely on the results.
Isn't it enough for repomd.xml to store the checksum of
packagelist.sqlite? The other files could be verified by their
signatures. Or do you use checksums instead of signatures, or both?
(Sorry, I'm apparently not very familiar with the security mechanisms
used in yum; if there is any documentation about them, I would be glad
to know about it.)

>
>>         provides.sqlite <-- provides (might be split into smaller parts like the file lists if it can grow)
>
> provides aren't that big, really.
> For all of rawhide, uncompressed, they are 2.8M, total. 105296 entries 
> for 17073 pkgs.
Thanks for the real data.
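A quick back-of-the-envelope check of those figures, taking 2.8M as 2.8 × 10^6 bytes (the unit is an assumption):

```latex
\frac{105296\ \text{provides}}{17073\ \text{packages}} \approx 6.2\ \text{provides per package},
\qquad
\frac{2.8 \times 10^{6}\ \text{bytes}}{105296\ \text{provides}} \approx 27\ \text{bytes per provide}
```

So a single provides file covering all of rawhide stays small, which is the basis for not splitting it.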

>
>>         requires/ <-- (might also contain conflicts/obsoletes if they are usually required at the same time)
>>             package_1_full_name.requires
>>             ...
>>         conflicts_obsoletes/ <-- if not merged in the requirements files
>>             package_1_full_name.confobs
>
> Something MAYBE worth doing is this - for each pkg - in 
> packagelist.sqlite - only mention if they have any obsoletes/conflicts 
> - that way we can do a shorthand lookup to see if we even need to 
> bother fetching those other files.
Yes. Maybe I should create a wiki-like page somewhere to collect all the
ideas/suggestions/corrections. It could also be used to gather ideas
about other useful metadata that could be provided (as mentioned at the
top).
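The shorthand lookup suggested above can be sketched with Python's standard sqlite3 module. The schema (a `packages` table with `full_name` and `has_confobs` columns), the package names and the function names are hypothetical, chosen only to mirror the repodata layout discussed here; the point is that the client consults its local packagelist.sqlite before deciding whether any conflicts_obsoletes/ file needs to be fetched.

```python
import sqlite3


def create_packagelist(conn):
    # Hypothetical schema: full_name stands in for the NEVRA string used to
    # name the per-package files; has_confobs is 1 if a .confobs file exists.
    conn.execute(
        "CREATE TABLE packages ("
        " full_name TEXT PRIMARY KEY,"
        " has_confobs INTEGER NOT NULL DEFAULT 0)"
    )


def confobs_files_to_fetch(conn, full_names):
    """Return the conflicts_obsoletes/ files worth downloading for the given
    packages; packages whose flag is 0 are skipped without any download."""
    full_names = list(full_names)
    if not full_names:
        return []
    placeholders = ",".join("?" * len(full_names))
    rows = conn.execute(
        "SELECT full_name FROM packages"
        f" WHERE has_confobs = 1 AND full_name IN ({placeholders})"
        " ORDER BY full_name",
        full_names,
    )
    return [f"conflicts_obsoletes/{name}.confobs" for (name,) in rows]


# Illustrative package names only.
conn = sqlite3.connect(":memory:")
create_packagelist(conn)
conn.executemany(
    "INSERT INTO packages (full_name, has_confobs) VALUES (?, ?)",
    [("foo-1.0-1.fc14.x86_64", 1), ("bar-2.0-3.fc14.noarch", 0)],
)
needed = confobs_files_to_fetch(conn, ["foo-1.0-1.fc14.x86_64", "bar-2.0-3.fc14.noarch"])
```

Since most packages have no conflicts or obsoletes, the flag lets the client skip the network entirely for them.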

>
> ...
>>         filelists.xml <-- index file to point to the files-by-path
>>         filelists/
>>                 ?! <-- depends on the way of splitting file lists, TBD
>
> I'm dumping out a list of all files in rawhide and I'll see if I can 
> generate some statistics by dir and post them.
Thanks. I'm also interested in generating the hash of every file path
and doing some statistical analysis on that too.
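For the hash-based layout, the statistic that matters is how evenly the prefixes spread the paths. A minimal Python sketch, assuming MD5 over the full path (as in the vim/gedit example) and a list of paths already in memory; `bucket_counts`, `balance_report` and the report keys are hypothetical, and the paths below are synthetic.

```python
import hashlib
from collections import Counter


def bucket_counts(paths, prefix_len=2):
    """Number of paths whose MD5 hex digest starts with each prefix."""
    return Counter(
        hashlib.md5(p.encode("utf-8")).hexdigest()[:prefix_len] for p in paths
    )


def balance_report(paths, prefix_len=2):
    """Summarise how evenly the hash prefixes spread the paths. Every
    possible prefix is counted, including empty buckets."""
    counts = bucket_counts(paths, prefix_len)
    n_buckets = 16 ** prefix_len
    sizes = [counts.get(format(i, f"0{prefix_len}x"), 0) for i in range(n_buckets)]
    mean = len(paths) / n_buckets
    return {
        "buckets": n_buckets,
        "min": min(sizes),
        "max": max(sizes),
        "mean": mean,
        "max_over_mean": max(sizes) / mean if mean else 0.0,
    }


# Synthetic paths for illustration; a real run would read the rawhide file
# dump (one path per line) instead.
paths = [f"/usr/share/doc/pkg{i}/README" for i in range(10000)]
report = balance_report(paths, prefix_len=2)
```

Comparing `max_over_mean` for `prefix_len` 1, 2 and 3 against the per-directory counts would show whether hashing actually beats splitting by path on real data.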

Thanks again,
Hedayat

>
>
> -sv