[Yum-devel] Requesting feedback on some plans for yum

Hedayat Vatankhah hedayat at grad.com
Tue Jun 8 20:55:11 UTC 2010



Seth Vidal <skvidal at fedoraproject.org> wrote on 06/08/2010 6:16:02 
PM +0450:
>
>
> On Tue, 8 Jun 2010, Hedayat Vatankhah wrote:
>
>
>> Certainly. But I thought that if a single directory contains a lot of
>> files, then the corresponding chunk will become large on its own. My
>> idea would create a number of small files, so any update to the
>> metadata MIGHT change only a few of them. But another solution might
>> be useful too: if a single directory (e.g. /usr/bin) contains a large
>> number of files, we could split its file list into chunks by name
>> range: e.g. one chunk starting at /usr/bin/a2p and another starting
>> at /usr/bin/c++ , so that the first chunk contains the files in
>> /usr/bin whose names sort between a2p and c++, and the second
>> contains c++ and everything after it. This approach could be used
>> too.
>
>
>
> But they won't be small files and given that the checksums are, in 
> fact, evenly distributed you'll end up needing N of those files.
>
> So let's say we split them the way you describe.
>
> rawhide's filelists are 85MB. If we split them into 256 subsets (two 
> hex digits, 00-FF) then we'll be breaking up the filelists into 332KB 
> chunks.
>
> Now - given that 98% of all of the file reqs are in /usr/bin or /bin/ 
> we'll be chasing our tail finding all of them in all of these chunks. OR
> we could store them by the dir distribution of the files and download 5
> small chunks of: /bin, /sbin, /usr/bin, /usr/sbin and /etc.
>
> And only take the hit of downloading a bunch of data when someone does 
> something stupid and has a file req on something inside /usr/share.
>
> Do you see what I mean, now?
Yes I do :) I was trying to have fewer downloads when only a few file 
dependencies are needed, but in the long run my proposal is inferior. So 
I agree that splitting based on file path makes sense. Splitting the 
file list of a single directory might still make sense for large 
directories, and in that case something like delta metadata makes sense 
to avoid downloading duplicate data.
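
To make that concrete, here is a rough sketch of the chunking scheme 
we've converged on (the names and the chunk-size threshold below are 
made up for illustration, not anything yum actually implements):

from bisect import bisect_right

MAX_FILES_PER_CHUNK = 2000  # arbitrary split threshold for a directory

def build_chunks(filelists):
    """Group file names by directory; split big directories into
    sorted name ranges, each chunk keyed by its first name."""
    chunks = {}
    for dirname, names in filelists.items():
        names = sorted(names)
        for i in range(0, len(names), MAX_FILES_PER_CHUNK):
            part = names[i:i + MAX_FILES_PER_CHUNK]
            chunks[(dirname, part[0])] = part
    return chunks

def chunk_for(chunks, path):
    """Map a file requirement like /usr/bin/perl to its chunk key."""
    dirname, _, name = path.rpartition('/')
    starts = sorted(s for (d, s) in chunks if d == dirname)
    if not starts:
        return None
    return (dirname, starts[max(bisect_right(starts, name) - 1, 0)])

That way a dependency on /usr/bin/perl costs one small download, and 
only an unusual file req (say, on something under /usr/share) pulls in 
more chunks.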

>
>>
>> Yes (I thought that you GPG-sign ALL the files, so they can be 
>> verified on their own).
>
> GPG signing all the files is an expensive operation. It means you have 
> to get all those files to the signing server, sign them, and then put 
> the signed output back out to the servers to be mirrored.
>
> If you sign the top level and have valid checksums the rest of the way 
> down you're in good shape and it is MUCH cheaper.
Yes, that's reasonable.
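
A minimal sketch of that checksum chain, assuming an invented 
plain-text index format ("<sha256>  <relative-path>" per line); in yum 
the signed top-level file is repomd.xml, but the principle is the same: 
one GPG verification on the index, then cheap hash checks the rest of 
the way down.

import hashlib

def parse_index(signed_index_text):
    """Parse the (already GPG-verified) top-level index into
    a {path: sha256hex} mapping."""
    expected = {}
    for line in signed_index_text.splitlines():
        checksum, _, path = line.partition('  ')
        if path:
            expected[path] = checksum
    return expected

def verify_chunk(expected, path, data):
    """Check one downloaded metadata chunk against the signed index."""
    return hashlib.sha256(data).hexdigest() == expected.get(path)

So only the index ever visits the signing server, and every file below 
it stays tamper-evident through the hash chain.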

Thanks again,
Hedayat

>
>
> -sv
>