[Yum-devel] Requesting feedback on some plans for yum

Seth Vidal skvidal at fedoraproject.org
Mon Jun 7 20:29:12 UTC 2010



On Tue, 8 Jun 2010, Hedayat Vatankhah wrote:

> An interesting point. So we might be able to provide a lot of semantic data to be used by other tools. Is there
> any list of suggestions/ideas about such metadata?

repoquery springs to mind - it is COMMONLY used by folks and while I 
certainly would not want to optimize for that use case I also don't want 
to cripple it.

> I think I didn't explain my idea clearly. I'm suggesting a hash table for storing file lists. This way,
> files will be better distributed among hash table cells. For example, we might use the first two letters of
> the hash code of file names as the index of our hash table. So, two files in /usr directory will likely end
> up in different cells of the table:
> 
> /usr/bin/vim   --MD5 sum--> 7f351f6e834a9a0230e050d765f238c1 --file list chunk--> 7f.filelist.db
> /usr/bin/gedit --MD5 sum--> a4407c2d7bb56df94bcc2b7294de0dfb --file list chunk--> a4.filelist.db
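
For reference, the hashed layout described in the quote could be sketched
roughly like this. It is only an illustration: the function name is made up,
and the chunk naming just follows the example in the quote.

```python
import hashlib

def hash_chunk(path):
    """Return the chunk file name for a path under the hashed scheme.

    Hypothetical helper: uses the first two hex digits of the MD5 of the
    file name as the bucket, as in the quoted example.
    """
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return digest[:2] + ".filelist.db"

# Paths in the same directory will usually land in unrelated chunks,
# so the chunk name says nothing about where the file lives.
for p in ("/usr/bin/vim", "/usr/bin/gedit"):
    print(p, "->", hash_chunk(p))
```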

What would that buy us? When we know the vast majority of our file deps 
are in [/usr]/[s]bin/, why would we make someone download what is, even 
with an even split of the filelists, most likely VASTLY too much data?

We could instead split the files based on:
/toplevel dir
and
/usr/*/*

Then when we go to determine which filelist chunk to grab we just look at 
the filename itself and see which segment has the highest number of 
matching letters.
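
A minimal sketch of that lookup, assuming each segment is identified by a
directory prefix. The segment list and the function name here are made up
for illustration; the real split would come from whatever createrepo ends up
writing out.

```python
# Hypothetical segment prefixes; not an actual repodata layout.
SEGMENTS = [
    "/bin/", "/sbin/", "/etc/",
    "/usr/", "/usr/bin/", "/usr/sbin/", "/usr/lib/", "/usr/share/",
]

def pick_segment(filename, segments=SEGMENTS):
    """Return the segment whose prefix matches the most leading letters.

    Returns None when no segment prefix matches the filename at all.
    """
    best = None
    for seg in segments:
        if filename.startswith(seg) and (best is None or len(seg) > len(best)):
            best = seg
    return best

print(pick_segment("/usr/bin/vim"))      # most specific match is /usr/bin/
print(pick_segment("/usr/libexec/foo"))  # only /usr/ matches
```

With this, a file dep on something in /usr/bin/ only ever needs the one
/usr/bin/ chunk, which is the case we care about.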

Remember - our goal is not to evenly distribute the filelists into chunks 
- our goal is to optimize for the cases that are actually used.


> 
> Isn't it enough for repomd.xml to store the checksum of packagelist.sqlite? Other files can be verified by
> their signature. Or do you use checksums instead of signatures? Or do you use both? (Sorry, I'm apparently not
> very familiar with the security mechanisms used in yum, and if there is any documentation about it I would be
> glad to know about it).

for any file you make you need a checksum of that file (sha256 or better) 
stored somewhere.

we can gpg sign repomd.xml so that the checksums in that file can be 
trusted - but everything else falls out from there.
so if repomd.xml has a checksum of files-by-path.xml
then files-by-path.xml has to have a checksum of each of the databases of 
files it knows about.

so we can verify the databases of files by looking at the checksums stored 
in files-by-path, we verify files-by-path by looking at the checksums 
stored in repomd.xml and we verify repomd.xml by checking the gpg 
signature.
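
As a rough sketch of that chain: this assumes the repomd.xml checksums have
already been gpg-verified and parsed into a dict, and it uses a made-up
"name checksum" line format for files-by-path instead of real XML. All the
function names are hypothetical.

```python
import hashlib

def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()

def parse_index(index_bytes):
    """Parse the stand-in index format: one 'name checksum' pair per line."""
    sums = {}
    for line in index_bytes.decode("utf-8").splitlines():
        if line.strip():
            name, checksum = line.split()
            sums[name] = checksum
    return sums

def verify_database(repomd_sums, index_bytes, db_name, db_bytes):
    """Check a file database against the chain repomd -> index -> db.

    repomd_sums is assumed to come from a repomd.xml whose gpg signature
    was already checked; that step is not shown here.
    """
    # Step 1: the index is only trusted if repomd.xml vouches for it.
    if repomd_sums.get("files-by-path.xml") != sha256_hex(index_bytes):
        return False
    # Step 2: the database is only trusted if the (now trusted) index
    # vouches for it.
    return parse_index(index_bytes).get(db_name) == sha256_hex(db_bytes)
```

If any link fails, everything below it is untrusted, no matter what its own
checksum says.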

understand?

> Yes. Maybe I should create a wiki-like page somewhere to put all ideas/suggestions/corrections there. It
> would also be useful for gathering ideas about other useful metadata which can be provided (as mentioned at the
> top).

I've made one here:
http://yum.baseurl.org/wiki/dev/NewRepoDataIdeas

> 
> Thanks. I'm also interested in generating the hash of all files and doing some statistical analysis there
> too.

see the wiki page above for the filelists data, too.


-sv

