[Yum-devel] Requesting feedback on some plans for yum
Seth Vidal
skvidal at fedoraproject.org
Mon Jun 7 20:29:12 UTC 2010
On Tue, 8 Jun 2010, Hedayat Vatankhah wrote:
> An interesting point. So we might be able to provide a lot of semantic data for use by other tools. Is there
> any list of suggestions/ideas about such metadata?
repoquery springs to mind - it is COMMONLY used by folks and while I
certainly would not want to optimize for that use case I also don't want
to cripple it.
> I don't think I explained my idea clearly. I'm suggesting a hash table for storing file lists. This way,
> files will be better distributed among hash table cells. For example, we might use the first two characters of
> the hash of each file name as the index into our hash table. So two files in the /usr directory will likely end
> up in different cells of the table:
>
> /usr/bin/vim   --MD5 SUM--> 7f351f6e834a9a0230e050d765f238c1 --file list chunk--> 7f.filelist.db
> /usr/bin/gedit --MD5 SUM--> a4407c2d7bb56df94bcc2b7294de0dfb --file list chunk--> a4.filelist.db
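A minimal sketch of the hash-bucket idea quoted above, in Python (yum's own language). It assumes the full path string is what gets hashed and that the first two hex digits of its MD5 digest name the chunk; `chunk_for_path` and the chunk naming are hypothetical, not existing yum code.

```python
import hashlib


def chunk_for_path(path: str, prefix_len: int = 2) -> str:
    """Map a file path to a hypothetical filelist chunk name.

    The first `prefix_len` hex digits of the MD5 digest of the path pick
    the bucket, so prefix_len=2 gives 16**2 = 256 possible chunks.
    """
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}.filelist.db"


if __name__ == "__main__":
    for p in ("/usr/bin/vim", "/usr/bin/gedit"):
        print(p, "->", chunk_for_path(p))
```

With a two-digit prefix the paths spread essentially evenly over 256 buckets, so files from the same directory rarely share a chunk.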
What would that buy us? When we know the vast majority of our file deps
are in [/usr]/[s]bin/, why would we make someone download, with an even
split of the filelists, most likely VASTLY too much data?
If we split the files based on:
/toplevel dir
and
/usr/*/*
then when we go to determine which filelist chunk to grab, we just look at
the filename itself and see which segment has the highest number of
matching leading characters (i.e. the longest matching prefix).
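A rough sketch of that lookup, assuming a hypothetical segment list built from a few top-level dirs plus some /usr subtrees. Unlike a raw character count, it matches whole path components, so /usr/bin covers /usr/bin/vim but not /usr/binfoo; `pick_chunk` and `SEGMENTS` are illustrative names, not existing yum code.

```python
from typing import Iterable, Optional

# Hypothetical segment list: some top-level dirs plus a few /usr subtrees.
SEGMENTS = (
    "/bin", "/etc", "/lib", "/sbin", "/usr",
    "/usr/bin", "/usr/sbin", "/usr/lib",
    "/usr/share/doc", "/usr/share/man",
)


def pick_chunk(path: str, segments: Iterable[str] = SEGMENTS) -> Optional[str]:
    """Return the segment that is the longest prefix of `path`, or None."""
    best: Optional[str] = None
    for seg in segments:
        # Compare whole path components so "/usr/bin" covers
        # "/usr/bin/vim" but not "/usr/binfoo".
        if path == seg or path.startswith(seg + "/"):
            if best is None or len(seg) > len(best):
                best = seg
    return best


print(pick_chunk("/usr/bin/vim"))               # /usr/bin
print(pick_chunk("/usr/share/doc/vim/README"))  # /usr/share/doc
print(pick_chunk("/opt/foo/bar"))               # None
```

Since most file deps live under [/usr]/[s]bin/, resolving them would typically touch only the one or two chunks those prefixes select.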
Remember - our goal is not to evenly distribute the filelists into chunks
- our goal is to optimize for the cases that are actually used.
>
> Isn't it enough for repomd.xml to store the checksum of packagelist.sqlite? Other files can be verified by
> their signatures. Or do you use checksums instead of signatures? Or both? (Sorry, I'm apparently not
> very familiar with the security mechanisms used in yum; if there is any documentation about them I would be
> glad to know about it.)
for any file you make you need a checksum of that file (sha256 or better)
stored somewhere.
we can gpg sign repomd.xml so that the checksums in that file can be
trusted - but everything else falls out from there.
so if repomd.xml has a checksum of files-by-path.xml
then files-by-path.xml has to have a checksum of each of the databases of
files it knows about.
so we can verify the databases of files by looking at the checksums stored
in files-by-path.xml, verify files-by-path.xml by looking at the checksums
stored in repomd.xml, and verify repomd.xml by checking the gpg
signature.
understand?
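The chain of trust described above, sketched in Python. To keep it short, the manifests are modeled as sha256sum-style `HEXDIGEST  NAME` text rather than real XML, and the GPG check on repomd.xml is a caller-supplied function (a stand-in, not a real gpg binding); all names here are hypothetical.

```python
import hashlib
from typing import Callable, Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def parse_checksums(manifest: bytes) -> Dict[str, str]:
    """Parse hypothetical 'HEXDIGEST  NAME' lines into {name: digest}."""
    table: Dict[str, str] = {}
    for line in manifest.decode("utf-8").splitlines():
        if line.strip():
            digest, name = line.split(None, 1)
            table[name.strip()] = digest
    return table


def verify_chain(
    repomd: bytes,
    signature_ok: Callable[[bytes], bool],
    files_by_path: bytes,
    databases: Dict[str, bytes],
) -> bool:
    # 1. Trust anchor: the gpg signature over repomd.xml (stubbed here).
    if not signature_ok(repomd):
        return False
    # 2. The signed repomd.xml vouches for files-by-path.xml.
    if parse_checksums(repomd).get("files-by-path.xml") != sha256_hex(files_by_path):
        return False
    # 3. files-by-path.xml vouches for every filelist database we fetched.
    wanted = parse_checksums(files_by_path)
    return all(wanted.get(name) == sha256_hex(blob) for name, blob in databases.items())
```

A database that is missing from files-by-path.xml, or whose bytes changed, fails the check, as does any repomd.xml whose signature does not verify.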
> Yes. Maybe I should create a wiki-like page somewhere to collect all ideas/suggestions/corrections. It
> would also be useful for gathering ideas about other useful metadata which could be provided (as mentioned at the
> top).
I've made one here:
http://yum.baseurl.org/wiki/dev/NewRepoDataIdeas
>
> Thanks. I'm also interested in generating the hashes of all files and doing some statistical analysis there
> too.
see the wiki page above for the filelists data, too.
-sv