[Rpm-metadata] discussion of createrepo and repodata format future

seth vidal skvidal at fedoraproject.org
Fri Aug 6 17:02:47 UTC 2010


On Fri, 2010-08-06 at 16:57 +0200, Michael Schroeder wrote:

> Yes, though with a keep-alive connection it probably doesn't matter
> much. Also, compression might not work as well if the data is split
> into multiple files and each file is compressed separately.

I think the compression won't be impacted much, if only b/c splitting
the files out by subdir means we can drop the leading directory part of
each path entirely:

if we know the file represents /usr/bin

well, then we don't need to repeat that prefix in any of the entries there.
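
A minimal sketch of the idea in Python, assuming a hypothetical layout
where each per-directory file stores only basenames. The function name
and the sample paths are illustrative and not part of createrepo:

```python
import posixpath
from collections import defaultdict

def split_by_dir(paths):
    """Group full paths by directory, keeping only the basename per entry."""
    by_dir = defaultdict(list)
    for path in paths:
        dirname, basename = posixpath.split(path)
        by_dir[dirname].append(basename)
    return {d: sorted(names) for d, names in by_dir.items()}

# Illustrative file list, not taken from any real repository.
files = ["/usr/bin/yum", "/usr/bin/rpm", "/etc/yum.conf"]
chunks = split_by_dir(files)

# The "/usr/bin" prefix is now carried once (by the key, i.e. the file
# name of the chunk) instead of once per entry.
full_size = sum(len(p) for p in files)
split_size = sum(len(n) for names in chunks.values() for n in names)
```

Here split_size is smaller than full_size because the directory prefix
is no longer repeated for every entry.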


> > B/c the size shouldn't change. With
> > some of the suggestions in changing of primary, I'd think the size
> > should decrease overall.
> > 
> > Did you have any thoughts on the suggestion of breaking summary and
> > description out into translatable files?
> 
> Not really. Do you mean something like this (xml version):
> 
> primary.de.xml.gz:
>   ...
>   <package pkgid="xxx" >
>     <summary lang="de">Coole Applikation</summary>
>     <description lang="de">Macht was tolles...</description>
>   </package>
> 
> The sqlite version could be similar.

More or less - but again, using the name/location/label of the file to
determine the language, so you don't have a bunch of duplicate info.

If the file is named trans.de_DE.xml.gz then we don't really need to
label summary and description with lang='de', do we?
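
As a rough sketch of how a client could recover the language from the
file name alone: the trans.<locale>.xml.gz pattern is only the naming
used in the example above, and the helper name is hypothetical.

```python
import re

# Assumed naming scheme: trans.<locale>.xml.gz, e.g. trans.de_DE.xml.gz
_TRANS_RE = re.compile(r"^trans\.([A-Za-z]{2,3}(?:_[A-Za-z]{2})?)\.xml\.gz$")

def lang_from_filename(filename):
    """Return the locale encoded in the file name, or None if it has none."""
    match = _TRANS_RE.match(filename)
    return match.group(1) if match else None
```

With that, every <summary> and <description> in the file can be taken
to be in the returned locale, and the per-element lang attribute
becomes redundant.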


> > > Speaking of that lzma patch, I pretty much opposed it because it
> > > conflicts with the "delta download" mechanism I implemented some weeks
> > > ago. The idea is to use 'gzip --rsyncable' for gz compression, add 'zsync'
> > > checksum data to the metalink files and let libzypp download just the
> > > changed blocks with range requests. Works quite nicely for our maintenance
> > > updates; it's probably not very useful for Factory (i.e. "rawhide") where
> > > the number of rebuilds is quite high.
> > 
> > Where is this patch?
> 
> See my commits in http://gitorious.org/opensuse/libzypp/commits/master,
> especially commit c3ba229.
> 
> Zsync works by searching local files for blocks with the same checksum
> as the target file. As checksum calculation is not a cheap operation,
> you can't simply do it for every byte offset in the local files. Thus
> you also need a cheap checksum, and you only verify with the real
> checksum if the cheap checksum matches.
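
The cheap-then-real checksum scheme described above can be sketched
roughly as follows. This is an illustration only: the rsync-style
rolling sum, SHA-1 as the "real" checksum, the tiny block size and all
function names are assumptions, not necessarily what zsync or libzypp
use.

```python
import hashlib

BLOCK = 4  # unrealistically small block size, for illustration only

def weak(block):
    """Cheap checksum: two 16-bit sums that can be updated incrementally."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
    return a, b

def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte without re-reading the whole block."""
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - size * out_byte + a) & 0xFFFF
    return a, b

def index_target(target):
    """Map weak checksum -> {strong checksum: block index} for the target."""
    table = {}
    for i in range(0, len(target) - BLOCK + 1, BLOCK):
        blk = target[i:i + BLOCK]
        strong = hashlib.sha1(blk).hexdigest()
        table.setdefault(weak(blk), {})[strong] = i // BLOCK
    return table

def find_blocks(local, table):
    """Return {target block index: offset in local} for blocks already held."""
    found = {}
    if len(local) < BLOCK:
        return found
    a, b = weak(local[:BLOCK])
    for off in range(len(local) - BLOCK + 1):
        candidates = table.get((a, b))
        if candidates:
            # Only pay for the expensive checksum when the cheap one matches.
            strong = hashlib.sha1(local[off:off + BLOCK]).hexdigest()
            if strong in candidates:
                found.setdefault(candidates[strong], off)
        if off + BLOCK < len(local):
            a, b = roll(a, b, local[off], local[off + BLOCK], BLOCK)
    return found
```

Any target block not present in the result would then be fetched with a
range request.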

Ah - now I recall - the need for the hashing and keeping around older
revisions of the metadata is what made zsync less palatable.

Since each pkg is an individual 'chunk' of the data that makes up the
whole of the repodata, one thought was generating both a complete copy
of the repodata and a discrete chunk of the metadata per-pkg.

So if I look up the pkglist and see that the changeset from the last
time I got the metadata was the addition/update of 100 pkgs and the
removal of 20 and downloading the metadata for those 100 pkgs is smaller
than downloading the whole thing, then I could just do that and create
the new metadata on my own.

It's a slightly more coarse-grained delta'ing of the metadata but in
discrete chunks that make sense to the user, not just to the parser.
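
A small sketch of that decision, assuming a hypothetical pkglist that
gives the set of pkgids plus the size of each per-pkg chunk. All names
here are illustrative. An updated pkg shows up as one removal plus one
addition, since its pkgid changes.

```python
def plan_update(old_pkgs, new_pkgs, chunk_sizes, full_size):
    """Decide between fetching per-pkg chunks and the complete metadata.

    old_pkgs, new_pkgs: sets of pkgids (local copy vs. current pkglist)
    chunk_sizes: pkgid -> size in bytes of that pkg's metadata chunk
    full_size: size in bytes of the complete metadata download
    """
    added = new_pkgs - old_pkgs      # chunks we would have to download
    removed = old_pkgs - new_pkgs    # entries we can drop locally
    delta_size = sum(chunk_sizes[p] for p in added)
    if delta_size < full_size:
        return "chunks", added, removed
    return "full", set(), set()
```

When "chunks" is returned, the client downloads only the added pkgs,
drops the removed ones, and rebuilds the new metadata locally.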


> This scheme probably only works with xml (where new packages just
> get added to the end of the file, at least for our updates) and
> with a --rsyncable compression method.

You add the content to the end of the existing metadata? Does that mean
your metadata grows w/o bound over the duration of a release?


> (When you did a fresh installation you would suffer from the
> not-optimal compression, so it might make sense to offer *both*
> primary.xml.gz (or primary.xml) and primary.xml.lzma. Fresh
> installations would use the lzma compressed version and
> systems that have an old primary version would use the .gz
> variant that supports delta downloads. Actually the library
> could first check how many blocks match and then use the
> optimal method.)

That's what I was suggesting by having per-pkg chunks available as well
as complete sets.

-sv



