[Rpm-metadata] discussion of createrepo and repodata format future

Michael Schroeder mls at suse.de
Fri Aug 6 14:57:08 UTC 2010


On Fri, Aug 06, 2010 at 09:40:54AM -0400, seth vidal wrote:
> > Well, that depends on the usage pattern. As zypp doesn't support
> > downloading the filelist, we make sure that there are no (Build)Requires
> > on files outside of the "primary" file patterns. As /bin and /usr/bin
> > are used in many packages, we need to download the file information
> > for those directories anyway, so moving them from primary into separate
> > files makes things worse for us.
> 
> 
> B/c of the extra file to download?

Yes, though with keep-alive connections it probably doesn't matter
much. Also, compression might not work as well if the data is split
into multiple files and each file is compressed separately.

> B/c the size shouldn't change. With
> some of the suggestions in changing of primary, I'd think the size
> should decrease overall.
> 
> Did you have any thoughts on the suggestion of breaking summary and
> description out into translatable files?

Not really. Do you mean something like this (xml version):

primary.de.xml.gz:
  ...
  <package pkgid="xxx" >
    <summary lang="de">Coole Applikation</summary>
    <description lang="de">Macht was tolles...</description>
  </package>

The sqlite version could be similar.

> [...]
> Nitpick but: if I never see 'flavor' as a descriptor of something that
> is NOT food, that'll be fine. It's like file 'colors'. That just makes
> me want to scream whenever I read it.

Heh ;-)

> > Speaking of that lzma patch, I pretty much opposed it because it
> > conflicts with the "delta download" mechanism I implemented some weeks
> > ago. The idea is to use 'gzip --rsyncable' for gz compression, add 'zsync'
> > checksum data to the metalink files and let libzypp download just the
> > changed blocks with range requests. Works quite nicely for our maintenance
> > updates; it's probably not very useful for Factory (i.e. "rawhide") where
> > the number of rebuilds is quite high.
> 
> Where is this patch?

See my commits in http://gitorious.org/opensuse/libzypp/commits/master,
especially commit c3ba229.

Zsync works by searching local files for blocks with the same checksum
as the target file. As checksum calculation is not a cheap operation,
you can't simply do it for every byte offset in the local files. Thus
you also need a cheap checksum, and you only verify with the real
checksum if the cheap checksum matches.
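
A minimal Python sketch of that two-level check. It assumes an
rsync-style rolling sum as the cheap checksum and SHA-1 as the real
one; this is not necessarily the exact checksum zsync or libzypp
computes, and all function names are made up for illustration. Only
full-size blocks are considered.

```python
import hashlib

MOD = 1 << 16  # assumption: 16-bit halves, as in rsync-style sums


def weak_sum(block):
    """Cheap checksum of one block, returned as a pair (a, b)."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide the window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def signatures(target, n):
    """(weak, strong) checksum pairs for every full block of the target."""
    blocks = [target[i:i + n] for i in range(0, len(target) - n + 1, n)]
    return [(weak_sum(blk), hashlib.sha1(blk).hexdigest()) for blk in blocks]


def find_matches(local, sigs, n):
    """Map target block index -> offset in `local` holding that block."""
    by_weak = {}
    for idx, (weak, strong) in enumerate(sigs):
        by_weak.setdefault(weak, []).append((idx, strong))
    found = {}
    if len(local) < n:
        return found
    a, b = weak_sum(local[:n])
    off = 0
    while True:
        # The expensive checksum is computed only on a cheap-checksum hit.
        for idx, strong in by_weak.get((a, b), ()):
            window = local[off:off + n]
            if idx not in found and hashlib.sha1(window).hexdigest() == strong:
                found[idx] = off
        if off + n >= len(local):
            break
        a, b = roll(a, b, local[off], local[off + n], n)
        off += 1
    return found
```

The rolling update is what makes it affordable to test every byte
offset: moving the window costs a few additions instead of a full
pass over the block.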

Metalink already comes with support for multiple checksums, so it
was straightforward to add zsync support. For example, the metalink
for the current 11.3 primary.xml.gz looks like:

    ...
    <hash type="sha256">a712c132725a5a58db3aba53c7dba2cfe61789d8c0deda3591aa0aaa2b55a48a</hash>
    <pieces length="131072" type="zsync">
        <hash piece="0">324641aa</hash>
        <hash piece="1">3b5d83ae</hash>
    </pieces>
    <pieces length="131072" type="sha1">
        <hash piece="0">7f2f37fa19dbf953f4ba4c5f19b62fc5053a42d6</hash>
        <hash piece="1">451014563b43ffa75a7a59522beed72a4692b25c</hash>
    </pieces>
    ...
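
As a rough illustration, the per-block checksums in such a fragment
can be read with Python's standard XML parser. The wrapping
<verification> element and the function name are assumptions made so
that the excerpt parses on its own; a real metalink file has more
structure around it and may use an XML namespace.

```python
import xml.etree.ElementTree as ET

# The excerpt from above, wrapped in an assumed <verification> element.
FRAGMENT = """
<verification>
  <hash type="sha256">a712c132725a5a58db3aba53c7dba2cfe61789d8c0deda3591aa0aaa2b55a48a</hash>
  <pieces length="131072" type="zsync">
    <hash piece="0">324641aa</hash>
    <hash piece="1">3b5d83ae</hash>
  </pieces>
  <pieces length="131072" type="sha1">
    <hash piece="0">7f2f37fa19dbf953f4ba4c5f19b62fc5053a42d6</hash>
    <hash piece="1">451014563b43ffa75a7a59522beed72a4692b25c</hash>
  </pieces>
</verification>
"""


def piece_hashes(xml_text):
    """Return {type: (block_length, [hash of piece 0, hash of piece 1, ...])}."""
    root = ET.fromstring(xml_text)
    result = {}
    for pieces in root.iter("pieces"):
        hashes = sorted(pieces.findall("hash"),
                        key=lambda h: int(h.get("piece")))
        result[pieces.get("type")] = (
            int(pieces.get("length")),
            [h.text.strip() for h in hashes],
        )
    return result


if __name__ == "__main__":
    for kind, (length, hashes) in piece_hashes(FRAGMENT).items():
        print(kind, length, hashes)
```

The "zsync" entries would serve as the cheap checksums and the "sha1"
entries as the real checksums for the same 131072-byte blocks.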

So when libzypp wants to download the new primary file and there
is already an old version, it first searches the old file for blocks
matching the checksums and downloads only the blocks that
couldn't be found. (The code also downloads in parallel from multiple
mirrors, but that's more of a welcome side effect ;-))
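
A small Python sketch of the "download only what is missing" step,
assuming the set of block indexes already found in the old file is
known. The function names are hypothetical and this is not the
libzypp code; it only shows how missing blocks turn into HTTP range
requests.

```python
def missing_ranges(total_size, block_size, found):
    """Inclusive (start, end) byte ranges covering blocks not in `found`.

    Adjacent missing blocks are merged into a single range.
    """
    num_blocks = (total_size + block_size - 1) // block_size
    ranges = []
    for idx in range(num_blocks):
        if idx in found:
            continue
        start = idx * block_size
        # The last block may be shorter than block_size.
        end = min(start + block_size, total_size) - 1
        if ranges and ranges[-1][1] + 1 == start:
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


def range_header(ranges):
    """Format the ranges as the value of an HTTP Range header."""
    return "bytes=" + ",".join(f"{start}-{end}" for start, end in ranges)


if __name__ == "__main__":
    # 1000-byte file, 256-byte blocks, blocks 0 and 2 found locally.
    print(range_header(missing_ranges(1000, 256, {0, 2})))
```

Since each range is independent, different ranges can be requested
from different mirrors.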

This scheme probably only works with xml (where new packages just
get added to the end of the file, at least for our updates) and
with a --rsyncable compression method.

(On a fresh installation you would suffer from the suboptimal
compression, so it might make sense to offer *both*
primary.xml.gz (or primary.xml) and primary.xml.lzma. Fresh
installations would use the lzma compressed version and
systems that have an old primary version would use the .gz
variant that supports delta downloads. Actually the library
could first check how many blocks match and then use the
optimal method.)
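
That last decision could look roughly like the Python sketch below. It
assumes the client knows both file sizes up front (for example from
the metalink) and simply compares the bytes each method would
transfer; the function and the labels it returns are invented for
illustration.

```python
def choose_download(gz_size, lzma_size, block_size, matched_blocks):
    """Pick the method that transfers fewer bytes.

    Returns (method, bytes_to_transfer). The delta cost is approximated
    as the .gz size minus the bytes covered by matched blocks.
    """
    delta_bytes = max(gz_size - matched_blocks * block_size, 0)
    if delta_bytes < lzma_size:
        return ("gz-delta", delta_bytes)
    return ("lzma-full", lzma_size)


if __name__ == "__main__":
    block = 131072
    # Old primary present, 8 of 10 blocks still match.
    print(choose_download(10 * block, 6 * block, block, 8))
    # Fresh installation, nothing matches.
    print(choose_download(10 * block, 6 * block, block, 0))
```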

Cheers,
  Michael.

-- 
Michael Schroeder                                   mls at suse.de
SUSE LINUX Products GmbH, GF Markus Rex, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}

