[Yum-devel] [PATCH 1/3] Use delta metadata when available

James Antill james at fedoraproject.org
Tue Jun 25 22:43:45 UTC 2013


On Tue, 2013-06-25 at 13:38 +0200, Zdenek Pavlas wrote:
> When Yum needs a newer version of <mdtype> and there's a <mdtype>.delta<N>
> available in new repomd with a timestamp that matches the old <mdtype> version,
> we download it and apply.

 This requires a client to check in very often, right?
 We basically have deltas from specific createrepo runs, and we can only
have N of them? The big problem here is that a createrepo run that adds
one package counts the same against that limit as a run that adds 100
packages.
 I don't see how that will work well for rawhide, and I'm far from sure
it will work well for Fedora updates/updates-testing.
 There are also likely to be future types of repos that get built much
more often than rawhide (think coprs automatically building from git
commits etc.).
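
 To make the concern concrete, here is a toy model. It assumes (my reading
of the description, not something the patch states) that createrepo runs
are simply numbered and that repomd carries one delta for each of the last
N previous runs. The function name and the numbers are made up for
illustration.

```python
def usable_delta(client_run, latest_run, max_deltas):
    """Return True if repomd still carries a delta starting at client_run.

    Assumption: createrepo runs are numbered 1, 2, 3, ... and the repo
    keeps one delta per previous run, for the last max_deltas runs only.
    """
    behind = latest_run - client_run
    return 0 < behind <= max_deltas


# Hypothetical numbers: the repo keeps 5 deltas.
# A client that is 5 runs behind can still find a matching delta...
print(usable_delta(client_run=10, latest_run=15, max_deltas=5))
# ...but one that is 120 runs behind cannot, however few packages
# changed in each of those runs.
print(usable_delta(client_run=10, latest_run=130, max_deltas=5))
```

 The point is that only the number of runs appears in the check; the
number of packages changed per run never does.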

> The diff/patch algorithm is targeted at XML metadata files.  We split
> at each "<package " substring, and also at the last closing tag.
> A repository with N packages always yields exactly N+2 chunks.
>
> The delta format is a simple line-oriented sequence of <literal> or <chunkref>
> tokens.  Sequential references are further compressed to just a single newline.
> Delta file is finally compressed with a general-purpose compressor.

 Ok, seems sane from a general description.
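
 For reference, this is how I read the splitting rule: a minimal sketch,
not the code from the patch. It assumes the whole file is in memory as one
string and that the last "</" in the file is the closing tag of the root
element; split_chunks is a name I made up.

```python
def split_chunks(xml_text):
    """Split metadata at each '<package ' and at the last closing tag."""
    marker = "<package "
    cuts = []
    pos = xml_text.find(marker)
    while pos != -1:
        cuts.append(pos)
        pos = xml_text.find(marker, pos + 1)
    # Assumption: the last "</" starts the root element's closing tag.
    cuts.append(xml_text.rfind("</"))

    chunks = []
    start = 0
    for cut in cuts:
        chunks.append(xml_text[start:cut])
        start = cut
    chunks.append(xml_text[start:])
    return chunks


sample = (
    '<metadata packages="2">\n'
    '<package type="rpm"><name>a</name></package>\n'
    '<package type="rpm"><name>b</name></package>\n'
    '</metadata>\n'
)
# Header + 2 packages + closing tag = N+2 chunks.
print(len(split_chunks(sample)))
```

 With N packages there are N+1 cut points, hence the N+2 chunks, and
joining the chunks gives back the original file.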

> - The delta files are much smaller than these produced with 'diff -e'.

 Even when you compress the 'diff -e' result? How?

> - It handles package reordering very well.  Fedora still uses old
>   createrepo that shuffles packages a lot when ran with --update.

 I thought F18 used the F18 yum+createrepo etc. ... so all the ordering
was fine now, and it was only the rpmbuild side that was using el5/el6-era
code.

> - Since the chunks we handle are quite big, it's fast.
> 
> - It's easy to merge chained diffs, even if the original is not available.

 Not sure what you mean here.

> The cons are:
> 
> - We need to (usually) load the whole old file to memory, although an attempt
>   is being made to make the copy streaming if possible.
> 
> - Sub-package changes are not supported.  A simple pkg version + checksum
>   bump is as costly as adding a new package.

 I doubt this one is that problematic. The worst case, I assume, is
re-signing, and even then I figure people can live with having to
redownload everything on a re-sign.

> To make use of it:
> 
> 1) The metadata must include the deltamd information.  The deltamd script
>    in createrepo facilitates this, including automatic merging of previous
>    deltas and their limiting.

 Sure, I assume the big downside here is that repomd gets bigger ... do
you have data on that?

> 2) Yum must use the XML metadata and build sqlite databases locally.
>    createrepo must use --no-database, or mddownloadpolicy=xml option
>    has to be set in yum.conf or *.repo file.

 Sure, that's pretty much what the option was added for anyway and we
can also change yum so that turning on deltamd implies
mddownloadpolicy=xml when deltas are available.

 However the giant downside I see (I think) is that you aren't
generating valid MD as a result. So given that the client has:

old-MD.xml

and the repo has:

new-MD.xml
delta-from-old2new-MD.bz2

...we don't end up with "new-MD.xml", we end up with "local-MD.xml", which
is assumed to be the same as "new-MD.xml". As I said before, doing that
can't end well: even in the best case, every time a bug report comes in
that we can't immediately reproduce, we'll have to wonder "do they
actually have the same MD?". It almost guarantees weird problems that
only happen after a client has downloaded/applied 666 deltas.
 AIUI it's also the reason for the following two patches: to try to work
around the fact that the rest of yum doesn't like that we no longer have
anything actually downloaded from the repo.

 Why can't we check what we generated against the data that is offered
for full download? That would eliminate all these problems.
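
 Something like the following is what I have in mind: a rough sketch, not
a patch against yum. It assumes the expected value is the checksum that
repomd already publishes for the uncompressed new-MD.xml (the
open-checksum element) and that it is sha256; the function names are made
up.

```python
import hashlib
import io


def stream_digest(fileobj, algo="sha256", blocksize=1 << 20):
    """Hash a binary file object block by block, without loading it all."""
    digest = hashlib.new(algo)
    for block in iter(lambda: fileobj.read(blocksize), b""):
        digest.update(block)
    return digest.hexdigest()


def verify_rebuilt(fileobj, expected_hex, algo="sha256"):
    """True if the locally rebuilt MD hashes to the value from repomd."""
    return stream_digest(fileobj, algo) == expected_hex.lower()


# Stand-in data; in practice fileobj would be the local-MD.xml we built.
new_md = b"<metadata></metadata>\n"
expected = hashlib.sha256(new_md).hexdigest()
print(verify_rebuilt(io.BytesIO(new_md), expected))
```

 If the check fails, the client can throw away local-MD.xml and fall back
to the full download, so what ends up on disk is always byte-for-byte what
the repo offers.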


