[Yum-devel] [PATCH 1/3] Use delta metadata when available

Zdenek Pavlas zpavlas at redhat.com
Wed Jun 26 08:47:39 UTC 2013


>  This requires a client check in very often, right?
>  We basically have deltas from specific createrepo runs, and we can only
> have N of them? The big problem here is that if we have one package
> added per. createrepo run, it's the same metric as if we have 100
> packages added.

Yes, I assume the periods of client check-ins and createrepo
runs are comparable.  The default delta limit is 5.

>  I don't see how that will work well for rawhide, and I'm far from sure
> it will work well for Fedora updates/updates-testing.
>  There are also likely to be future types of repos. that are getting
> built much more often than rawhide (think coprs automatically building
> from git commits etc.).

It should work well for fedora-updates (the use case I aimed at), but
I'm not sure either.  For frequently rebuilt repos a somewhat different
delta format is needed.

If old chunks are referenced by a hash (or a parsed nevra) instead 
of the ordinal, one delta file built for updating from snapshot S
can be used to update from any superset of S.
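
A minimal sketch of the hash-referenced idea, not the patch's actual
format. The chunk representation (one string per package), the
("copy", hash) / ("add", chunk) operations, and the function names are
all assumptions made for illustration:

```python
import hashlib


def chunk_id(chunk):
    """Identify a chunk by the SHA-256 of its text (hypothetical choice)."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def make_delta(old_chunks, new_chunks):
    """Build a delta that refers to reused chunks by hash, not by position."""
    known = {chunk_id(c) for c in old_chunks}
    delta = []
    for chunk in new_chunks:
        h = chunk_id(chunk)
        if h in known:
            delta.append(("copy", h))      # client is expected to have it
        else:
            delta.append(("add", chunk))   # ship the chunk itself
    return delta


def apply_delta(local_chunks, delta):
    """Rebuild the new chunk list from whatever chunks the client holds."""
    by_hash = {chunk_id(c): c for c in local_chunks}
    return [by_hash[arg] if op == "copy" else arg for op, arg in delta]


snapshot_s = ["<package>a-1</package>", "<package>b-1</package>"]
newest = ["<package>b-1</package>", "<package>c-1</package>",
          "<package>a-1</package>"]
delta = make_delta(snapshot_s, newest)

# A client holding a superset of S can apply the same delta.
client = snapshot_s + ["<package>z-9</package>"]
assert apply_delta(client, delta) == newest
```

Because lookups go through the hash, extra chunks on the client and a
different chunk order do not matter; only the chunks present in S are
required.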

> > - The delta files are much smaller than these produced with 'diff -e'.
> 
>  Even when you compress the 'diff -e' result? How?

diff -e does not handle reordering.  It's better at copying large blocks
(implicit no-op) but \n\n\n\n... is compressed to close to nothing,
and in the end the custom diff came out much smaller.
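
The point about runs of newlines costing almost nothing after
compression is easy to check. This sketch only assumes bz2 as the
compressor (the delta file quoted later in this mail has a .bz2
suffix); the run length of 100000 is arbitrary:

```python
import bz2

# A long run of identical bytes, standing in for many "no-op" lines.
run = b"\n" * 100000
packed = bz2.compress(run)

print(len(run), "->", len(packed), "bytes")
```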

> 
> > - It handles package reordering very well.  Fedora still uses old
> >   createrepo that shuffles packages a lot when run with --update.
> 
>  I thought F18 used the F18 yum+createrepo etc. ... so all the ordering
> was fine now, it was only the rpmbuild side that was using el5/el6 aged
> code.

Haven't looked at F18, and there are no fedora-updates in F19 yet.
But in F17 a new package is always placed near the end of primary.xml and
the following update moves it to the "proper" place.

> > - It's easy to merge chained diffs, even if the original is not available.
> 
>  Not sure what you mean here.

combinediff does not handle "diff -e" output; that was the showstopper.
And the format is so hairy (dot-quoting, final newlines, etc.) that
implementing it would be much more work than using a custom diff format.
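
To illustrate why merging chained deltas needs no original, here is a
sketch under assumptions: each delta is a list with one operation per
chunk of its output, either ("copy", i) for chunk number i of the
previous snapshot or ("add", chunk). This is a hypothetical stand-in,
not the on-disk format from the patch:

```python
def apply_delta(old_chunks, delta):
    """Produce the new chunk list; ("copy", i) takes chunk i of the old one."""
    return [old_chunks[arg] if op == "copy" else arg for op, arg in delta]


def merge_deltas(first, second):
    """Combine A->B and B->C into A->C without looking at A, B or C.

    Chunk i of B was produced by operation i of `first`, so a copy of
    B's chunk i is replaced by that operation.
    """
    merged = []
    for op, arg in second:
        if op == "copy":
            merged.append(first[arg])
        else:
            merged.append((op, arg))
    return merged


a = ["pkg-a", "pkg-b", "pkg-c"]
a_to_b = [("copy", 2), ("add", "pkg-d"), ("copy", 0)]
b_to_c = [("copy", 1), ("add", "pkg-e"), ("copy", 0)]

merged = merge_deltas(a_to_b, b_to_c)
assert apply_delta(a, merged) == apply_delta(apply_delta(a, a_to_b), b_to_c)
```

The merge is a single pass over the second delta and handles
reordering for free, which is the property combinediff could not
offer for "diff -e" output.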

> > 2) Yum must use the XML metadata and build sqlite databases locally.
> >    createrepo must use --no-database, or mddownloadpolicy=xml option
> >    has to be set in yum.conf or *.repo file.
> 
>  Sure, that's pretty much what the option was added for anyway and we
> can also change yum so that turning on deltamd implies
> mddownloadpolicy=xml when deltas are available.

Do we need the deltamd option on the client at all?

> we have:
> 
> old-MD.xml
> 
> repo has:
> 
> new-MD.xml
> delta-from-old2new-MD.bz2
> 
> ...we don't end up with "new-MD.xml" we end up "local-MD.xml" which is
> assumed to be the same as "new-MD.xml"

I won't use delta-from-old2new when the timestamp does not match old-MD.xml.

>  Why can't we check what we generated against the data that is offered
> for full download? This will eliminate all these problems.

The generated file is checksummed later in YumRepo.populate, and if it
does not match, a full download is attempted.  The only bad thing is that
it's too late to revert.

It should not hurt to checksum the generated file right away, but
_commonRetrieveDataMD_done() is not supposed to fail.
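
A sketch of what checksumming right away could look like. None of
these names are yum APIs: update_metadata, apply_delta and
download_full are hypothetical, and SHA-256 is an assumed checksum
type. The full download is only triggered when the locally generated
data does not match the checksum advertised for the full file:

```python
import hashlib


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def update_metadata(old_md, delta, expected_sha256, apply_delta, download_full):
    """Return (metadata, source), preferring the delta when it verifies."""
    candidate = apply_delta(old_md, delta)
    if sha256_hex(candidate) == expected_sha256:
        return candidate, "delta"
    # Generated file is wrong: discard it and fetch the full metadata.
    return download_full(), "full"


# Toy stand-ins: the "delta" is just bytes appended to the old metadata.
new_md = b"<metadata>a b c</metadata>"
expected = sha256_hex(new_md)


def toy_apply(old, delta):
    return old + delta


def toy_download():
    return new_md


good = update_metadata(b"<metadata>a b", b" c</metadata>", expected,
                       toy_apply, toy_download)
bad = update_metadata(b"<metadata>a b", b" X</metadata>", expected,
                      toy_apply, toy_download)
```

Here a mismatch is handled by returning the full download rather than
by raising, so the caller never sees a failure from this step.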

Too bad that the current API dictates that _retrieveMD() must always
return compressed files, and there's no high-level function that also
wraps decompression.  We could simply plug all the delta stuff there.

