[Yum-devel] Implementing delta metadata

Tirtha Chatterjee tirtha.p.chatterjee at gmail.com
Mon Sep 26 11:31:24 UTC 2011


On Mon, Sep 26, 2011 at 4:51 PM, Zdenek Pavlas <zpavlas at redhat.com> wrote:
> Hi!
>
> Thanks for the interest in improving MD download!
> Just a few thoughts (I don't consider myself experienced
> in the codebase, esp. on the createrepo part).
>
> - Sharding metadata is very likely not an option.
>
> The per-file overhead (to download, to store, to query)
> is significant, and to get a significant fraction of files
> not modified, we'd need quite a lot of them (100+ I guess).
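To put a number on the sharding trade-off, suppose (hypothetically) that each changed package lands in one of N equally likely shards, independently of the others. A given shard then escapes k changes with probability (1 - 1/N)^k. A minimal Python sketch under that assumption; the function name is illustrative:

```python
def expected_untouched_fraction(num_shards: int, changed_pkgs: int) -> float:
    """Expected fraction of shards that no changed package touches.

    Assumes (hypothetically) each changed package falls into one shard
    chosen uniformly at random, independently of the other changes.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    # One shard escapes a single change with probability 1 - 1/N,
    # so it escapes all k changes with probability (1 - 1/N) ** k.
    return (1.0 - 1.0 / num_shards) ** changed_pkgs


for shards in (10, 100, 1000):
    frac = expected_untouched_fraction(shards, 50)
    print(f"{shards:5d} shards, 50 changed packages: {frac:.1%} untouched")
```

With 50 changed packages this gives roughly 0.5% untouched shards at N = 10 but about 60% at N = 100. That is consistent with the 100+ estimate, and every one of those shards carries the per-file cost described above.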
>
> - Rsync-friendly metadata are IMO a better option, but..
>
> 1) pkgKey values are assigned sequentially, so adding/removing
> a package in the middle touches 50% of metadata.
>
> 2) It's very likely (although I'm not sure) that building an
> sqlite DB from scratch from two slightly different inputs
> produces two very different databases that rsync poorly,
> due to records ending up at different page offsets.
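Point 2 is stated as a guess, and it can be checked directly. Build the DB twice from slightly different inputs, then count how many fixed-size pages differ; the differing pages are a rough proxy for what a delta tool would have to transfer. A minimal sketch using Python's sqlite3 module; the helper names are hypothetical:

```python
import sqlite3


def page_size_of(path: str) -> int:
    """Page size of an existing SQLite database file, via PRAGMA page_size."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()


def differing_pages(path_a: str, path_b: str) -> tuple[int, int]:
    """Return (pages that differ, total pages compared) between two DB files."""
    size = page_size_of(path_a)
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    # Ceiling division: a trailing partial page still counts as a page.
    total = -(-max(len(a), len(b)) // size)
    diff = 0
    for i in range(total):
        # Slices past the end of the shorter file come back empty, so pages
        # present in only one file are counted as differing.
        if a[i * size:(i + 1) * size] != b[i * size:(i + 1) * size]:
            diff += 1
    return diff, total
```

Page 1 holds the database header, which includes a file change counter, so expect it to differ even between near-identical builds.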
>
> So, keeping persistent pkgKeys (1), and building the
> new metadata database by copying the old one and performing
> a set of inserts/deletes/updates (2), would help a lot.
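The copy-then-update approach might look roughly like this in Python. The packages table here is a hypothetical, heavily simplified stand-in; the real metadata has more columns plus other tables referencing pkgKey, which would need the same deletes:

```python
import shutil
import sqlite3

# Hypothetical, heavily simplified stand-in for the real "packages" table.
SCHEMA = "CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, pkgId TEXT UNIQUE, name TEXT)"


def update_by_copy(old_db, new_db, removed_pkgids, added):
    """Copy old_db to new_db, then apply deletes/inserts in one transaction.

    removed_pkgids: iterable of pkgId strings to drop.
    added: iterable of (pkgId, name) tuples for new packages.
    """
    shutil.copyfile(old_db, new_db)
    conn = sqlite3.connect(new_db)
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany("DELETE FROM packages WHERE pkgId = ?",
                             [(p,) for p in removed_pkgids])
            # pkgKey is omitted, so SQLite assigns a new rowid (normally
            # max(pkgKey) + 1; without AUTOINCREMENT a key freed at the top
            # of the range may be reused). Surviving rows keep their keys.
            conn.executemany("INSERT INTO packages (pkgId, name) VALUES (?, ?)",
                             list(added))
    finally:
        conn.close()
```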

Yes, keeping persistent pkgKeys shouldn't be hard, IMO; it would
require a little hacking on createrepo's --update option.
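A sketch of what persistent pkgKeys buy, assuming (hypothetically) that a from-scratch build numbers packages 1..N in sorted order. This is illustrative Python with made-up function names, not createrepo's actual code:

```python
def sequential_keys(pkgids):
    """Number packages 1..N in sorted order, as a from-scratch rebuild might."""
    return {pid: i for i, pid in enumerate(sorted(pkgids), start=1)}


def persistent_keys(old_keys, pkgids):
    """Keep old keys for surviving packages; give new ones keys above the max."""
    next_key = max(old_keys.values(), default=0) + 1
    keys = {}
    for pid in sorted(pkgids):
        if pid in old_keys:
            keys[pid] = old_keys[pid]
        else:
            keys[pid] = next_key
            next_key += 1
    return keys


def changed(old_keys, new_keys):
    """Count packages present in both maps whose key differs."""
    return sum(1 for pid in old_keys
               if pid in new_keys and old_keys[pid] != new_keys[pid])


old_ids = [f"pkg{i:03d}" for i in range(100)]
old = sequential_keys(old_ids)
new_ids = old_ids + ["pkg049a"]  # one package added in the middle
print(changed(old, sequential_keys(new_ids)), changed(old, persistent_keys(old, new_ids)))
```

Adding one package in the middle of 100 renumbers 50 of them under sequential numbering, matching the 50% figure above, while the persistent scheme changes none.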

>
> Then there's another issue.. compressed sqlite files
> are currently primary means of metadata distribution,
> but that's likely to change.

I conducted a few tests and found that the diff is generally much
smaller when I diff the XML files than when I diff the sqlite files.
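The post doesn't say which tool produced the diffs. For the XML files, a line-based unified diff is one plausible way to measure delta size. A minimal Python sketch with an illustrative function name; it only applies to text, so the sqlite files would need a binary differ such as bsdiff:

```python
import difflib


def unified_diff_bytes(old_text: str, new_text: str) -> int:
    """Size in bytes of a unified diff between two text documents."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="old.xml",
        tofile="new.xml",
    )
    return sum(len(line.encode("utf-8")) for line in diff)
```

Comparing the result against the size of the full new file (ideally after compression, since that is what would actually be downloaded) gives a fair sense of the savings.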

Also, what is the advantage of using sqlite instead of XML? I could
not find this anywhere in the wiki.

>
> On yum side, there are other problems:
>
> 3) non-existent rsync:// support in libcurl and urlgrabber.
>

I have used the bsdiff utility to create binary diffs, and it
produces nicely small deltas; bspatch can then apply them to rebuild
the new file. So rsync isn't really necessary.
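On the client side, the bspatch step would need verification and a fallback path. A minimal sketch, assuming bspatch is on PATH with its usual `bspatch oldfile newfile patchfile` argument order and that a trusted SHA-256 of the expected result is available (for instance from the repository metadata). The function names are hypothetical:

```python
import hashlib
import os
import shutil
import subprocess


def sha256_file(path):
    """Hex SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def bspatch_cmd(old_path, new_path, patch_path):
    # bspatch argument order: oldfile newfile patchfile
    return ["bspatch", str(old_path), str(new_path), str(patch_path)]


def apply_delta(old_path, patch_path, new_path, expected_sha256):
    """Rebuild new_path from old_path + patch, then verify its checksum.

    Raises on any failure so the caller can fall back to a full download.
    """
    if shutil.which("bspatch") is None:
        raise RuntimeError("bspatch not found on PATH")
    subprocess.run(bspatch_cmd(old_path, new_path, patch_path), check=True)
    if sha256_file(new_path) != expected_sha256:
        os.remove(new_path)  # never leave unverified metadata behind
        raise ValueError(f"checksum mismatch for {new_path}")
    return new_path
```

Any exception here is the signal to download the full file instead, so a bad or stale delta never leaves corrupt metadata in the cache.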

> Yum would probably have to exec() rsync, and that integrates
> badly (no mirror failovers, different progress meters etc).
>
> --
> Zdenek
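Zdenek's point about exec()ing rsync can be made concrete: mirror failover has to be rebuilt around the child process, because rsync knows nothing about yum's mirror list, and the user sees rsync's own output rather than yum's progress meter. A minimal sketch; the rsync:// mirror URLs and function name are hypothetical, and `run` is injectable so the logic can be tested without rsync installed:

```python
import subprocess


def fetch_with_failover(mirrors, relpath, dest, run=subprocess.run):
    """Try each mirror in turn with an exec'd rsync; return the mirror used.

    Failover lives here, around the child process: rsync itself only ever
    sees one source URL per invocation.
    """
    errors = []
    for mirror in mirrors:
        cmd = ["rsync", "--partial", f"{mirror.rstrip('/')}/{relpath}", dest]
        result = run(cmd)  # rsync's own output goes straight to the terminal
        if result.returncode == 0:
            return mirror
        errors.append((mirror, result.returncode))
    raise RuntimeError(f"all mirrors failed: {errors}")
```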


What do you think?

-- 
Regards
Tirtha Chatterjee
KDE developer
http://wyuka.co.cc/

