[Yum-devel] [PATCH] DMD: use pkgId to join filelists_db & primary_db.

James Antill james at fedoraproject.org
Thu Nov 8 16:39:24 UTC 2012


On Thu, 2012-11-08 at 12:31 +0100, Zdeněk Pavlas wrote:
> Yum relies too much on createrepo inner workings, assumes that pkgKeys
> in filelists_db and primary_db are equal.  This only holds if databases
> are always created from scratch and if <package> tags in filelists.xml
> follow primary.xml order.  This has never been guaranteed

 That's not true, AIUI. pkgKey is generated by the order things are
found for primary, that is true ... but filelists/other both look up
pkgKey based on pkgId, which is why you have to generate primary before
filelists/other in createrepo.
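The following Python sketch shows the ordering described above. It uses a drastically simplified, hypothetical schema: a `packages` table plus a one-column `filelist` table, where the real databases carry more columns. The package records and the functions `build_primary` and `build_filelists` are made up for illustration and are not createrepo's code. Because filelists takes each pkgKey from primary via pkgId, the keys agree even when filelists is processed in a different order:

```python
import sqlite3

# Hypothetical package records: (pkgId, name, files). Real schemas are richer.
PACKAGES = [
    ("aaa111", "A", ["/usr/bin/a"]),
    ("bbb222", "B", ["/usr/bin/b", "/usr/share/b"]),
]


def build_primary(packages):
    """Assign pkgKey in the order packages are found (INTEGER PRIMARY KEY)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, pkgId TEXT, name TEXT)")
    for pkg_id, name, _files in packages:
        db.execute("INSERT INTO packages (pkgId, name) VALUES (?, ?)", (pkg_id, name))
    return db


def build_filelists(primary, packages):
    """Reuse primary's pkgKey by looking it up via pkgId; primary must exist first."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, pkgId TEXT)")
    db.execute("CREATE TABLE filelist (pkgKey INTEGER, filename TEXT)")
    # Walk the packages in reverse to show that filelists order does not matter here.
    for pkg_id, _name, files in reversed(packages):
        (key,) = primary.execute(
            "SELECT pkgKey FROM packages WHERE pkgId = ?", (pkg_id,)
        ).fetchone()
        db.execute("INSERT INTO packages (pkgKey, pkgId) VALUES (?, ?)", (key, pkg_id))
        db.executemany(
            "INSERT INTO filelist (pkgKey, filename) VALUES (?, ?)",
            [(key, f) for f in files],
        )
    return db


primary = build_primary(PACKAGES)
filelists = build_filelists(primary, PACKAGES)
primary_keys = dict(primary.execute("SELECT pkgId, pkgKey FROM packages"))
filelists_keys = dict(filelists.execute("SELECT pkgId, pkgKey FROM packages"))
print(primary_keys == filelists_keys)  # True
```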

> , also with
> delta-metadata and local updates, consider the following:
> 
> 1) Yum downloads primary_db.sqlite with packages A, B.
>    pkgKeys: A=1, B=2
> 2) repository changes: pkg A is removed, pkg C is added.
> 3) Yum downloads primary_delta.xml, updates primary_db.sqlite.
>    pkgKeys: B=2, C=3 after the update
> 4) Yum needs filelists, downloads filelists_db.sqlite.
>    pkgKeys: B=1, C=2
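To make the disagreement in this scenario concrete, here is a Python sketch using in-memory SQLite databases with a simplified, hypothetical schema. The pkgId strings, file names and helper functions are made up for illustration. `ATTACH` puts both databases on one connection so that a single query can join across them. Joining on pkgKey returns the wrong package's files, while going through pkgId (the approach named in the patch subject) returns the right ones:

```python
import sqlite3

db = sqlite3.connect(":memory:")                # plays the delta-updated primary_db
db.execute("ATTACH DATABASE ':memory:' AS fl")  # plays the freshly downloaded filelists_db

db.execute("CREATE TABLE main.packages (pkgKey INTEGER PRIMARY KEY, pkgId TEXT, name TEXT)")
db.execute("CREATE TABLE fl.packages (pkgKey INTEGER PRIMARY KEY, pkgId TEXT)")
db.execute("CREATE TABLE fl.filelist (pkgKey INTEGER, filename TEXT)")

# Step 3: primary after the delta update (A removed, C added): B=2, C=3.
db.executemany("INSERT INTO main.packages VALUES (?, ?, ?)",
               [(2, "id-B", "B"), (3, "id-C", "C")])
# Step 4: filelists numbered from scratch: B=1, C=2.
db.executemany("INSERT INTO fl.packages VALUES (?, ?)", [(1, "id-B"), (2, "id-C")])
db.executemany("INSERT INTO fl.filelist VALUES (?, ?)",
               [(1, "/usr/bin/b"), (2, "/usr/bin/c")])


def files_by_pkgkey(name):
    """Assumes pkgKeys agree across the two databases (the fragile assumption)."""
    return [r[0] for r in db.execute(
        "SELECT f.filename FROM main.packages p "
        "JOIN fl.filelist f USING (pkgKey) WHERE p.name = ?", (name,))]


def files_by_pkgid(name):
    """Maps primary's pkgId to filelists' own pkgKey before reading files."""
    return [r[0] for r in db.execute(
        "SELECT f.filename FROM main.packages p "
        "JOIN fl.packages fp ON fp.pkgId = p.pkgId "
        "JOIN fl.filelist f ON f.pkgKey = fp.pkgKey "
        "WHERE p.name = ?", (name,))]


print(files_by_pkgkey("B"))  # ['/usr/bin/c'], the wrong package's files
print(files_by_pkgid("B"))   # ['/usr/bin/b']
```

For package C the pkgKey join returns nothing at all, because key 3 does not exist in the freshly numbered filelists database.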

 Any form of delta metadata that doesn't produce a byte-for-byte
compatible version of _something_ from upstream is going to require a
huge amount of verification work.

 Atm. we basically have:

repomd =>
  primary.gz =>
    primary.sqlite
  primary.sqlite
  primary.sqlite.bz2 =>
    primary.sqlite
  filelists
  [...]

...and we can thus verify (from our repomd), in a single step, that what
we downloaded is the same thing everyone else sees, and, in one or two
steps, that we are using the same thing.
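The check described above amounts to hashing what was downloaded, and what it decompresses to, and comparing both against repomd. The Python sketch below assumes sha256 checksums and reduces the repomd entry to a hypothetical dict holding its `checksum` and `open-checksum` values rather than parsing repomd.xml. The file content is made up:

```python
import bz2
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify(entry: dict, downloaded: bytes, decompress) -> bytes:
    """Check a download against a repomd-style entry and return the opened bytes.

    `entry` is a hypothetical dict holding the checksum of the file as
    downloaded and the open-checksum of its uncompressed contents.
    """
    if sha256_hex(downloaded) != entry["checksum"]:
        raise ValueError("download does not match repomd checksum")
    opened = decompress(downloaded)
    if sha256_hex(opened) != entry["open-checksum"]:
        raise ValueError("decompressed data does not match repomd open-checksum")
    return opened


# Stand-ins for what the repository would publish (made-up content).
sqlite_bytes = b"pretend this is primary.sqlite"
published = bz2.compress(sqlite_bytes)
entry = {"checksum": sha256_hex(published), "open-checksum": sha256_hex(sqlite_bytes)}

# primary.sqlite.bz2 => primary.sqlite: one check per step.
opened = verify(entry, published, bz2.decompress)
```

A file used as-is (plain primary.sqlite) needs only the first comparison. A compressed one needs both comparisons, which is the "one or two steps" above.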
 Anything where we just alter our local .sqlite file means we have:

repomd-T1 =>
  primary.sqlite-T1
repomd-T2 =>
  primary-delta-T2
repomd-T3 =>
  primary-delta-T3
repomd-T4 =>
  primary-delta-T4
[...]
repomd-Tn =>
  primary-delta-Tn

...and we'll have to keep and follow the entire chain (and we can't
actually verify any of the repomds that aren't current).
 We had the same problem when we used to download just the new
primary.xml files and update our local .sqlite files ... and we just
assumed it'd be fine (we didn't do verification) ... and it mostly
worked, except when it didn't. We eventually fixed these problems by
just not updating.

 It also makes deltas _much_ safer if we can just test "did the
repodata that came out match what we would have downloaded".
 I think that we order all the data in the xml now, so if we test the
open-checksum there vs. a newly generated .xml in the correct order then
those checksums should always match. It'll require more work if we can't
make the .gz version match, but it should still be doable.
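If the XML really is emitted in a fixed order, the check suggested above could look roughly like the following Python sketch. It is a toy. `render_primary` produces a made-up, minimal XML layout rather than createrepo's primary.xml. The delta is modelled as a hypothetical set of removed pkgIds plus a dict of added packages. The point is only that a deterministic serialisation of the locally updated data can be hashed and compared with the open-checksum that repomd publishes for the uncompressed file, sidestepping whether the .gz bytes match:

```python
import hashlib
import xml.etree.ElementTree as ET


def render_primary(packages: dict) -> bytes:
    """Serialise {pkgId: name} as toy XML, always sorted by pkgId (hypothetical format)."""
    root = ET.Element("metadata", packages=str(len(packages)))
    for pkg_id, name in sorted(packages.items()):
        pkg = ET.SubElement(root, "package")
        ET.SubElement(pkg, "name").text = name
        ET.SubElement(pkg, "pkgid").text = pkg_id
    return ET.tostring(root, encoding="utf-8")


def delta_matches_upstream(local, removed, added, upstream_open_checksum):
    """Apply a toy delta to the local data, regenerate the XML, compare checksums."""
    updated = {k: v for k, v in local.items() if k not in removed}
    updated.update(added)
    regenerated = render_primary(updated)
    return hashlib.sha256(regenerated).hexdigest() == upstream_open_checksum


# What the repository now publishes; its hash stands in for repomd's open-checksum.
upstream = {"id-B": "B", "id-C": "C"}
upstream_open = hashlib.sha256(render_primary(upstream)).hexdigest()

local = {"id-A": "A", "id-B": "B"}  # what we had before the delta
print(delta_matches_upstream(local, {"id-A"}, {"id-C": "C"}, upstream_open))  # True
print(delta_matches_upstream(local, set(), {"id-C": "C"}, upstream_open))     # False
```

The second call fails because package A was never removed. That is the kind of silent divergence a chain of unverified local updates can accumulate, and here it is caught at the step where it happens.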

> Performance:
> 
> There's an extra JOIN needed to get pkgId, should be fairly
> cheap.  Translating pkgId to pkgKey runs for free.

 Putting some numbers there:

time sqlite3 \
26eac5d1b62aac96855bfe6953f3f244d4f7de12ccc7afe4b30af2a5003973fd-filelists.sqlite \
'SELECT * FROM filelist WHERE pkgKey=19133'
0.003

time sqlite3 \
26eac5d1b62aac96855bfe6953f3f244d4f7de12ccc7afe4b30af2a5003973fd-filelists.sqlite \
"SELECT * FROM filelist JOIN packages USING(pkgKey) WHERE pkgId='af3720e24e9a509ee263916b7061387c8bb16b8679bd848ddcd2199fd2a4d030'"
0.004

...but that is per row requested. A quick test program that requests
every row gives:

pkgKey ~= 1.45
pkgId  ~= 2.06
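The test program itself is not included in the post. The following Python sketch is one guess at what such a benchmark might look like. It builds a synthetic filelists-style database in memory instead of reading the real filelists database. The simplified schema, package count, file names and index names are all arbitrary assumptions, so absolute timings will not match the figures above. Only the relative cost of the two query shapes is of interest:

```python
import hashlib
import sqlite3
import time


def make_db(n_packages=20000, files_per_pkg=5):
    """Build a synthetic filelists-like database (simplified, hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, pkgId TEXT)")
    db.execute("CREATE TABLE filelist (pkgKey INTEGER, filename TEXT)")
    # Index both lookup columns; without them each lookup would scan a whole table.
    db.execute("CREATE INDEX idx_filelist_pkgkey ON filelist (pkgKey)")
    db.execute("CREATE INDEX idx_packages_pkgid ON packages (pkgId)")
    for key in range(1, n_packages + 1):
        pkg_id = hashlib.sha256(str(key).encode()).hexdigest()  # fake 64-hex pkgId
        db.execute("INSERT INTO packages VALUES (?, ?)", (key, pkg_id))
        db.executemany(
            "INSERT INTO filelist VALUES (?, ?)",
            [(key, f"/usr/share/pkg{key}/file{i}") for i in range(files_per_pkg)],
        )
    return db


def time_all(db, query, params):
    """Run `query` once per parameter, fetching every row; return elapsed seconds."""
    start = time.perf_counter()
    for p in params:
        db.execute(query, (p,)).fetchall()
    return time.perf_counter() - start


db = make_db()
# Fetch keys and ids together so keys[i] and ids[i] describe the same package.
pairs = db.execute("SELECT pkgKey, pkgId FROM packages").fetchall()
keys = [k for k, _ in pairs]
ids = [i for _, i in pairs]

by_key = time_all(db, "SELECT * FROM filelist WHERE pkgKey = ?", keys)
by_id = time_all(
    db, "SELECT * FROM filelist JOIN packages USING (pkgKey) WHERE pkgId = ?", ids
)
print(f"pkgKey ~= {by_key:.2f}")
print(f"pkgId  ~= {by_id:.2f}")
```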

...now I'm not sure how many more requests we'll generate for different
usages, so it's not completely obvious where between 0.001s (one extra
lookup) and 0.6s (requesting every row) the performance hit will land.
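Assuming all of the figures above are seconds of wall-clock time, the two bounds are the differences between the paired measurements. The first is the extra cost of a single lookup, the second the extra cost of requesting every row:

```latex
\begin{aligned}
\Delta_{\text{one row}}   &= 0.004 - 0.003 = 0.001\ \text{s} \\
\Delta_{\text{every row}} &= 2.06 - 1.45 = 0.61 \approx 0.6\ \text{s},
\qquad \frac{2.06}{1.45} \approx 1.42
\end{aligned}
```

In the worst case measured, pkgId lookups are therefore roughly 40% slower than pkgKey lookups.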


