[Yum-devel] YumRepository._loadRepoXML proposed changes

James Antill james.antill at redhat.com
Thu Dec 20 04:09:18 UTC 2007


 Ok, so: Xmas present for everyone :). This is a biggish change I've
been working on for a bit. It basically changes _loadRepoXML() to
download the repomd.xml and all of the MD files as an atomic group, and
if anything goes "wrong" we revert to the old set of MD.
 I looked at another approach, of only trying to revert to the old data
if any of getPrimaryXML() etc. had a problem ... but that was much more
complicated, required that we store at least one old copy of everything,
and doesn't solve some of the network issues that this approach does.
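The "download as a group, revert on failure" flow can be sketched roughly like this. This is a minimal illustration only, not yum's actual code: group_load_md, the download_all callback, and the staging-directory approach are my own names and assumptions.

```python
import os
import shutil
import tempfile

def group_load_md(repo_dir, download_all):
    """Fetch repomd.xml plus every MD file as one atomic group.

    download_all(staging_dir) must download the complete set into
    staging_dir, raising on any failure.  Only on full success is the
    new set swapped in; otherwise the old metadata is left untouched.
    """
    staging = tempfile.mkdtemp(prefix="repomd-staging-")
    try:
        download_all(staging)          # any exception means "revert"
    except Exception:
        shutil.rmtree(staging)         # old MD set stays in place
        return False
    for name in os.listdir(staging):   # commit: swap the new set in
        shutil.move(os.path.join(staging, name),
                    os.path.join(repo_dir, name))
    shutil.rmtree(staging)
    return True
```

The point is that a half-finished download never touches the live repo directory, so a failed mirror leaves us exactly where we started.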
 I've been running it for a bit, and I've tried to test some of the
failure cases to make sure it actually does what it says on the box.
Anyway, please take a look and let me know what you think. You have a
bit, as I'm not planning on checking it in this year (but feel free to
test it too :).

 The change is big, so you can view it here:

http://people.redhat.com/jantill/yum/_groupLoadRepoXML.patch
http://people.redhat.com/jantill/yum/yumRepo.py

 Pros.
 -----

 Should stop these metadata update problems:

  1. We get corrupted comps/etc. files on the master and everyone has
problems.
  2. We hit mirror(s) that have an updated repomd.xml but nothing else.
  3. We don't have a network, but the cache times out and urlgrabber
removes repomd.xml, and then we can't get a new one (which makes yum
stop working).

 Should stop "back in time updates". Ie. we hit an old mirror, and we
basically go back in time for updates.

 Should stop yum command-line usage hitting the network. Basically
yum-updatesd will now download all of filelists/updateinfo/etc., so we
don't have the problem where the user does "yum blah", which happens to
need a file we don't have, and so we hit the network. That currently has
follow-on problems where the file isn't the same anymore but we haven't
updated repomd.xml yet (or the network is down; think "yum
deplist /usr/bin/foo").

 Cons.
 -----

 Downloads more stuff at once. The current yum model only ever
downloads what we need, when we need it; now we'll download all the MD
files whenever repomd.xml gets updated. However, I've left in the
functions for the old behaviour, so we could add a configuration option,
or let only yum-updatesd use the new behaviour, or something like
that ... if people think this is a big concern.

 Currently I don't do anything special on C-c or other weird
exceptions ... so it's possible that a C-c at the wrong time will leave
the repo in a partially updated state. But that's what happens all the
time now, so I don't think this is a big issue.

 _If_ we don't currently have a full set of MD, and we fail to get a
new batch of data, then we'll revert to the non-full set. It's then
possible that we'll need one of the files in the set we don't have,
which would have been available had we used the traditional code path
(i.e. the error was in one of the files we don't currently need).
 But after this code has run once successfully we'll always have a full
set of MD, so the only thing that _might_ make this worth considering is
if we allow "traditional" behaviour as an option. I put this here mainly
for full disclosure :).

 Minor CPU speed hit, due to parsing two sets of XML.




 Things to look at / think about
 -------------------------------

    Do you hate the idea/design in some way, is there an alternate
approach you think would be better?

    Does the code in _groupLoadDataMD() look correct?
    I've tested it ... but I'd still like a second opinion.

    Atm. we check for "newness" in _groupCheckDataMDNewer() by looking
at the timestamp information for all the MD files, and if the "new"
repomd.xml has timestamps which are older, we dump it. Can anyone think
of any problems with doing this, and/or should we just put something in
repomd.xml itself?
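The comparison I have in mind could be sketched as below. _groupCheckDataMDNewer is the real function name from the patch, but the per-mdtype timestamp dicts and the function body here are my own simplification, not the actual implementation.

```python
def group_check_md_newer(old_ts, new_ts):
    """Return True if the "new" repomd.xml is acceptable.

    old_ts / new_ts map mdtype name (e.g. "primary", "filelists") to the
    timestamp listed for that file in the old/new repomd.xml.  If any MD
    file the new repomd lists is *older* than what we already have, the
    mirror has gone back in time and we should dump the new data.
    """
    for mdtype, old_time in old_ts.items():
        new_time = new_ts.get(mdtype)
        if new_time is not None and new_time < old_time:
            return False        # mirror is behind: keep the old set
    return True
```

An mdtype that only appears in one of the two sets is deliberately ignored here; whether that is the right call for added/dropped MD types is part of the question above.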

    These funcs got args to specify "don't throw":
        YumRepository._checksum
        YumRepository._checkMD
        YumRepository._retrieveMD

...the latter two are the public API functions, and I didn't want to
add the new arguments as part of the public API, so the public functions
call the internal ones and the internal ones take the new args.
 My big worry here is that I haven't seen any other yum functions that
do this, but the other option is to put try/except blocks in a bunch of
places, which seems like more code for no gain and makes it less obvious
what is happening.
 In fact this change even removed a few lines of code from other
functions in that file that were calling the above inside try blocks for
the same reason.

    Do we want to add some kind of configuration option or something for
the old behaviour? My opinion is probably not, but then I'd probably
have done it a bit like this to start with ... so what was the
rationale?

    new helper functions:
        YumRepository._cachingRepoXML
        YumRepository._getFileRepoXML
        YumRepository._parseRepoXML
        YumRepository._saveOldRepoXML
        YumRepository._revertOldRepoXML
        YumRepository._doneOldRepoXML
        YumRepository._get_mdtype_data
        YumRepository._get_mdtype_fname
        YumRepository._groupCheckDataMDNewer
        YumRepository._groupLoadDataMD

...I assume most of these are fine, assuming the above is good. I've
also added a YumRepository._oldRepoMDFile attribute ... does anyone
care?
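The save/revert/done trio implies a small state-machine lifecycle, which I'd describe roughly like this. The helper names above are real; this class, its attribute names, and its bodies are only my illustrative reading of the pattern.

```python
class RepoXMLState:
    """Sketch of the _saveOldRepoXML/_revertOldRepoXML/_doneOldRepoXML
    lifecycle: stash the current repomd before trying a new one, then
    either revert to the stash on failure or drop it on success."""

    def __init__(self):
        self.repomd = None          # currently active repomd data
        self._old_repomd = None     # stashed copy (cf. _oldRepoMDFile)

    def save_old(self):
        """Called before attempting to load a new repomd.xml."""
        self._old_repomd = self.repomd

    def revert_old(self):
        """Something went wrong: restore the stashed repomd."""
        self.repomd = self._old_repomd
        self._old_repomd = None

    def done_old(self):
        """Full MD set loaded OK: commit by dropping the stash."""
        self._old_repomd = None
```

Exactly one of revert_old()/done_old() runs per attempt, which is what makes the whole update atomic from the caller's point of view.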

    I've also moved:

        YumPackageSack._check_db_version

...into YumRepository, I assume that's not controversial? ... but atm.
I've kept the call in YumPackageSack as a wrapper, just in case.
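The move-plus-wrapper arrangement looks roughly like this. YumRepository, YumPackageSack, and _check_db_version are real names; the method bodies, the DBVERSION value, and the repoXML dict shape here are illustrative assumptions only.

```python
class YumRepository:
    DBVERSION = "10"                   # sqlite MD version this sketch supports

    def __init__(self, repoXML=None):
        # Parsed repomd.xml modeled as a {<data type>: attributes} map.
        self.repoXML = repoXML or {}

    def _check_db_version(self, mdtype, repoXML):
        """New home of the check: does the repo offer an sqlite (_db)
        variant of this MD type matching our supported DB version?"""
        data = repoXML.get(mdtype + "_db")
        return data is not None and data.get("database_version") == self.DBVERSION

class YumPackageSack:
    """Old home of the check; kept as a thin compatibility wrapper."""

    def _check_db_version(self, repo, mdtype):
        # Delegate to the new location, just in case old callers remain.
        return repo._check_db_version(mdtype, repo.repoXML)
```

Keeping the wrapper means any external code still calling through YumPackageSack keeps working while the logic lives in one place.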

-- 
James Antill <james.antill at redhat.com>
Red Hat