[Yum-devel] Speed hit of having 10 versions of all 10,000 packages

James Antill james.antill at redhat.com
Tue May 20 19:34:13 UTC 2008


 For some obvious reasons, I decided to look at what kind of speed hit
yum takes (and where) when we start dealing with a lot of repos. which
have different versions of the same pkg.

 So I took the simple expedient of creating a set of 9 extra "Fedora"
repos. with different names. This will not be accurate if we have lots
of actually different packages, but it's mostly fair for the "upgrade"
case where you enable a bunch of different "newer package" repos.
(although we'd need to tweak the upgrade code not to require all the
older versions of pkgs, see below).

 Eg. "HEAD repolist" gives:

and.org-OS-updates   And.org OS updates                       enabled :       0
and.org-james        And.org                                  enabled :      38
and.org-misc         And.org misc                             enabled :       4
brew                 Brew Buildsystem for Fedora Core 8 - x86 enabled :       6
fedora               Fedora 8 - x86_64                        enabled :  10,657
fedora-debuginfo     Fedora 8 - x86_64 - Debug                enabled :   3,206
fedora-source        Fedora 8 - Source                        enabled :   4,834
rhts                 Red Hat Test Suite - x86_64 - Base       enabled :      93
tst-F1               Fedora [TST  1] 8 - x86_64               enabled :  10,657
tst-F1-debuginfo     Fedora [TST  1] 8 - x86_64 - Debug       enabled :   3,206
tst-F1-source        Fedora [TST  1] 8 - Source               enabled :   4,834
tst-F2               Fedora [TST  2] 8 - x86_64               enabled :  10,657
tst-F2-debuginfo     Fedora [TST  2] 8 - x86_64 - Debug       enabled :   3,206
tst-F2-source        Fedora [TST  2] 8 - Source               enabled :   4,834
tst-F3               Fedora [TST  3] 8 - x86_64               enabled :  10,657
tst-F3-debuginfo     Fedora [TST  3] 8 - x86_64 - Debug       enabled :   3,206
tst-F3-source        Fedora [TST  3] 8 - Source               enabled :   4,834
tst-F4               Fedora [TST  4] 8 - x86_64               enabled :  10,657
tst-F4-debuginfo     Fedora [TST  4] 8 - x86_64 - Debug       enabled :   3,206
tst-F4-source        Fedora [TST  4] 8 - Source               enabled :   4,834
tst-F5               Fedora [TST  5] 8 - x86_64               enabled :  10,657
tst-F5-debuginfo     Fedora [TST  5] 8 - x86_64 - Debug       enabled :   3,206
tst-F5-source        Fedora [TST  5] 8 - Source               enabled :   4,834
tst-F6               Fedora [TST  6] 8 - x86_64               enabled :  10,657
tst-F6-debuginfo     Fedora [TST  6] 8 - x86_64 - Debug       enabled :   3,206
tst-F6-source        Fedora [TST  6] 8 - Source               enabled :   4,834
tst-F7               Fedora [TST  7] 8 - x86_64               enabled :  10,657
tst-F7-debuginfo     Fedora [TST  7] 8 - x86_64 - Debug       enabled :   3,206
tst-F7-source        Fedora [TST  7] 8 - Source               enabled :   4,834
tst-F8               Fedora [TST  8] 8 - x86_64               enabled :  10,657
tst-F8-debuginfo     Fedora [TST  8] 8 - x86_64 - Debug       enabled :   3,206
tst-F8-source        Fedora [TST  8] 8 - Source               enabled :   4,834
tst-F9               Fedora [TST  9] 8 - x86_64               enabled :  10,657
tst-F9-debuginfo     Fedora [TST  9] 8 - x86_64 - Debug       enabled :   3,206
tst-F9-source        Fedora [TST  9] 8 - Source               enabled :   4,834
updates              Fedora 8 - x86_64 - Updates              enabled :   5,286
updates-source       Fedora 8 - Updates Source                enabled :   1,974
updates-testing      Fedora 8 - x86_64 - Test Updates         enabled :     771
repolist: 195,142

...this gives roughly the following for "echo n | upgrade" (making sure
the network wasn't involved):

name                                      beg      end     ~diff
------------------------------------------------------------------
old                                       0       36.086   36
    RepoStorage.populateSack              0.5855   2.5386   2
    YumBaseCli.updatePkgs                 4.5586  31.4100  27
        MetaSack.simplePkgList            5.1710  18.1303  13
        Updates.__init__                 18.1303  20.5608   2.4
        MetaSack.returnObsoletes         20.5612  29.3818   8.8
            MetaSack.returnNewestByName  21.5447  29.2674   7.7
    depsolver-ish[1]                     31.4123  35.1694   3.7

...as you can see we deal with roughly 200,000 pkgs in about 36 seconds,
which isn't that bad, but we are losing a lot of the time above creating
packages that "we know" we'll never need to look at.
 So I made the "obvious" change of pushing the knowledge of which pkgs
we've already loaded down into the sacks/repos., so that they can take
advantage of it. This is roughly similar to other tools that build a
single giant "master repo." from all the component repos. ... except
it's dynamic.
Anyway that change gives (re-pasting old):

name                                      beg      end     ~diff
------------------------------------------------------------------
old                                       0       36.086   36
    RepoStorage.populateSack              0.5855   2.5386   2
    YumBaseCli.updatePkgs                 4.5586  31.4100  27
        MetaSack.simplePkgList            5.1710  18.1303  13
        Updates.__init__                 18.1303  20.5608   2.4
        MetaSack.returnObsoletes         20.5612  29.3818   8.8
            MetaSack.returnNewestByName  21.5447  29.2674   7.7
    depsolver-ish[1]                     31.4123  35.1694   3.7

new                                       0       18.543   18.5
    RepoStorage.populateSack              0.5830   2.5420   2
    YumBaseCli.updatePkgs                 4.5561  14.4253  10
        MetaSack.simplePkgList            5.1786   8.4666   3.3
        Updates.__init__                  8.4667   8.6652   0.2
        MetaSack.returnObsoletes          8.9373  13.5868   4.5
            MetaSack.returnNewestByName  10.0353  13.5678   3.5
    depsolver-ish[1]                     14.4282  17.8390   3.4
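
As a quick check on the two tables, the ~diff column is just end minus
beg, and dividing old by new gives the per-phase speedup. The sketch
below only re-uses the timestamps printed above (a subset of the rows);
the helper names are made up for illustration and are not part of yum.

```python
# (beg, end) timestamps in seconds, copied from the two tables above.
old = {
    "total": (0.0, 36.086),
    "RepoStorage.populateSack": (0.5855, 2.5386),
    "YumBaseCli.updatePkgs": (4.5586, 31.4100),
    "MetaSack.simplePkgList": (5.1710, 18.1303),
    "depsolver-ish": (31.4123, 35.1694),
}
new = {
    "total": (0.0, 18.543),
    "RepoStorage.populateSack": (0.5830, 2.5420),
    "YumBaseCli.updatePkgs": (4.5561, 14.4253),
    "MetaSack.simplePkgList": (5.1786, 8.4666),
    "depsolver-ish": (14.4282, 17.8390),
}


def elapsed(span):
    """Seconds spent in a phase, given its (beg, end) pair."""
    beg, end = span
    return end - beg


def speedups(before, after):
    """Map each phase name to old_time / new_time."""
    return dict((name, elapsed(before[name]) / elapsed(after[name]))
                for name in before)


if __name__ == "__main__":
    ratios = speedups(old, new)
    for name in old:
        print("%-26s %6.2fs -> %6.2fs  (%.1fx)"
              % (name, elapsed(old[name]), elapsed(new[name]), ratios[name]))
```

On these numbers the whole run is a bit under 2x faster, and almost all
of that comes from updatePkgs (simplePkgList alone is close to 4x),
while populateSack and the depsolver-ish part barely move.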

...now as I said above, MetaSack.simplePkgList will get bigger again if
most of the packages have different versions (probable?) but I'm not
sure we actually need simplePkgList() in updatePkgs() and not just
returnNewestByNameArch() turned into tuples (which would give us roughly
the same numbers as above, I think).
 Also we only gain if we load the repos. with the newest pkgs before
those with older versions ... so we might need to "optimise" that.
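
To illustrate why the load order matters, here is a toy model of a
"newest by name.arch" load where a row is skipped before any object is
built if something at least as new is already loaded. Everything in it
is an assumption for illustration: Pkg and load_newest are made-up
names, and versions are plain tuples compared with >=, whereas real
yum compares epoch/version/release with rpm's rules.

```python
class Pkg(object):
    """Stand-in for a package object; counts how many get built."""
    built = 0

    def __init__(self, name, arch, version):
        Pkg.built += 1
        self.name, self.arch, self.version = name, arch, version


def load_newest(repos):
    """Load repos in order, keeping only the newest pkg per (name, arch).

    A row is filtered out, without building a Pkg, when a package that
    is at least as new is already loaded.
    Returns (newest_dict, number_of_objects_built).
    """
    Pkg.built = 0
    newest = {}
    for rows in repos:
        for name, arch, version in rows:
            seen = newest.get((name, arch))
            if seen is not None and seen.version >= version:
                continue  # cheap skip: no object created
            newest[(name, arch)] = Pkg(name, arch, version)
    return newest, Pkg.built


# Three hypothetical repos carrying successive versions of two pkgs.
old_repo = [("foo", "x86_64", (1, 0)), ("bar", "noarch", (2, 1))]
mid_repo = [("foo", "x86_64", (1, 1)), ("bar", "noarch", (2, 2))]
new_repo = [("foo", "x86_64", (1, 2)), ("bar", "noarch", (2, 3))]

newest_a, built_newest_first = load_newest([new_repo, mid_repo, old_repo])
newest_b, built_oldest_first = load_newest([old_repo, mid_repo, new_repo])
```

Both orders end up with the same newest packages, but in this toy case
newest-first builds 2 objects and oldest-first builds all 6, which is
the effect described above.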

 Now putting this into HEAD after everyone has played with it is
probably a no-brainer (please shout if you have some objection ... see
the code below).

 However what about the 3.2.x branch?
 The only real argument against doing so is that it might break
something, as it's completely API compatible.

 Anyway the "obvious" change was roughly (and ditto for a loadNewestByName):

 packageSack.py:MetaSack:

+    def loadPackages(self, pkgs, patterns=None):
+        """load packages from every sack into the pkgs dict, so that
+           each sack can skip any package which is already in there"""
+
+        for sack in self.sacks.values():
+            try:
+                sack.loadPackages(pkgs, patterns)
+            except PackageSackError:
+                continue

     def simplePkgList(self, patterns=None):
         """returns a list of pkg tuples (n, a, e, v, r)"""
+
+        if True:
+            pkgs = {}
+            self.loadPackages(pkgs, patterns)
+            return [pkg.pkgtup for pkg in pkgs.values()]
+
         return self._computeAggregateListResult("simplePkgList", patterns)

 sqlitesack.py:

+    @catchSqliteException
+    def _loadPkgsHlpr(self, pkgs, pkg_filter, patterns=None):
+        # Skip unused repos completely, Eg. *-source
+        skip_all = True
+        for repo in self.added:
+            if repo not in self._all_excludes:
+                skip_all = False
+
+        if skip_all:
+            return []
+
+        if hasattr(self, 'pkgobjlist'):
+            pkgobjlist = self.pkgobjlist
+        else:
+            pkgobjlist = self._buildPkgObjList(None, patterns, pkg_filter)
+
+        return pkgobjlist
+
+    def loadPackages(self, pkgs, patterns=None):
+        """ Like returnPackages(), except we load into the pkgs param. """
+
+        def _filt_pkg(x):
+            """ Returns True if we don't need to bother loading this pkg. """
+            nevra = (x['name'], x['epoch'],x['version'],x['release'], x['arch'])
+            if nevra not in pkgs:
+                return False
+            return True
+        
+        for po in self._loadPkgsHlpr(pkgs, _filt_pkg, patterns):
+            nevra = (po.name, po.epoch, po.version, po.release, po.arch)
+            if nevra in pkgs or self._pkgExcluded(po):
+                continue
+            pkgs[nevra] = po
+    

[...]

@@ -941,10 +1015,10 @@ class YumSqlitePackageSack(yumRepo.YumPackageSack):
         return exactmatch, matched, unmatched
 
     @catchSqliteException
-    def _buildPkgObjList(self, repoid=None, patterns=None):
+    def _buildPkgObjList(self, repoid=None, patterns=None, pkg_filter=None):
@@ -968,16 +1042,22 @@ class YumSqlitePackageSack(yumRepo.YumPackageSack):
                     qsql = _FULL_PARSE_QUERY_BEG + " OR ".join(pat_sqls)
                 executeSQL(cur, qsql, pat_data)
                 for x in cur:
+                    if pkg_filter is not None and pkg_filter(x):
+                        continue
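
For anyone who wants to play with the idea outside of yum, the shape of
the patch can be sketched standalone: every sack loads into one shared
dict, and a filter runs on the raw row before a package object is
built. This is only a sketch under assumptions: FakePkg and FakeSack
are hypothetical stand-ins, and plain dicts play the part of the sqlite
rows.

```python
def _key(row):
    """Same key layout as the nevra tuples in the patch."""
    return (row["name"], row["epoch"], row["version"],
            row["release"], row["arch"])


class FakePkg(object):
    """Stand-in for a yum package object (illustration only)."""

    def __init__(self, row, repoid):
        self.key = _key(row)
        self.repoid = repoid


class FakeSack(object):
    """Stand-in for one repo's sack."""

    def __init__(self, repoid, rows):
        self.repoid = repoid
        self.rows = rows
        self.built = 0  # how many package objects this sack created

    def _build(self, pkg_filter=None):
        for row in self.rows:
            # Filter on the raw row, before paying for the object.
            if pkg_filter is not None and pkg_filter(row):
                continue
            self.built += 1
            yield FakePkg(row, self.repoid)

    def loadPackages(self, pkgs):
        """Load into the shared pkgs dict, skipping what is already there."""
        def _filt_pkg(row):
            return _key(row) in pkgs

        for po in self._build(_filt_pkg):
            pkgs.setdefault(po.key, po)


rows = [
    {"name": "foo", "epoch": "0", "version": "1.0", "release": "1",
     "arch": "x86_64"},
    {"name": "bar", "epoch": "0", "version": "2.1", "release": "3",
     "arch": "noarch"},
]
fedora = FakeSack("fedora", rows)
tst_f1 = FakeSack("tst-F1", [dict(r) for r in rows])  # same pkgs, new name

pkgs = {}
for sack in (fedora, tst_f1):
    sack.loadPackages(pkgs)
```

Here the second sack creates no objects at all, since every row it has
is already in pkgs by the time it is asked to load.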

-- 
James Antill <james.antill at redhat.com>
Red Hat