[Yum-devel] Speed hit of having 10 versions of all 10,000 packages
James Antill
james.antill at redhat.com
Tue May 20 19:34:13 UTC 2008
For some obvious reasons, I decided to look at what kind of speed hit
yum takes (and where) when we start dealing with a lot of repos. that
have different versions of the same pkg.
So I took the simple expedient of creating a set of 9 extra "Fedora"
repos. with different names. This will not be accurate if we have lots
of actually different packages, but it's mostly fair for the "upgrade"
case where you enable a bunch of different "newer package" repos.
(although we'd need to tweak the upgrade code not to require all the
older versions of pkgs, see below).
Eg. "HEAD repolist" gives:
and.org-OS-updates  And.org OS updates                        enabled :       0
and.org-james       And.org                                   enabled :      38
and.org-misc        And.org misc                              enabled :       4
brew                Brew Buildsystem for Fedora Core 8 - x86  enabled :       6
fedora              Fedora 8 - x86_64                         enabled :  10,657
fedora-debuginfo    Fedora 8 - x86_64 - Debug                 enabled :   3,206
fedora-source       Fedora 8 - Source                         enabled :   4,834
rhts                Red Hat Test Suite - x86_64 - Base        enabled :      93
tst-F1              Fedora [TST 1] 8 - x86_64                 enabled :  10,657
tst-F1-debuginfo    Fedora [TST 1] 8 - x86_64 - Debug         enabled :   3,206
tst-F1-source       Fedora [TST 1] 8 - Source                 enabled :   4,834
tst-F2              Fedora [TST 2] 8 - x86_64                 enabled :  10,657
tst-F2-debuginfo    Fedora [TST 2] 8 - x86_64 - Debug         enabled :   3,206
tst-F2-source       Fedora [TST 2] 8 - Source                 enabled :   4,834
tst-F3              Fedora [TST 3] 8 - x86_64                 enabled :  10,657
tst-F3-debuginfo    Fedora [TST 3] 8 - x86_64 - Debug         enabled :   3,206
tst-F3-source       Fedora [TST 3] 8 - Source                 enabled :   4,834
tst-F4              Fedora [TST 4] 8 - x86_64                 enabled :  10,657
tst-F4-debuginfo    Fedora [TST 4] 8 - x86_64 - Debug         enabled :   3,206
tst-F4-source       Fedora [TST 4] 8 - Source                 enabled :   4,834
tst-F5              Fedora [TST 5] 8 - x86_64                 enabled :  10,657
tst-F5-debuginfo    Fedora [TST 5] 8 - x86_64 - Debug         enabled :   3,206
tst-F5-source       Fedora [TST 5] 8 - Source                 enabled :   4,834
tst-F6              Fedora [TST 6] 8 - x86_64                 enabled :  10,657
tst-F6-debuginfo    Fedora [TST 6] 8 - x86_64 - Debug         enabled :   3,206
tst-F6-source       Fedora [TST 6] 8 - Source                 enabled :   4,834
tst-F7              Fedora [TST 7] 8 - x86_64                 enabled :  10,657
tst-F7-debuginfo    Fedora [TST 7] 8 - x86_64 - Debug         enabled :   3,206
tst-F7-source       Fedora [TST 7] 8 - Source                 enabled :   4,834
tst-F8              Fedora [TST 8] 8 - x86_64                 enabled :  10,657
tst-F8-debuginfo    Fedora [TST 8] 8 - x86_64 - Debug         enabled :   3,206
tst-F8-source       Fedora [TST 8] 8 - Source                 enabled :   4,834
tst-F9              Fedora [TST 9] 8 - x86_64                 enabled :  10,657
tst-F9-debuginfo    Fedora [TST 9] 8 - x86_64 - Debug         enabled :   3,206
tst-F9-source       Fedora [TST 9] 8 - Source                 enabled :   4,834
updates             Fedora 8 - x86_64 - Updates               enabled :   5,286
updates-source      Fedora 8 - Updates Source                 enabled :   1,974
updates-testing     Fedora 8 - x86_64 - Test Updates          enabled :     771
repolist: 195,142
...this gives roughly the following for "echo n | upgrade" (making sure
the network wasn't involved):
name                                   beg       end   ~diff
------------------------------------------------------------------
old                                      0    36.086      36
  RepoStorage.populateSack          0.5855    2.5386       2
  YumBaseCli.updatePkgs             4.5586   31.4100      27
    MetaSack.simplePkgList          5.1710   18.1303      13
    Updates.__init__               18.1303   20.5608     2.4
    MetaSack.returnObsoletes       20.5612   29.3818     8.8
      MetaSack.returnNewestByName  21.5447   29.2674     7.7
  depsolver-ish[1]                 31.4123   35.1694     3.7
...as you can see we deal with roughly 200,000 pkgs in about 36 seconds
which isn't that bad, but we are losing a lot of the time above creating
packages that "we know" we'll never need to look at.
So I made the "obvious" change of pushing the knowledge of which pkgs
we've already loaded down into the sacks/repos. so that they can take
advantage of it. This is roughly similar to other tools that build a
single giant "master repo." from all the component repos. ... except
it's dynamic.
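To make the idea concrete, here is a minimal toy sketch (not yum code:
CountingSack and load_all are made-up names, and a plain dict stands in
for a real package object). Each sack is handed the shared dict and skips
any NEVRA that an earlier sack has already loaded, so ten identical repos
only build one set of objects:

```python
# Hypothetical stand-ins for yum's sacks; none of these names are yum API.
class CountingSack:
    """Toy repo: a list of rows plus a counter of package objects built."""

    def __init__(self, rows):
        # Each row is (name, epoch, version, release, arch).
        self.rows = rows
        self.built = 0  # how many "expensive" objects this sack created

    def load_packages(self, pkgs):
        """Add packages to the shared dict, skipping NEVRAs already present."""
        for nevra in self.rows:
            if nevra in pkgs:
                # Cheap membership test before any object is created.
                continue
            self.built += 1
            pkgs[nevra] = {"nevra": nevra}  # stand-in for a package object


def load_all(sacks):
    """Load every sack into one shared dict, in the order given."""
    pkgs = {}
    for sack in sacks:
        sack.load_packages(pkgs)
    return pkgs


rows = [("foo", "0", "1.0", "1", "x86_64"), ("bar", "0", "2.1", "3", "noarch")]
sacks = [CountingSack(list(rows)) for _ in range(10)]
merged = load_all(sacks)
built_total = sum(s.built for s in sacks)
```

Only the first sack does any real work here; the other nine just do dict
lookups.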
Anyway that change gives (re-pasting old):
name                                   beg       end   ~diff
------------------------------------------------------------------
old                                      0    36.086      36
  RepoStorage.populateSack          0.5855    2.5386       2
  YumBaseCli.updatePkgs             4.5586   31.4100      27
    MetaSack.simplePkgList          5.1710   18.1303      13
    Updates.__init__               18.1303   20.5608     2.4
    MetaSack.returnObsoletes       20.5612   29.3818     8.8
      MetaSack.returnNewestByName  21.5447   29.2674     7.7
  depsolver-ish[1]                 31.4123   35.1694     3.7
new                                      0    18.543    18.5
  RepoStorage.populateSack          0.5830    2.5420       2
  YumBaseCli.updatePkgs             4.5561   14.4253      10
    MetaSack.simplePkgList          5.1786    8.4666     3.3
    Updates.__init__                8.4667    8.6652     0.2
    MetaSack.returnObsoletes        8.9373   13.5868     4.5
      MetaSack.returnNewestByName  10.0353   13.5678     3.5
  depsolver-ish[1]                 14.4282   17.8390     3.4
...now as I said above, MetaSack.simplePkgList will get bigger again if
most of the packages have different versions (probable?) but I'm not
sure we actually need simplePkgList() in updatePkgs() and not just
returnNewestByNameArch() turned into tuples (which would give us roughly
the same numbers as above, I think).
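As a rough illustration of "newest by name.arch, turned into tuples",
here is a standalone sketch. It is not the yum implementation: the
function names are invented, and simple_evr_key is a deliberately
simplified ordering that only handles dotted integers (real code would
use rpm's comparison, e.g. rpm.labelCompare):

```python
def simple_evr_key(epoch, version, release):
    """Simplified ordering key; NOT rpm's real version comparison."""
    def parts(text):
        return tuple(int(p) for p in text.split("."))
    return (int(epoch), parts(version), parts(release))


def newest_by_name_arch(rows):
    """Keep only the highest (e, v, r) for each (name, arch) pair.

    rows are (n, a, e, v, r) tuples, the same layout simplePkgList()
    is documented to return.
    """
    newest = {}
    for name, arch, epoch, version, release in rows:
        key = (name, arch)
        cand = (epoch, version, release)
        if key not in newest or simple_evr_key(*cand) > simple_evr_key(*newest[key]):
            newest[key] = cand
    return sorted((n, a, e, v, r) for (n, a), (e, v, r) in newest.items())


rows = [
    ("foo", "x86_64", "0", "1.2", "1"),
    ("foo", "x86_64", "0", "1.10", "1"),  # newer than 1.2 numerically
    ("foo", "i386", "0", "1.2", "1"),
    ("bar", "noarch", "1", "0.9", "2"),   # epoch 1 beats epoch 0
    ("bar", "noarch", "0", "3.0", "1"),
]
tuples = newest_by_name_arch(rows)
```

The result has one tuple per name.arch, however many versions of each
pkg the repos carry.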
Also we only gain if we load the repos. with the newest pkgs before
those with older versions ... so we might need to "optimise" that.
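The load-order point can be shown with another toy sketch (again,
VersionedSack and friends are hypothetical names, and versions are
compared as dotted integers only). A sack skips any pkg for which an
equal or newer version is already loaded, so the number of objects built
depends on which repo goes first:

```python
def version_key(version):
    """Simplified dotted-integer ordering; not rpm's comparison."""
    return tuple(int(p) for p in version.split("."))


class VersionedSack:
    """Toy repo holding (name, version) rows and counting objects built."""

    def __init__(self, rows):
        self.rows = rows
        self.built = 0

    def load_newest(self, newest):
        for name, version in self.rows:
            have = newest.get(name)
            if have is not None and version_key(have) >= version_key(version):
                continue  # something at least as new is already loaded
            self.built += 1
            newest[name] = version


def objects_built(sacks):
    """Load sacks in order; return (total objects built, newest versions)."""
    newest = {}
    for sack in sacks:
        sack.load_newest(newest)
    return sum(s.built for s in sacks), newest


old_rows = [("foo", "1.0"), ("bar", "2.0")]
new_rows = [("foo", "1.1"), ("bar", "2.1")]

newest_first, result_a = objects_built(
    [VersionedSack(new_rows), VersionedSack(old_rows)])
oldest_first, result_b = objects_built(
    [VersionedSack(old_rows), VersionedSack(new_rows)])
```

Both orders end up with the same newest versions, but oldest-first builds
every old pkg only to replace it.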
Now putting this into HEAD after everyone has played with it is
probably a no brainer (please shout if you have some objection ... see
the code below).
However what about the 3.2.x branch?
The only real argument against doing so is that it might break
something, as it's completely API compatible.
Anyway the "obvious" change was roughly (and ditto for a loadNewestByName):
packageSack.py:MetaSack:
+    def loadPackages(self, pkgs, patterns=None):
+        """load list of newest packages based on name matching
+        this means(in name.arch form): foo.i386 and foo.noarch will
+        be compared to each other for highest version"""
+
+        for sack in self.sacks.values():
+            try:
+                sack.loadPackages(pkgs, patterns)
+            except PackageSackError:
+                continue
     def simplePkgList(self, patterns=None):
         """returns a list of pkg tuples (n, a, e, v, r)"""
+
+        if True:
+            pkgs = {}
+            self.loadPackages(pkgs, patterns)
+            return [pkg.pkgtup for pkg in pkgs.values()]
+
         return self._computeAggregateListResult("simplePkgList", patterns)
sqlitesack.py:
+    @catchSqliteException
+    def _loadPkgsHlpr(self, pkgs, pkg_filter, patterns=None):
+        # Skip unused repos completely, Eg. *-source
+        skip_all = True
+        for repo in self.added:
+            if repo not in self._all_excludes:
+                skip_all = False
+
+        if skip_all:
+            return []
+
+        if hasattr(self, 'pkgobjlist'):
+            pkgobjlist = self.pkgobjlist
+        else:
+            pkgobjlist = self._buildPkgObjList(None, patterns, pkg_filter)
+
+        return pkgobjlist
+
+    def loadPackages(self, pkgs, patterns=None):
+        """ Like returnPackages(), except we load into the pkgs param. """
+
+        def _filt_pkg(x):
+            """ Returns True if we don't need to bother loading this pkg. """
+            nevra = (x['name'], x['epoch'], x['version'], x['release'], x['arch'])
+            if nevra not in pkgs:
+                return False
+            return True
+
+        for po in self._loadPkgsHlpr(pkgs, _filt_pkg, patterns):
+            nevra = (po.name, po.epoch, po.version, po.release, po.arch)
+            if nevra in pkgs or self._pkgExcluded(po):
+                continue
+            pkgs[nevra] = po
+
[...]
@@ -941,10 +1015,10 @@ class YumSqlitePackageSack(yumRepo.YumPackageSack):
return exactmatch, matched, unmatched
@catchSqliteException
- def _buildPkgObjList(self, repoid=None, patterns=None):
+ def _buildPkgObjList(self, repoid=None, patterns=None, pkg_filter=None):
@@ -968,16 +1042,22 @@ class YumSqlitePackageSack(yumRepo.YumPackageSack):
qsql = _FULL_PARSE_QUERY_BEG + " OR ".join(pat_sqls)
executeSQL(cur, qsql, pat_data)
for x in cur:
+ if pkg_filter is not None and pkg_filter(x):
+ continue
--
James Antill <james.antill at redhat.com>
Red Hat