[Yum] yum performance

Dimitrios Apostolou jimis at gmx.net
Wed Aug 12 19:41:25 UTC 2009


Hello list,

I have been using Fedora on various machines, many of which are fairly
old, so I'm constantly trying to remove unnecessary fat and make things
speedier. Unfortunately, things don't look too good when the basic
package manager itself is slow.

Just running "yum help" on an 800 MHz PC with Fedora 11 takes about 2.2 s.
Running "yum check-update" takes more than 20 s to return an empty list.
Many other yum commands are slow too, but I thought I should start with
the simplest ones. All measurements were taken after warming up, i.e. yum
had already been run once to bring its data into the cache and update the
metadata.

Perhaps I shouldn't even mention how slow yum (an old version) is on an
old SPARCstation 5 running Aurora Linux. Operations take hours and the
machine is constantly swapping. It is the biggest obstacle to using that
distro on such hardware.


So I've been doing some profiling on yum.
As far as "yum help" is concerned, I haven't reached any important
conclusions. Most of the time is spent in ini parsing, URL parsing and
Python module initialisation; the cost is spread over too many diverse
places to improve any single one easily. FYI, the functions to look into
are getReposFromConfig in __init__.py, readStartupConfig in config.py,
and object initialisation (__init__.py?) in general.
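A profile like the one described above can be collected with Python's
standard cProfile and pstats modules. The sketch below is written for
Python 3 and profiles a hypothetical stand-in function
(parse_config_like_startup is not yum code), sorting entries by
cumulative time so the most expensive call chains appear first.

```python
import cProfile
import io
import pstats


def split_line(line):
    """Split one 'key = value' line into a stripped (key, value) pair."""
    key, _, value = line.partition("=")
    return key.strip(), value.strip()


def parse_config_like_startup(n=2000):
    """Hypothetical stand-in for startup work such as ini parsing.

    This is NOT yum code; it only burns time in a helper so that the
    profile has a call chain to show.
    """
    lines = ["key%d = value%d" % (i, i) for i in range(n)]
    return dict(split_line(line) for line in lines)


def profile_report(func, limit=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(parse_config_like_startup))
```

To profile yum itself, the same module can wrap a whole script from the
command line, e.g. "python -m cProfile -s cumulative /usr/bin/yum help",
assuming yum is installed as a Python script at that path.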

As far as check-update goes, _buildPkgObjList in sqlitesack.py takes by
far the most time. It currently runs one sqlite query that returns all
packages, then filters the result for excludes and converts it to Python
objects, all with repetitive Python code.
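The fetch-everything-then-filter pattern described above can be modelled
in a few lines. This is a simplified sketch, not yum's _buildPkgObjList:
the table holds only a toy subset of primary.sqlite's columns, excludes
are reduced to glob patterns plus an allowed-arch set, and PackageObject
is a hypothetical stand-in class.

```python
import fnmatch
import sqlite3


class PackageObject:
    """Hypothetical, minimal stand-in for a per-package Python object."""

    def __init__(self, name, arch, epoch, version, release):
        self.name = name
        self.arch = arch
        self.pkgtup = (name, arch, epoch, version, release)


def make_demo_db():
    """Build an in-memory table with an assumed subset of package columns."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE packages (name TEXT, arch TEXT, epoch TEXT,"
               " version TEXT, release TEXT)")
    db.executemany("INSERT INTO packages VALUES (?, ?, ?, ?, ?)", [
        ("bash", "i586", "0", "4.0", "1"),
        ("kernel", "i586", "0", "2.6.29", "5"),
        ("kernel-devel", "i586", "0", "2.6.29", "5"),
        ("glibc", "x86_64", "0", "2.10", "2"),
    ])
    return db


def build_objects_in_python(db, exclude_globs, allowed_arches):
    """Fetch every row, then drop excludes and wrap the survivors in Python."""
    result = []
    for row in db.execute("SELECT name, arch, epoch, version, release"
                          " FROM packages"):
        name, arch = row[0], row[1]
        if arch not in allowed_arches:
            continue  # wrong architecture for this machine
        if any(fnmatch.fnmatchcase(name, pat) for pat in exclude_globs):
            continue  # matched an exclude pattern
        result.append(PackageObject(*row))
    return result


if __name__ == "__main__":
    objs = build_objects_in_python(make_demo_db(), ["kernel*"], {"i586", "noarch"})
    print([o.pkgtup for o in objs])
```

Every row crosses from SQLite into Python before most of them are thrown
away, which is the cost the question below is about.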

Is there a reason for not using a proper SQL query that returns only the
packages needed, with the excludes already removed? I can see the
following comment:

#  Note: If we are building the pkgobjlist, we don't exclude
# here, so that we can un-exclude later on ... if that matters.

Does that matter?

If we really take advantage of sqlite and build a query that returns
exactly what we want, why do we need to build a separate Python
PackageObject list at all?
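For comparison, here is a sketch of letting SQLite do the filtering and
returning bare (n, a, e, v, r) tuples, with one "?" placeholder per bound
value. It builds its own toy table (an assumed subset of primary.sqlite's
columns) and models excludes as glob patterns plus an allowed-arch set,
which is a simplification. Note that a placeholder must appear unquoted:
'?' inside quotes is a literal string and leaves the bound value without
a slot.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table with an assumed subset of package columns."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE packages (name TEXT, arch TEXT, epoch TEXT,"
               " version TEXT, release TEXT)")
    db.executemany("INSERT INTO packages VALUES (?, ?, ?, ?, ?)", [
        ("bash", "i586", "0", "4.0", "1"),
        ("kernel", "i586", "0", "2.6.29", "5"),
        ("kernel-devel", "i586", "0", "2.6.29", "5"),
        ("glibc", "x86_64", "0", "2.10", "2"),
    ])
    return db


def query_tuples_in_sql(db, exclude_globs, allowed_arches):
    """Let SQLite apply the excludes and return plain (n, a, e, v, r) tuples."""
    clauses = []
    params = []
    if allowed_arches:
        # One unquoted "?" per allowed arch, e.g. "arch NOT IN (?,?)"
        clauses.append("arch NOT IN (%s)" % ",".join("?" * len(allowed_arches)))
        params.extend(sorted(allowed_arches))
    for pat in exclude_globs:
        clauses.append("name GLOB ?")  # GLOB is case sensitive in SQLite
        params.append(pat)
    sql = "SELECT name, arch, epoch, version, release FROM packages"
    if clauses:
        # The clauses describe what to drop, so keep rows matching none of them
        sql += " WHERE NOT (" + " OR ".join(clauses) + ")"
    return [tuple(row) for row in db.execute(sql, params)]


if __name__ == "__main__":
    print(query_tuples_in_sql(make_demo_db(), ["kernel*"], {"i586", "noarch"}))
```

Only the surviving rows leave SQLite, and no per-package Python object is
created; whether yum's other callers can live without those objects is
the open question.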

I attach a patch that greatly reduces the time check-update needs, by
not populating the YumSqlitePackageSack objects and by calculating
updates only from the returned list of (n,a,e,v,r) tuples.
_buildPkgObjList is not even called. It works for this simple case,
which makes me wonder...
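Computing updates from (n,a,e,v,r) tuples alone amounts to grouping by
(name, arch) and comparing epoch, version and release. yum relies on
rpm's own version comparison; the vercmp below is a simplified
re-implementation (no handling of "~" or "^"), and the hypothetical
find_updates deliberately ignores obsoletes and arch changes, so treat
it as a sketch of the idea only.

```python
import re


def _segments(s):
    """Split a version string into runs of digits or letters (rpm-style)."""
    return re.findall(r"\d+|[a-zA-Z]+", s)


def vercmp(a, b):
    """Simplified rpmvercmp: 1 if a is newer, -1 if older, 0 if equal.

    Assumption: ignores rpm's special "~" and "^" characters.
    """
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num != y_num:
            # rpm treats a numeric segment as newer than an alphabetic one
            return 1 if x_num else -1
        if x_num:
            x, y = int(x), int(y)  # numeric segments compare as integers
        if x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the version with segments left over wins
    if len(sa) != len(sb):
        return 1 if len(sa) > len(sb) else -1
    return 0


def compare_evr(evr1, evr2):
    """Compare (epoch, version, release) triples; a missing epoch counts as 0."""
    (e1, v1, r1), (e2, v2, r2) = evr1, evr2
    e1, e2 = int(e1 or 0), int(e2 or 0)
    if e1 != e2:
        return 1 if e1 > e2 else -1
    return vercmp(v1, v2) or vercmp(r1, r2)


def find_updates(installed, available):
    """Return (new, old) tuple pairs where an available package is newer.

    Matches on (name, arch) only; obsoletes and arch changes are ignored.
    """
    newest = {}
    for pkg in available:
        key = pkg[:2]  # (name, arch)
        if key not in newest or compare_evr(pkg[2:], newest[key][2:]) > 0:
            newest[key] = pkg
    updates = []
    for old in installed:
        new = newest.get(old[:2])
        if new is not None and compare_evr(new[2:], old[2:]) > 0:
            updates.append((new, old))
    return updates
```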

What do you think? Is this preliminary patch heading in the right
direction? What would you propose to improve speed even further without
breaking existing functionality?


Thanks in advance,
Dimitris
-------------- next part --------------
diff --git a/yum/packageSack.py b/yum/packageSack.py
index b71356a..1f574b8 100644
--- a/yum/packageSack.py
+++ b/yum/packageSack.py
@@ -921,9 +921,8 @@ class PackageSack(PackageSackBase):
         """returns a list of pkg tuples (n, a, e, v, r) optionally from a single repoid"""
         
         # Don't cache due to excludes
-        return [pkg.pkgtup for pkg in self.returnPackages(patterns=patterns,
-                                                          ignore_case=False)]
-                       
+        return self.returnPackageTuples(patterns=patterns, ignore_case=False)
+
     def printPackages(self):
         for pkg in self.returnPackages():
             print pkg
diff --git a/yum/sqlitesack.py b/yum/sqlitesack.py
index 643f1f6..73ce15f 100644
--- a/yum/sqlitesack.py
+++ b/yum/sqlitesack.py
@@ -1512,7 +1512,85 @@ class YumSqlitePackageSack(yumRepo.YumPackageSack):
             self._pkgnames_loaded.update([po.name for po in returnList])
 
         return returnList
-                
+
+    def returnPackageTuples(self, ignore_case=False, patterns=None):
+        """Returns a list of n,a,e,v,r tuples with all packages minus excludes
+        """
+        
+        # TODO: sqlite GLOB is case sensitive so even though it's handy because of 
+        #	its wildcards, perhaps we should use LIKE and transform wildcards
+        def buildQuery():
+            """Build a query in the following form:
+
+SELECT name, arch, epoch, version, release FROM packages
+WHERE NOT  # NOT because the following lines give 
+           # excluded packages, but we want the opposite
+	NOT
+	(pkgName GLOB self._pkgExcluder[i][2].lower() 
+		(only if self._pkgExcluder[i][1]=="include.match")
+	)
+	AND
+	(
+		(repo = self._excludes[i][0] AND
+		pkgKey = self._excludes[i][1])
+		OR
+		repo IN (self._all_excludes[i])
+		OR
+		arch NOT IN (self._arch_allowed[i])
+		OR
+		(pkgName GLOB self._pkgExcluder[i][2].lower() 
+			(only if self._pkgExcluder[i][1]=="exclude.match")
+		)
+	)"""
+
+            import itertools
+
+            incl_vars= [ i[2].lower() for i in self._pkgExcluder if i[1]=="include.match" ]
+            incl_q1= " OR ".join( [" (name GLOB ?) "] * len(incl_vars) )
+            
+            excl_L=[]
+            # itertools.chain seems the most elegant way to flatten a nested list
+            excl_vars1= list(itertools.chain(*self._excludes))
+            excl_q1= " OR ".join( [" (repo = ? AND pkgKey = ?) "] * (len(excl_vars1)/2) )
+            if len(excl_vars1)>0:
+                excl_L+= [excl_q1]
+            excl_vars2= list(self._all_excludes)
+            excl_q2= "repo IN (" + ",".join( ["?"] * len(excl_vars2)  ) + ")"
+            if len(excl_vars2)>0:
+                excl_L+= [excl_q2]
+            excl_vars3= list(self._arch_allowed)
+            excl_q3= "arch NOT IN (" + ",".join( ["?"] * len(self._arch_allowed) ) + ")"
+            if len(excl_vars3)>0:
+                excl_L+= [excl_q3]
+            excl_vars4= [ i[2].lower() for i in self._pkgExcluder if i[1]=="exclude.match" ]
+            excl_q4= " OR ".join( [" (name GLOB ?) "] * len(excl_vars4) )
+            if len(excl_vars4)>0:
+                excl_L+= [excl_q4]
+            excl_q= " OR ".join(excl_L)
+            excl_vars= excl_vars1 + excl_vars2 + excl_vars3 + excl_vars4
+
+            q="SELECT name, arch, epoch, version, release FROM packages"
+            if len(incl_vars)>0 or len(excl_vars)>0:
+                q+= " WHERE NOT "
+                if len(incl_vars)>0:
+                    q+= " NOT (" + incl_q1 + ")"
+                    if len(excl_vars)>0:
+                        q+= " AND "
+                if len(excl_vars)>0:
+                    q+= "(" + excl_q + ")"
+
+            return q, incl_vars+excl_vars
+
+        returnList=[]
+        (q,v)= buildQuery()
+        for (repo,cache) in self.primarydb.items():
+            # print repo, q, v   # debug output, disabled
+            cur = cache.execute(q, v)
+            returnList.extend(cur.fetchall())
+        return [tuple(i) for i in returnList]
+            
+        
+
     def returnPackages(self, repoid=None, patterns=None, ignore_case=False):
         """Returns a list of packages, only containing nevra information. The
            packages are processed for excludes. Note that patterns is just
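
The TODO in the patch mentions switching from GLOB, which is case
sensitive, to LIKE with translated wildcards. Here is a minimal sketch of
that translation, assuming only "*" and "?" need supporting: "*" becomes
"%", "?" becomes "_", and literal "%" and "_" are escaped. glob_to_like,
match_names and the one-column table are hypothetical. SQLite's LIKE is
case insensitive only for ASCII letters by default, and glob bracket
classes such as [abc] have no LIKE equivalent, so the sketch rejects them.

```python
import sqlite3


def glob_to_like(pattern, escape="\\"):
    """Translate a glob using only * and ? into a LIKE pattern."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        elif ch in ("%", "_", escape):
            out.append(escape + ch)  # make LIKE metacharacters literal
        elif ch in "[]":
            raise ValueError("bracket expressions are not supported: %r" % pattern)
        else:
            out.append(ch)
    return "".join(out)


def make_demo_db():
    """Hypothetical one-column table of package names."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE packages (name TEXT)")
    db.executemany("INSERT INTO packages VALUES (?)",
                   [("Kernel",), ("kernel-devel",), ("kernel_x",), ("bash",)])
    return db


def match_names(db, pattern):
    """Case-insensitive (ASCII only) glob match done with LIKE ... ESCAPE."""
    sql = "SELECT name FROM packages WHERE name LIKE ? ESCAPE '\\' ORDER BY name"
    return [row[0] for row in db.execute(sql, (glob_to_like(pattern),))]


if __name__ == "__main__":
    print(match_names(make_demo_db(), "KERNEL*"))
```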

