[Yum-devel] More profiling and optimization ideas

Florian Festi ffesti at redhat.com
Fri Aug 10 13:53:28 UTC 2007


Hi!

I've been looking over the code and the profiler results to get an idea
about further optimizations. I'd now guess that we can squeeze out something
between 30 and 60% (a factor 1.5-3 speedup). But these speedups are
fragmented over maybe a dozen different issues. So the question is how much
work/change/instability we want to invest in further performance improvements.

Areas where further optimization could be done:

= Performance =

SqliteSack: Move more code from using pkgId to pkgKey. This would simplify
the SQL queries, and the join on pkgKey is faster. It would also require
switching the exclude mechanism to pkgKey. This could maybe yield a 3-10%
overall speedup.
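
To illustrate the difference, here is a minimal sketch assuming the primary
db layout where the packages table carries an integer pkgKey primary key
plus a textual pkgId checksum, and the requires table is keyed by pkgKey
(the function names are made up for the example):

    def requires_by_pkgkey(con, pkgKey):
        # pkgKey is the integer primary key, so this is a plain
        # integer index lookup in the requires table.
        return con.execute(
            "SELECT name, flags, version FROM requires WHERE pkgKey = ?",
            (pkgKey,)).fetchall()

    def requires_by_pkgid(con, pkgId):
        # pkgId is a checksum string, so we pay for a text comparison
        # plus an extra join through the packages table.
        return con.execute(
            "SELECT r.name, r.flags, r.version FROM requires r"
            " JOIN packages p ON r.pkgKey = p.pkgKey"
            " WHERE p.pkgId = ?", (pkgId,)).fetchall()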

Make TransactionData.matchNaevr O(1): I already have a patch, but this only
speeds up big transactions with lots of depsolving, which are quite rare.
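
This is not the actual patch, but one way to get there is indexing the
transaction members by name, so the common name lookup no longer scans the
whole member list (the attribute names are assumptions for the example):

    class TransactionData(object):
        def __init__(self):
            self.members = []
            self._name_index = {}     # name -> [txmbr, ...]

        def add(self, txmbr):
            self.members.append(txmbr)
            self._name_index.setdefault(txmbr.name, []).append(txmbr)

        def matchNaevr(self, name=None, arch=None, epoch=None,
                       ver=None, rel=None):
            # O(1) bucket lookup by name; the other fields only filter
            # the (usually tiny) bucket instead of the whole list.
            if name is None:
                candidates = self.members
            else:
                candidates = self._name_index.get(name, [])
            return [m for m in candidates
                    if (arch is None or m.arch == arch) and
                       (epoch is None or m.epoch == epoch) and
                       (ver is None or m.version == ver) and
                       (rel is None or m.release == rel)]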

Seek and destroy other O(n) methods: ??

Conversion between version string and version tuple: ??
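
One cheap option here would be memoizing the conversion, since the same EVR
strings come up again and again during depsolving. A rough sketch (the
parsing is simplified compared to the real stringToVersion):

    _evr_cache = {}

    def string_to_version(evr):
        # Cache (epoch, version, release) tuples keyed by the raw string.
        try:
            return _evr_cache[evr]
        except KeyError:
            pass
        if ':' in evr:
            epoch, rest = evr.split(':', 1)
        else:
            epoch, rest = '0', evr
        if '-' in rest:
            version, release = rest.rsplit('-', 1)
        else:
            version, release = rest, ''
        _evr_cache[evr] = result = (epoch, version, release)
        return result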

Reduce how often pkgs and tags are loaded from the databases:

I have a little example from my "Resolve dist upgrade" test case:

1783830 function calls (1783786 primitive calls) in 10.100 CPU seconds
(This means 1 second is ~10% of the depsolving time)

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
7613    0.048    0.000    2.231    0.000 packages.py:645(returnPrco)
  quite a lot of queries for PRCOs from the rpmdb
4380    0.628    0.000    2.151    0.000 packages.py:651(_populatePrco)
  and we need to load them from disk in more than half of the cases
4792    0.472    0.000    0.632    0.000 packages.py:614(__init__)
  roughly one for every PO created

As there are only about 900 pkgs in the rpmdb, loading PRCOs 4380 times
sounds a bit much, especially as we always load all the PRCOs instead of
just the needed tag. So we can probably save ~1.5s == 15% here (see the
sketch after the profile lines below).

4360    0.073    0.000    0.168    0.000 sqlitesack.py:36(__init__)
  the sqlite side also creates far more POs than there are updates to install
2220    0.303    0.000    0.472    0.000 sqlitesack.py:183(returnPrco)
  but PRCO loading is not that excessive (and each call loads only a
    single tag)
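
A sketch of what per-tag loading could look like on the rpmdb side;
hdr_provider and the tag naming scheme are assumptions for the example, not
the current packages.py code:

    class LazyPrcoPackage(object):
        def __init__(self, hdr_provider, pkgKey):
            self._get_hdr = hdr_provider   # callable re-reading the header
            self.pkgKey = pkgKey
            self._prco = {}                # tag -> [(name, flags, evr), ...]

        def returnPrco(self, tag):
            # Load only the requested tag, once, instead of populating
            # provides/requires/conflicts/obsoletes all in one go.
            if tag not in self._prco:
                hdr = self._get_hdr(self.pkgKey)
                self._prco[tag] = self._read_tag(hdr, tag)
            return self._prco[tag]

        @staticmethod
        def _read_tag(hdr, tag):
            # e.g. tag 'provide' maps to the providename/provideflags/
            # provideversion entries in rpm's header naming scheme.
            return list(zip(hdr[tag + 'name'],
                            hdr[tag + 'flags'],
                            hdr[tag + 'version']))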

Probably some more stuff...

= Memory usage =

More important than squeezing out the last few percent of performance is
getting memory usage under control (especially as using less memory leads to
better performance in languages like Python). Right now, loading and
discarding of data is done in an uncontrolled, ad hoc way, which leads to
the interesting numbers above.
I haven't looked into the Python bindings yet, but the RpmSack keeps a
list of all headers in the rpmdb, so I'd guess this uses a lot of memory.
I'll try to get around that and load them only on demand (getting them from
the db4 database should be fast).
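
A sketch of the on-demand variant; load_header is a made-up helper that
would wrap the rpm bindings (e.g. a dbMatch on the header's db instance):

    class LazyHeaderPackage(object):
        def __init__(self, load_header, db_instance):
            self._load_header = load_header   # wraps the rpm bindings
            self._instance = db_instance      # position in the rpmdb
            self._hdr = None

        @property
        def hdr(self):
            # Fetch the header from the db4 database only when first
            # needed, instead of keeping all ~900 headers in memory.
            if self._hdr is None:
                self._hdr = self._load_header(self._instance)
            return self._hdr

        def drop_hdr(self):
            # Let go of the header again once we are done with it.
            self._hdr = None
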
After that we have to make sure that no filelists or other unnecessary
tags are loaded. The current algorithm should work without filelists,
although rpmdb (rpmlib) might need them for searching for files.

The way to get memory below 100MB in all cases is managing the PRCOs.
This means adding central caches that keep the used ones in memory and
discard the unused ones. Least Recently Used (LRU) lists have proven to be
quite efficient for this purpose. They can be placed in the proper package
classes as class variables.
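
A minimal LRU sketch, e.g. based on collections.OrderedDict (the names and
the maxsize default are made up):

    from collections import OrderedDict

    class PrcoCache(object):
        def __init__(self, maxsize=2000):
            self.maxsize = maxsize
            self._data = OrderedDict()   # (pkgKey, tag) -> prco list

        def get(self, key):
            try:
                value = self._data.pop(key)
            except KeyError:
                return None
            self._data[key] = value      # re-insert: now most recently used
            return value

        def put(self, key, value):
            self._data.pop(key, None)
            self._data[key] = value
            while len(self._data) > self.maxsize:
                self._data.popitem(last=False)   # evict least recently used

Placed as a class variable (e.g. RpmPackage.prco_cache = PrcoCache()), all
instances of a package class would share one bounded pool.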

= Identity =

The current RpmSack and SqliteSack recreate POs over and over again. This
makes it difficult to look up, compare, and find POs, as we always have to
use pkgtuples, which are also not unique (the same pkg can be in the rpmdb
and in sqlite). From my experience, life is easier if Python object identity
can be used: POs can then be keys of dictionaries, and installed and
available pkgs can easily be distinguished. This is a comparatively large
change, and I won't make any promises about performance gains (although it
affects the most frequently performed operations), but it can be done in
small steps. It is, however, a goal that needs to be set and followed over
time to be reachable.
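
A first small step would be interning POs in the sacks, i.e. always handing
out the same instance for a given pkgKey (getPackage/_makePackage are
illustrative names, not the existing API):

    class InterningSack(object):
        def __init__(self):
            self._pkg_cache = {}    # pkgKey -> PO

        def getPackage(self, pkgKey):
            # Always return the same PO instance for a given pkgKey, so
            # `po1 is po2` holds and POs can be dictionary keys.
            po = self._pkg_cache.get(pkgKey)
            if po is None:
                po = self._pkg_cache[pkgKey] = self._makePackage(pkgKey)
            return po

        def _makePackage(self, pkgKey):
            raise NotImplementedError   # build the PO from the db row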


Florian


