[Yum-devel] [PATCH] Don't keep full headers in memory during transaction

Panu Matilainen pmatilai at laiskiainen.org
Thu Feb 17 17:56:18 UTC 2011


On Thu, 17 Feb 2011, seth vidal wrote:

> On Thu, 2011-02-17 at 19:09 +0200, Panu Matilainen wrote:
>> On 02/17/2011 04:17 PM, James Antill wrote:
>>> On Thu, 2011-02-17 at 11:48 +0200, Panu Matilainen wrote:
>>>> The open/close file callbacks only need NEVRA information in the "key".
>>>> Copy the necessary bits into a regular dict and pass that instead
>>>> of the entire header, the callback wont even know the difference
>>>> as the header behaves like a dict.
>>>> Saves (tens of) megabytes of memory on large transactions.
>>>> ---
>>>>   yum/depsolve.py |    6 +++++-
>>>>   1 files changed, 5 insertions(+), 1 deletions(-)
>>>
>>>   ACK.
>>
>> Oh crap... just realized this breaks anaconda and possibly something
>> else too.
>>
>> I keep forgetting how completely broken up the python callback is: the
>> header is never there unless its passed as (a part of) the package
>> "key", wasting huge amounts of memory. So anaconda (and possibly others)
>> are relying on it being a header (even testing instanceof(rpm.hdr), and
>> accessing various things (summary, size, count of 'basenames') from there.
>>
>> Anaconda can be fixed easily enough: just make it use the information
>> from package object (which it can find from the nevra info) instead of
>> the assumed header. Or we could just copy a bit more data to the
>> "minimal header" and it'd still be a big win over the full header... but
>> there's no telling what header data other API users might try to access.
>> And then there's the object type change from rpm.hdr to dict too.
>>
>> So, it's an API change for sure, far more so than I originally thought
>> it would be :-/
>>
>> One possibility is making the new behavior opt-in, such as adding a
>> fullheader=True keyword argument to populateTs() so that yum itself can
>> benefit from it now, and give callers the old compatible behavior with
>> full header until they update their code.
>>
>
> ugh - I don't think that's such a fabulous plan. Decorating the code in
> more interesting ways doesn't thrill me.

Well, it isn't exactly a whole lot of code. Something like this (untested 
but you'll get the idea):

diff --git a/yum/depsolve.py b/yum/depsolve.py
index 997c66d..b99265d 100644
--- a/yum/depsolve.py
+++ b/yum/depsolve.py
@@ -179,7 +179,7 @@ class Depsolve(object):

          return False

-    def populateTs(self, test=0, keepold=1):
+    def populateTs(self, test=0, keepold=1, fullheader=1):
          """take transactionData class and populate transaction set"""

          if self.dsCallback: self.dsCallback.transactionPopulation()
@@ -220,12 +220,18 @@ class Depsolve(object):
                          txmbr.ts_state = 'i'
                          txmbr.output_state = TS_INSTALL

-                # grab essential data for callback use in hdr-like form
-                minhdr = {}
-                for item in ['epoch', 'name', 'version', 'release', 'arch']:
-                    minhdr[item] = hdr[item]
+                # Traditionally the entire header was used as part of the
+                # pkg key, by default give this compatible behavior. However
+                # this wastes gobs of memory for no good reason, the callback
+                # really only needs nevra to locate its files.
+                if fullhdr:
+                    cbhdr = hdr
+                else:
+                    cbhdr = {}
+                    for item in ['epoch', 'name', 'version', 'release', 'arch']:
+                        cbhdr[item] = hdr[item]

-                self.ts.addInstall(hdr, (minhdr, rpmfile), txmbr.ts_state)
+                self.ts.addInstall(hdr, (cbhdr, rpmfile), txmbr.ts_state)
                  self.verbose_logger.log(logginglevels.DEBUG_1,
                      _('Adding Package %s in mode %s'), txmbr.po, txmbr.ts_state)
                  if self.dsCallback:

This "decoration" is worth tens of megabytes of saved memory on bigger 
installs/updates, me thinks its well worth those couple of lines. It's 
either something to this effect, or revert the API-breaking patch and keep 
the memory bloat until cowboys come home. This isn't something rpm can 
magically fix either - when the rpm-side callback is finally changed to 
something better it's going to require bigger changes than the above.

 	- Panu -


More information about the Yum-devel mailing list