[Yum-devel] [PATCH] Don't keep full headers in memory during transaction
Panu Matilainen
pmatilai at laiskiainen.org
Thu Feb 17 17:56:18 UTC 2011
On Thu, 17 Feb 2011, seth vidal wrote:
> On Thu, 2011-02-17 at 19:09 +0200, Panu Matilainen wrote:
>> On 02/17/2011 04:17 PM, James Antill wrote:
>>> On Thu, 2011-02-17 at 11:48 +0200, Panu Matilainen wrote:
>>>> The open/close file callbacks only need NEVRA information in the "key".
>>>> Copy the necessary bits into a regular dict and pass that instead
>>>> of the entire header, the callback wont even know the difference
>>>> as the header behaves like a dict.
>>>> Saves (tens of) megabytes of memory on large transactions.
>>>> ---
>>>> yum/depsolve.py | 6 +++++-
>>>> 1 files changed, 5 insertions(+), 1 deletions(-)
>>>
>>> ACK.
>>
>> Oh crap... just realized this breaks anaconda and possibly something
>> else too.
>>
>> I keep forgetting how completely broken up the python callback is: the
>> header is never there unless its passed as (a part of) the package
>> "key", wasting huge amounts of memory. So anaconda (and possibly others)
>> are relying on it being a header (even testing instanceof(rpm.hdr), and
>> accessing various things (summary, size, count of 'basenames') from there.
>>
>> Anaconda can be fixed easily enough: just make it use the information
>> from package object (which it can find from the nevra info) instead of
>> the assumed header. Or we could just copy a bit more data to the
>> "minimal header" and it'd still be a big win over the full header... but
>> there's no telling what header data other API users might try to access.
>> And then there's the object type change from rpm.hdr to dict too.
>>
>> So, it's an API change for sure, far more so than I originally thought
>> it would be :-/
>>
>> One possibility is making the new behavior opt-in, such as adding a
>> fullheader=True keyword argument to populateTs() so that yum itself can
>> benefit from it now, and give callers the old compatible behavior with
>> full header until they update their code.
>>
>
> ugh - I don't think that's such a fabulous plan. Decorating the code in
> more interesting ways doesn't thrill me.
Well, it isn't exactly a whole lot of code. Something like this (untested
but you'll get the idea):
diff --git a/yum/depsolve.py b/yum/depsolve.py
index 997c66d..b99265d 100644
--- a/yum/depsolve.py
+++ b/yum/depsolve.py
@@ -179,7 +179,7 @@ class Depsolve(object):
return False
- def populateTs(self, test=0, keepold=1):
+ def populateTs(self, test=0, keepold=1, fullheader=1):
"""take transactionData class and populate transaction set"""
if self.dsCallback: self.dsCallback.transactionPopulation()
@@ -220,12 +220,18 @@ class Depsolve(object):
txmbr.ts_state = 'i'
txmbr.output_state = TS_INSTALL
- # grab essential data for callback use in hdr-like form
- minhdr = {}
- for item in ['epoch', 'name', 'version', 'release', 'arch']:
- minhdr[item] = hdr[item]
+ # Traditionally the entire header was used as part of the
+ # pkg key, by default give this compatible behavior. However
+ # this wastes gobs of memory for no good reason, the callback
+ # really only needs nevra to locate its files.
+ if fullhdr:
+ cbhdr = hdr
+ else:
+ cbhdr = {}
+ for item in ['epoch', 'name', 'version', 'release', 'arch']:
+ cbhdr[item] = hdr[item]
- self.ts.addInstall(hdr, (minhdr, rpmfile), txmbr.ts_state)
+ self.ts.addInstall(hdr, (cbhdr, rpmfile), txmbr.ts_state)
self.verbose_logger.log(logginglevels.DEBUG_1,
_('Adding Package %s in mode %s'), txmbr.po, txmbr.ts_state)
if self.dsCallback:
This "decoration" is worth tens of megabytes of saved memory on bigger
installs/updates, me thinks its well worth those couple of lines. It's
either something to this effect, or revert the API-breaking patch and keep
the memory bloat until cowboys come home. This isn't something rpm can
magically fix either - when the rpm-side callback is finally changed to
something better it's going to require bigger changes than the above.
- Panu -
More information about the Yum-devel
mailing list