[Rpm-metadata] createrepo: huge memory consumption?

Ryan Lynch ryan.b.lynch at gmail.com
Mon Aug 24 12:22:41 UTC 2009


My Python is decent, and I have already been poking around in the
code. I will see if I can come up with something and put some patches
together.

In the meantime, I suppose I can go without the '--update' option. Or,
maybe I could split my repos up into smaller chunks, since the peak
memory usage seems to be roughly proportional to the package count.
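
A minimal sketch of the chunking idea, assuming the package files can be
grouped arbitrarily into separate, smaller repositories. The helper name
split_into_chunks and the chunk size are hypothetical; nothing here is
part of createrepo itself.

```python
def split_into_chunks(paths, chunk_size):
    """Yield sorted lists of at most chunk_size package paths each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ordered = sorted(paths)
    for start in range(0, len(ordered), chunk_size):
        # Each slice would become its own, smaller repository.
        yield ordered[start:start + chunk_size]
```

If peak memory really does scale with package count, each chunk's
metadata run should peak at a correspondingly smaller size.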

BTW: I now have a couple of patches that I wrote while trying to
log, parse, and handle 'cobbler reposync' with large numbers of
directories. One affects output formatting, to make logging and
parsing easier, and the other helps to handle an annoying error:

 - adding a '--log-friendly' option: eliminates the '\r'
terminal-control magic chars; prints the individual package names, one
per line, instead;

 - adding a '--remove-on-error' option: if RPM cannot parse a package
file (which normally raises an exception), delete the offending file
before barfing, so subsequent 'reposync' runs will re-download it.
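
A minimal sketch of the two behaviors described above, not the actual
patches. The function names report_package and check_package, and the
parse callable, are hypothetical stand-ins for whatever reposync does
internally.

```python
import os
import sys


def report_package(name, index, total, log_friendly, out=sys.stdout):
    """Print progress for one package."""
    if log_friendly:
        # One package per line, with no terminal-control characters.
        out.write("%d/%d %s\n" % (index, total, name))
    else:
        # Interactive style: '\r' rewrites the same terminal line.
        out.write("\r%d/%d %s" % (index, total, name))
    out.flush()


def check_package(path, parse, remove_on_error):
    """Run parse(path); optionally delete the file if parsing fails."""
    try:
        return parse(path)
    except Exception:
        if remove_on_error and os.path.exists(path):
            # Remove the bad file so the next sync downloads it again.
            os.remove(path)
        raise  # still report the failure to the caller
```

The exception is re-raised in both cases, so the run still fails
visibly; the only difference is whether the unparseable file is left
behind.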

I think they could be helpful in automated situations, like 'cobbler'.
I had a lot of frustration: my sync cronjobs were constantly breaking
down, and I wasn't getting useful logs, so I made these changes.

Any interest in either of these functions?

-Ryan


On 2009-08-23, Seth Vidal <skvidal at fedoraproject.org> wrote:
>
>
> On Sun, 23 Aug 2009, Ryan Lynch wrote:
>
>> I've noticed that running 'createrepo' against large repositories
>> takes up an enormous amount of memory, on my machines, at least.
>>
>> I'm currently using the following options:
>>
>> * --cachedir cache
>> * --database
>> * --update
>> * --unique-md-filenames
>>
>> I see createrepo's 'genpkgmetadata.py' script eating up ~1.5 GB of
>> memory, or 37% physical memory (runtime ~1 min).  I wouldn't have
>> noticed this, except that it usually triggers the OOM killer and
>> randomly whacks firefox, eclipse, etc.
>>
>> So, is this normal/expected?  Are other users seeing this happen with
>> large repositories?  I've noticed that dropping the '--update' option
>> takes a lot longer to run (~8-10 min), but with a much lower peak
>> memory utilization.  Is it just a case of weighing memory usage vs.
>> runtime, and making my choice?
>>
>> Thanks,
>> Ryan
>
> --update reads in ALL of the old metadata into memory so it can add/delete
> from it and dump it back out quickly.
>
> the speed up you get from reading that in is at the cost of memory.
>
> code to improve that is welcome.
>
> -sv
>
> _______________________________________________
> Rpm-metadata mailing list
> Rpm-metadata at lists.baseurl.org
> http://lists.baseurl.org/mailman/listinfo/rpm-metadata
>


-- 
Ryan B. Lynch
ryan.b.lynch at gmail.com

