Parallel deltarpm creation
Ian Mcleod
imcleod at redhat.com
Thu Feb 20 21:01:26 UTC 2014
On Thu, 2014-02-20 at 21:04 +0100, Phil Knirsch wrote:
> On 02/20/2014 06:26 PM, Ian Mcleod wrote:
> > Posting here at James Antill's suggestion.
> >
> > In his talk at devconf Dennis Gilmore discussed the current bottlenecks
> > in the Fedora compose/release process. One thing he mentioned was
> > deltarpm creation.
> >
> > The current upstream createrepo is single-threaded/single-process for
> > all deltarpm actions. I've written some code to allow parallel workers
> > for these tasks, similar to the multi-process workers that can be used
> > in the initial package XML parsing tasks.
> >
> > GIT -
> > https://github.com/imcleod/createrepo/tree/feature/parallel_deltas_full
> >
> > RPMS -
> > http://imcleod.fedorapeople.org/createrepo/
> >
> > The patch adds two options to the command line createrepo and the
> > associated config object:
> >
> > --delta-workers - The number of worker processes to use for delta
> > related tasks
> >
> > --max-concurrent-delta-rpm-size - The maximum total size of uncompressed
> > rpm payloads that are actively being processed by makedeltarpm at any
> > given time.
> >
> > The deltarpm documentation suggests that its peak RAM use is
> > typically 4x the uncompressed RPM payload size. This is consistent with
> > my experience. So, a reasonable use case is to set --delta-workers to
> > the number of CPU cores and --max-concurrent-delta-rpm-size to ~25% of
> > RAM size (or whatever quantity of memory you want to devote to the
> > parallel deltas).
> >
> > For my development stress-test-case I re-created an F20 x86_64
> > Everything repo with F19 Everything as the "old" rpm source for deltas.
> > On a 32 core test system this task ran in 8 hours with a single deltarpm
> > worker versus 20 minutes when all 32 cores were used with a concurrent
> > size limit of 16 GB. In total this creates about 32,000 drpms. So,
> > this helps.
> >
> > Thoughts?
> >
> > -Ian
> >
>
> Very cool imho!
>
> What happens if you have an rpm with more than 1GB though in your
> specific example? Would that worker then fail and simply skip creating a
> deltarpm for that?
Any individual RPM whose payload is larger than
"--max-delta-rpm-size" is skipped. This behavior actually pre-dates
the patch.
The main work loop in the patch also tracks the sum of the payloads of
in-progress deltas. If it cannot find an unprocessed RPM that will
"fit" into the available space it blocks until an in-progress delta has
finished and then looks again.
At the moment it always tries to add the largest payload that is still
small enough to fit into the available work space. This may be
sub-optimal but it has the virtue of being simple. :-)
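To make the shape of that loop concrete, here is a minimal Python sketch of the largest-fit scheduling described above. It is not the createrepo code itself; `make_delta` is a stand-in for the real makedeltarpm invocation, and the payload sizes are illustrative:

```python
# Simplified sketch of the largest-fit delta scheduling described above.
# make_delta() is a placeholder, not a real createrepo/deltarpm call.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def make_delta(name, size):
    # Placeholder for running makedeltarpm on one package.
    return name

def schedule_deltas(payloads, workers=4, max_concurrent_size=16 * 2**30):
    """payloads: dict mapping rpm name -> uncompressed payload size in bytes."""
    # Payloads larger than the whole budget can never fit; skip them
    # (analogous to the existing --max-delta-rpm-size skip).
    pending = {n: s for n, s in payloads.items() if s <= max_concurrent_size}
    in_flight = {}  # future -> payload size of its in-progress delta
    used = 0        # sum of in-progress payload sizes
    done = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending or in_flight:
            # Find candidates that still fit in the remaining size budget.
            fit = [n for n, s in pending.items()
                   if used + s <= max_concurrent_size]
            if fit and len(in_flight) < workers:
                # Always pick the largest payload that fits.
                name = max(fit, key=pending.get)
                size = pending.pop(name)
                in_flight[pool.submit(make_delta, name, size)] = size
                used += size
            else:
                # Nothing fits (or no free worker): block until an
                # in-progress delta finishes, then look again.
                finished, _ = wait(in_flight, return_when=FIRST_COMPLETED)
                for fut in finished:
                    used -= in_flight.pop(fut)
                    done.append(fut.result())
    return done
```

The real work loop tracks running makedeltarpm processes rather than thread futures, but the budget accounting and the block-then-retry behavior are the same idea.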
-Ian
>
> Thanks & regards, Phil
>
More information about the Yum-devel mailing list