[Yum-devel] [PATCH 6/6] Implement parallel downloads for regular RPMs

James Antill james at fedoraproject.org
Thu Jul 14 21:28:43 UTC 2011

On Wed, 2011-07-13 at 18:50 +0200, Zdeněk Pavlas wrote:
> +        for i in range(n):
> +            # need two pipes per process
> +            A = os.pipe()
> +            B = os.pipe()
> +            if os.fork() == 0:
> +                # child: reads B, writes A
> +                os.close(A[0])
> +                os.close(B[1])
> +                self.downloadProcess(B[0], A[1], remote_pkgs)
> +                os._exit(0)
> +            # parent: reads A, writes B
> +            os.close(A[1])
> +            os.close(B[0])
> +            pool[A[0]] = B[1]

 Ok, so this is the heart of it. This is much better than threading it,
and is a good "proof of concept" that we can do it ... but there are a
few problems with doing it this way:

1. rpmdb/sqlite/etc. are now fork()d in N processes, and we have to make
sure that all the code within downloadProcess() doesn't do anything
weird with them. This scares me a lot.

2. Any global resources, like open fd's or what happens at signal time,
will need to be dealt with. This is almost certainly more pain than is
wanted.

3. We have to make sure that all the python code in yum/urlgrabber/etc.
below downloadProcess() doesn't do anything weird due to running in N
procs. at once. This is almost certainly more pain than is wanted.

4. SELinux does have setcon() but would _really_ prefer to have an
exec() instead ... and we still have a huge amount of extra code in
core, even if it's running in a restricted context.

5. This is pretty package specific ... we'd need a bigger, and scarier,
patch if we want to do anything else.

6. We inherit the memory resources of yum, for all the downloaders. COW
might help a bit here ... but this is python, not C, so I could see us
churning through COW pages a lot more than we might expect.
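
To make points 1 and 2 a bit more concrete, here is a rough sketch (not
code from the patch or from yum) of how an open fd behaves under a bare
fork() versus a fork()+exec(). The function names and the PROBE helper
are made up for illustration, and it assumes a POSIX system:

```python
import fcntl
import os
import subprocess
import sys

# Hypothetical probe run in the exec()ed process: exit 0 if the fd is open.
PROBE = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
)

def open_in_forked_child(fd):
    """fork() only: the child is a copy of the parent, open fds included."""
    pid = os.fork()
    if pid == 0:
        # child: never return into the caller's code, always _exit()
        try:
            os.fstat(fd)
        except OSError:
            os._exit(1)
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0

def open_in_execed_child(fd):
    """fork()+exec(): close_fds=True drops everything except fds 0, 1, 2."""
    rc = subprocess.call([sys.executable, "-c", PROBE, str(fd)],
                         close_fds=True)
    return rc == 0

def demo():
    r, w = os.pipe()
    # Duplicate onto a high fd number so the fresh interpreter cannot
    # coincidentally have the same number open for its own purposes.
    high = fcntl.fcntl(r, fcntl.F_DUPFD, 100)
    try:
        return open_in_forked_child(high), open_in_execed_child(high)
    finally:
        for fd in (r, w, high):
            os.close(fd)
```

The forked child sees the descriptor without being handed anything,
which is exactly what would happen to rpmdb/sqlite handles; the exec()ed
helper starts without it.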

...so as I said, I think it's a good POC ... you have something where
you can measure the impact of the change, do speed tests etc.
 But you want to look at the fork()+exec() model inside urlgrabber,
next. And then we can look at some APIs for "sane users" ... and then
see what we need to make it not suck to integrate it into yum.
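
As a very rough idea of what the fork()+exec() model could look like,
here is a minimal sketch. It is not urlgrabber code: HELPER,
spawn_helper() and fetch_all() are hypothetical names, and the helper
only acknowledges each URL instead of downloading it, so the example
runs without network access.

```python
import subprocess
import sys

# Hypothetical helper program. A real one would fetch each URL; this
# stand-in only acknowledges the request.
HELPER = r'''
import sys
for line in sys.stdin:
    url = line.strip()
    if url:
        sys.stdout.write("OK %s\n" % url)
        sys.stdout.flush()
'''

def spawn_helper():
    """fork()+exec() a fresh interpreter that shares only two pipes."""
    return subprocess.Popen(
        [sys.executable, "-c", HELPER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        close_fds=True,            # no rpmdb/sqlite fds leak into helpers
        universal_newlines=True,   # text-mode pipes
    )

def fetch_all(urls, n=2):
    """Spread urls round-robin over n helpers and collect their replies."""
    helpers = [spawn_helper() for _ in range(n)]
    # Hand each helper its share up front so they all work concurrently.
    # Fine for small batches; a real version would multiplex the pipes
    # with select()/poll() to avoid blocking on a full pipe.
    for i, proc in enumerate(helpers):
        proc.stdin.write("".join(u + "\n" for u in urls[i::n]))
        proc.stdin.close()
    replies = []
    for proc in helpers:
        replies.extend(proc.stdout.read().splitlines())
        proc.stdout.close()
        proc.wait()
    return sorted(replies)
```

Each helper starts from a clean process image, so it does not inherit
yum's memory or open handles, and the only contract between the two
sides is the line protocol on the pipes.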
