[Yum-devel] [PATCH 6/6] Implement parallel downloads for regular RPMs

tim.lauridsen at gmail.com
Fri Jul 15 09:04:52 UTC 2011


On Thu, Jul 14, 2011 at 11:28 PM, James Antill <james at fedoraproject.org> wrote:
> On Wed, 2011-07-13 at 18:50 +0200, Zdeněk Pavlas wrote:
>> +        for i in range(n):
>> +            # need two pipes per process
>> +            A = os.pipe()
>> +            B = os.pipe()
>> +            if os.fork() == 0:
>> +                # child: reads B, writes A
>> +                os.close(A[0])
>> +                os.close(B[1])
>> +                self.downloadProcess(B[0], A[1], remote_pkgs)
>> +                os._exit(0)
>> +            # parent: reads A, writes B
>> +            os.close(A[1])
>> +            os.close(B[0])
>> +            pool[A[0]] = B[1]
>
>  Ok, so this is the heart of it. This is much better than threading it,
> and is a good "proof of concept" that we can do it ... but there are a
> few problems with solving the problem this way:
>
> 1. rpmdb/sqlite/etc. are now fork()d in N processes, and we have to make
> sure that all the code within downloadProcess() doesn't do anything
> weird with them. This scares me a lot.
>
> 2. Any global resources, like fd's open or what happens at signal time
> will need to be dealt with. This is almost certainly more pain than is
> wanted.
>
> 3. We have to make sure that all the Python code in yum/urlgrabber/etc.
> below downloadProcess() doesn't do anything weird due to running in N
> procs at once. This is almost certainly more pain than is wanted.
>
> 4. SELinux does have setcontext() but would _really_ prefer to have an
> exec() instead ... and we still have a huge amount of extra code in
> core, even if it's running in a restricted context.
>
> 5. This is pretty package specific ... we'd need a bigger, and scarier,
> patch if we want to do anything else.
>
> 6. We inherit the memory resources of yum, for all the downloaders. COW
> might help a bit here ... but this is python, not C, so I could see us
> churning through COW pages a lot more than we might expect.
>
> ...so as I said, I think it's a good POC ... you have something where
> you can measure the impact of the change, do speed tests etc.
>  But you want to look at the fork()+exec() model inside urlgrabber,
> next. And then we can look at some APIs for "sane users" ... and then
> see what we need to make it not suck to integrate it into yum.
>

Maybe a client/server solution, where we spawn a download server that
does the parallel downloading and communicates with yum over
stdin/stdout.
That way you don't have to share yum internals with the downloader, and
you can just kill the downloader helper process if you want to abort
the download.
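
Roughly, the yum side could look like this (the helper path and the
function names here are just assumptions to illustrate the model,
nothing that exists today):

    import os
    import signal
    import subprocess

    # hypothetical helper: a separate executable, so we get
    # fork()+exec() instead of fork()ing all of yum's state into
    # the downloader, which sidesteps most of James' points
    HELPER = '/usr/libexec/yum-downloader'

    def start_downloader():
        # the helper owns the parallel downloads; yum only talks to
        # its stdin/stdout, so no rpmdb/sqlite handles leak into it
        return subprocess.Popen([HELPER],
                                stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE,
                                close_fds=True)

    def abort_downloads(proc):
        # aborting is just killing the helper; no yum state to unwind
        os.kill(proc.pid, signal.SIGTERM)
        proc.wait()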

Using base64 and pickle makes it easy to transfer Python data
structures over the stdin/stdout pipe.
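
For example (the request dict is made up; the point is the framing:
one base64 line per pickled message, so a plain readline() loop works
on both ends; Python 2 str assumed):

    import base64
    import pickle

    def send_msg(pipe, obj):
        # pickle the object, base64 it so the payload is a single
        # newline-free line, then '\n' terminates the frame
        pipe.write(base64.b64encode(pickle.dumps(obj)) + '\n')
        pipe.flush()

    def recv_msg(pipe):
        # one line == one message; EOF means the other side is gone
        line = pipe.readline()
        if not line:
            return None
        return pickle.loads(base64.b64decode(line))

    # yum side, with proc from the sketch above:
    #   send_msg(proc.stdin, {'url': url, 'dest': dest})
    #   status = recv_msg(proc.stdout)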

Tim

