[Yum] Threading the IO

Thu Feb 26 14:00:07 UTC 2004

I have been looking at adding the capability to perform IO-intensive
functions in parallel within thread pools.

There are a number of areas within yum that could take advantage of
these optimizations. The sections that jump out are file download and
header checking.

Option one--Make urlgrabber threaded. 
Advantages
--all threading code and threading bugs would live in urlgrabber rather
than being spread throughout yum
--yum coders (beyond Michel) would not need to be aware of threading

Disadvantages
--threading would only be available to urlgrabber
--currently, calls to urlgrabber are of the form

URL, filename = calc(item)  # work out the source URL and local filename
filename = grabber.urlgrab(URL, filename, **kwargs)
doSomeThingTo(filename)

urlgrabber would need to be modified so that its arguments could take
any of these forms:

string, string           -> use unthreaded grabber, return filename
(string, string)         -> use unthreaded grabber, return filename
[(string, string), ...]  -> use threaded grabber, return
                            [(filename, successFlag), ...]
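A minimal sketch of how a single entry point could tell the three argument forms above apart. `_grab_one` (the existing unthreaded download) and `_grab_many` (a threaded batch download) are hypothetical placeholders, not urlgrabber functions; here they are stubs so that only the dispatch logic is exercised.

```python
def _grab_one(url, filename, **kwargs):
    # Hypothetical stand-in for the existing unthreaded download.
    # A real version would fetch url into filename; this stub only
    # echoes the filename so the dispatch can be tested.
    return filename


def _grab_many(pairs, **kwargs):
    # Hypothetical stand-in for a threaded batch download that returns
    # [(filename, successFlag)]; this stub reports every item as fetched.
    return [(filename, True) for url, filename in pairs]


def urlgrab(*args, **kwargs):
    """Route a call to the unthreaded or threaded grabber by argument shape."""
    if len(args) == 2 and all(isinstance(a, str) for a in args):
        # urlgrab(url, filename) -> unthreaded, returns filename
        return _grab_one(args[0], args[1], **kwargs)
    if len(args) == 1 and isinstance(args[0], tuple):
        # urlgrab((url, filename)) -> unthreaded, returns filename
        url, filename = args[0]
        return _grab_one(url, filename, **kwargs)
    if len(args) == 1 and isinstance(args[0], list):
        # urlgrab([(url, filename), ...]) -> threaded, returns [(filename, flag)]
        return _grab_many(args[0], **kwargs)
    raise TypeError("expected url, filename; a (url, filename) tuple; "
                    "or a list of such tuples")
```

Dispatching on type keeps existing single-file callers working unchanged, at the cost of a return type that depends on the argument shape.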

Calls to urlgrabber would then look like this:

listToGrab = []
for item in items:
    listToGrab.append(calc(item))  # a (URL, filename) tuple per item

fileNames = grabber.urlgrab(listToGrab, **kwargs)

for fileName, successFlag in fileNames:
    if not successFlag:
        Oops()
    else:
        doSomeThingTo(fileName)
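One way the threaded branch could work is sketched below, assuming a hypothetical `fetch(url, filename)` callable that downloads a single file and raises on failure. The function name `grab_many` and the `num_threads` parameter are illustrative, not part of urlgrabber. Workers pull (URL, filename) pairs off a shared queue and record `(filename, successFlag)` in input order.

```python
import queue
import threading


def grab_many(pairs, fetch, num_threads=4):
    """Download each (url, filename) pair on a small pool of worker threads.

    `fetch(url, filename)` is a hypothetical single-file download that
    raises on failure. Returns [(filename, successFlag)] in input order.
    """
    work = queue.Queue()
    for index, pair in enumerate(pairs):
        work.put((index, pair))

    results = [None] * len(pairs)  # each slot is written by exactly one worker

    def worker():
        while True:
            try:
                index, (url, filename) = work.get_nowait()
            except queue.Empty:
                return  # queue drained, so this thread exits
            try:
                fetch(url, filename)
                results[index] = (filename, True)
            except Exception:
                results[index] = (filename, False)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every download has been attempted
    return results
```

Because the queue is filled before any worker starts, a worker can safely exit the first time it finds the queue empty. The existing `grabber.urlgrab` could be passed in as `fetch`, assuming it raises on a failed download.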

Option two--create a generic thread pool
Advantage
--any parallelizable function could run in the pool

tp = threadPool() # create pool
tp.init() # initialize pool

for item in items:
    # queue the function and its argument; don't call it here
    tp.addToInQueue(functionToDoSomeThing, item)

output = tp.cleanUp() # wait for all threads to finish; returns [output]

Disadvantage
--all coders would need to be aware of threading issues and the
resulting bugs.
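A minimal sketch of the generic pool, reusing the proposed method names (`init`, `addToInQueue`, `cleanUp`). The class itself, its `numThreads` parameter, and the choice to store a raised exception as that job's output are assumptions for illustration; this is one possible design, not an existing yum API.

```python
import queue
import threading


class threadPool:
    """Hypothetical generic pool using the method names proposed above.

    Jobs are (function, args) pairs run by worker threads; cleanUp()
    waits for every worker and returns outputs in submission order.
    """

    def __init__(self, numThreads=4):
        self.numThreads = numThreads
        self.inQueue = queue.Queue()
        self.outputs = {}
        self.outLock = threading.Lock()
        self.threads = []
        self.count = 0

    def init(self):
        # Start the workers; each blocks on the input queue until work arrives.
        for _ in range(self.numThreads):
            t = threading.Thread(target=self._worker)
            t.start()
            self.threads.append(t)

    def addToInQueue(self, function, *args):
        # Queue the call; a worker runs function(*args) later.
        self.inQueue.put((self.count, function, args))
        self.count += 1

    def _worker(self):
        while True:
            job = self.inQueue.get()
            if job is None:
                return  # sentinel from cleanUp(): no more work
            index, function, args = job
            try:
                result = function(*args)
            except Exception as err:
                result = err  # keep the worker alive; report the error as output
            with self.outLock:
                self.outputs[index] = result

    def cleanUp(self):
        # Sentinels go in after all jobs, so every job is taken first.
        for _ in self.threads:
            self.inQueue.put(None)
        for t in self.threads:
            t.join()
        return [self.outputs[i] for i in range(self.count)]
```

Jobs are submitted as a function plus its arguments, `tp.addToInQueue(functionToDoSomeThing, item)`. Writing `functionToDoSomeThing(item)` would run the work immediately in the calling thread and queue only its result. Catching exceptions inside the worker is one small example of the thread-safety care this option pushes onto every caller.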

David Farning