[Yum-devel] URLGrabber: help on progress indicator

Wed Oct 3 20:24:24 UTC 2012

On 10/02/2012 09:58 AM, Zdenek Pavlas wrote:
>> 	f = grabber.urlopen()
> 
> urlopen() looks like a win for small files, but the actual
> implementation downloads to a tempfile and reopens it.
> Avoid it if you can, and use urlgrab() instead.
> 

Did you mean urlread()?
urlgrab() creates a file on disk, which I don't need.
Maybe we could have a function that takes an object to write to,
so that I can do

out = open("file", "w")
for u in urls:
    grabber.urlgrab(u, out)    # proposed: append each download to out

?
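
Without such a function, a similar effect might be had by reading each URL into memory and
writing it to a file that is already open. This is only a sketch, assuming the files are small
enough to hold in memory. concatenate() and its fetch argument are hypothetical names; fetch
stands in for something like the grabber's urlread() method.

```python
import io


def concatenate(urls, out, fetch):
    """Write the body of every URL to `out`, one after another.

    `fetch` is any callable that takes a URL and returns its contents as
    bytes, for example the urlread() method of a grabber (an assumption).
    Returns the total number of bytes written.
    """
    total = 0
    for url in urls:
        data = fetch(url)
        out.write(data)
        total += len(data)
    return total


if __name__ == "__main__":
    # Stand-in for a real download, so the sketch runs without a network.
    fake = {"a": b"hello ", "b": b"world"}
    buf = io.BytesIO()
    concatenate(["a", "b"], buf, fake.get)
    print(buf.getvalue())
```

Passing the fetch function in keeps the loop independent of how the bytes are obtained.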

>> I dont know the total size. I can only estimate it after the first
>> file has been downloaded as they are all the same.
> 
> You don't have to specify size in most cases, as urlgrabber
> parses the Content-Length header, and passes that to the
> TextMeter.start() method.  If it does not work, server probably
> uses Transfer-encoding: chunked (usually because file is generated
> dynamically, so even the server has no size hint).
> 

Hi,

I think I did not explain myself very well.
I am trying to get a single progress bar that spans multiple files.
I know there are N files of roughly the same length, but I do not know the total size when I
create the progress meter.
My current solution is attached below. I subclass TextMeter and re-estimate the total size each
time start() is called, assuming each remaining file is as large as the average of the files seen
so far. I always call update() with the cumulative bytes read since the beginning.
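
The estimate on its own can be sketched as follows. estimate_total() is a hypothetical helper
name and the sizes are made-up numbers, chosen only to show how the estimate moves as more files
are seen.

```python
def estimate_total(bytes_so_far, files_so_far, number_of_files):
    """Estimate the total size, assuming unseen files match the average so far."""
    if files_so_far <= 0:
        raise ValueError("need at least one file to estimate from")
    average = bytes_so_far / float(files_so_far)
    return average * number_of_files


# Sizes (in bytes) as reported at each start() call; made-up numbers.
sizes = [1000, 1200, 800]
seen = 0
estimates = []
for count, size in enumerate(sizes, 1):
    seen += size
    estimates.append(estimate_total(seen, count, number_of_files=10))

print(estimates)
```

The estimate is revised at every start(), so the bar's total shifts slightly whenever a file
is larger or smaller than the running average.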

It works almost perfectly, but there are a couple of issues:

1) Each time I call start(), BaseMeter resets last_amount_read to 0, so the rate estimate jumps.
A simple fix would be to add an extra parameter to start(), last_read=0, for the case where a
download does not begin from zero. I would then pass the total amount read so far.
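
The effect of the reset can be illustrated with a toy meter. ToyMeter is a hypothetical,
deliberately simplified class and not urlgrabber's BaseMeter; any smoothing the real rate
estimator applies is left out, and the byte counts and timestamps are made up.

```python
class ToyMeter(object):
    """Toy stand-in for a progress meter; not urlgrabber's BaseMeter."""

    def start(self, now, last_read=0):
        # The proposed extra parameter: where the byte counter starts from.
        self.last_amount_read = last_read
        self.last_update_time = now

    def update(self, amount_read, now):
        """Return the rate (bytes/second) since the previous update."""
        rate = (amount_read - self.last_amount_read) / float(now - self.last_update_time)
        self.last_amount_read = amount_read
        self.last_update_time = now
        return rate


meter = ToyMeter()

# The second file starts at t=10 with 5000 bytes already read; one second
# later the cumulative count is 5500.
meter.start(now=10.0)                    # counter reset to 0
rate_with_reset = meter.update(5500, now=11.0)

meter.start(now=10.0, last_read=5000)    # counter carried over
rate_carried = meter.update(5500, now=11.0)

print(rate_with_reset, rate_carried)
```

With the counter reset, the first update after start() attributes all previously read bytes
to the last second; carrying the counter over keeps the rate in line with what was actually
transferred.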

2) Since you suggested urlgrab() rather than urlopen(), I tried it, and now each time I grab a new
file I get a new line in the progress output.
This is because urlgrab() calls progress.end() (which prints a newline), while urlopen() does not.
Why is that?
For now I override end() and do not pass it on.

What do you think of 1)? Would you consider a patch?

====================================================================================================

import time

import urlgrabber.progress


class Meter(urlgrabber.progress.TextMeter):
    """A TextMeter that shows one progress bar across several files."""

    def __init__(self, numberOfFiles, name):
        urlgrabber.progress.TextMeter.__init__(self)
        self.numberOfFiles = numberOfFiles
        self.name = name

        self.bytesSoFar = 0
        self.filesSoFar = 0
        self.baseRead = 0
        self.now = None

    def addFile(self, size, now):
        # size is assumed to be known here (i.e. the server sent Content-Length)
        # bytes read before this file; update() adds the per-file count to it
        self.baseRead = self.bytesSoFar
        self.bytesSoFar = self.bytesSoFar + size
        self.filesSoFar = self.filesSoFar + 1

        # assume the remaining files have the same average length as those so far
        estimatedTotal = self.bytesSoFar / self.filesSoFar * self.numberOfFiles

        # remember the start time of the first file only
        if self.now is None:
            self.now = now

        if self.now is None:
            self.now = time.time()

        return estimatedTotal

    def start(self, filename=None, url=None, basename=None, size=None, now=None, text=None):
        estimatedTotal = self.addFile(size, now)
        # unfortunately start() resets last_amount_read to 0, so the rate estimate jumps
        urlgrabber.progress.TextMeter.start(self, filename, url, basename,
                                            estimatedTotal, self.now, self.name)

    def end(self, amount_read, now=None):
        # do not forward end(): TextMeter would print a newline after every file
        pass

    def update(self, amount_read, now=None):
        # report the cumulative count across all files, not just this one
        totalRead = self.baseRead + amount_read
        urlgrabber.progress.TextMeter.update(self, totalRead, now)