[Yum-devel] urlgrabber socket timeouts
Ryan Tomayko
rtomayko at gmail.com
Fri Oct 8 08:00:52 UTC 2004
Okay, here's what we have. Looking for feedback on whether timeouts should
trigger retries by default.
Synopsis
--------
When calling the module-level functions:

    import urlgrabber

    # grab with a timeout of ten seconds:
    urlgrabber.urlgrab('http://example.com/bla.tar.gz', localfile,
                       timeout=10.0)

When using the URLGrabber class:

    from urlgrabber.grabber import URLGrabber

    grabber = URLGrabber(timeout=10.0)
    grabber.urlgrab('http://example.com/bla.tar.gz', localfile)
Overview
--------
The timeout option is a positive float expressing the number of seconds
to wait for socket operations. If the value is None or 0.0, no timeout is
set and socket operations block indefinitely, as they did before this
option existed.

Setting this option causes urlgrabber to call the setdefaulttimeout
function in the socket module before creating the request. See the Python
documentation on settimeout/setdefaulttimeout [1] for more information.

The timeout option is silently ignored on Python versions below 2.3. An
application can determine whether timeouts are supported by checking the
value of the urlgrabber.grabber.have_socket_timeout attribute.
[1]: http://www.python.org/doc/current/lib/socket-objects.html
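
For example, an application that also needs to run on older interpreters
can test that flag before passing the option. A minimal sketch, reusing
the names from the synopsis above:

    from urlgrabber.grabber import URLGrabber, have_socket_timeout

    if have_socket_timeout:
        # Python >= 2.3: a real socket-level timeout is available
        grabber = URLGrabber(timeout=10.0)
    else:
        # Python < 2.3: the option would be silently ignored anyway
        grabber = URLGrabber()
    grabber.urlgrab('http://example.com/bla.tar.gz', localfile)
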
URLGrabError and Retry
----------------------
A new error code (12) has been added to URLGrabError. This error code
signals that a socket.timeout exception has occurred.

The default set of retry codes does not include code 12, so timeouts do
not trigger retries out of the box. Automatic retry after a timeout _is_
supported, however: supply a retrycodes= option that includes 12 to turn
it on.
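
For example, something like the following should enable retries on
timeout. A sketch only: the retry count and the other codes in the
retrycodes list are illustrative, not values urlgrabber requires.

    from urlgrabber.grabber import URLGrabber, URLGrabError

    # retry up to 3 times; include 12 (socket timeout) alongside
    # whatever codes the application already retries on
    grabber = URLGrabber(timeout=10.0, retry=3,
                         retrycodes=[-1, 2, 4, 5, 6, 7, 12])
    try:
        grabber.urlgrab('http://example.com/bla.tar.gz', localfile)
    except URLGrabError, e:
        if e.errno == 12:
            print 'gave up after repeated socket timeouts'
        else:
            raise
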
- Ryan
On Fri, 8 Oct 2004 02:53:45 -0400, Ryan Tomayko <rtomayko at gmail.com> wrote:
> I looked into this a bit. There are two methods for establishing a
> timeout. First, there's an instance method for sockets: settimeout.
> This lets you set the timeout on a socket after it is created. The
> second is, as you mentioned, the global socket.setdefaulttimeout,
> which sets the timeout globally for all sockets created after the call
> is made.
>
> I was hoping to use the instance level method because it seems a bit
> more safe to me. We can control exactly which sockets get a timeout
> set and which do not. This proved to be very very hard. The actual
> socket creation is buried deep in httplib.py and ftplib.py. urllib2
> doesn't really care much about the socket so it doesn't expose it up
> to the calling application. So, unless urllib2 is enhanced or we
> rethink how we're opening connections, socket-level timeouts are a bit
> beyond us.
>
> Now, since we are completely single threaded at this point, it may be
> possible to use the global setdefaulttimeout to get the exact same
> effect as the instance level settimeout. e.g. the following two snips
> should yield equivalent results if only a single thread is creating
> sockets at a time:
>
> Snip 1: Using settimeout
>
>     import socket
>     sock = socket.socket(..)
>     sock.settimeout(10.0)
>     sock.connect(..)
>
> Snip 2: Using setdefaulttimeout
>
>     import socket
>     old_to = socket.getdefaulttimeout()
>     socket.setdefaulttimeout(10.0)
>     try:
>         sock = socket.socket(..)
>     finally:
>         socket.setdefaulttimeout(old_to)
>     sock.connect(..)
>
> Using setdefaulttimeout is a bit more of a kludge but believe me, this
> is much less kludgy than trying to inject Handler and HTTPConnection
> subclasses into urllib2.
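>
> Roughly, the wrapper I'm picturing looks something like this
> (illustrative only; the real code will live in grabber.py and handle
> the no-timeout case):
>
>     import socket
>     import urllib2
>
>     def urlopen_with_timeout(req, timeout):
>         # set the global default just long enough for urllib2 to
>         # create its socket, then put the old value back; the new
>         # socket keeps the timeout it was created with
>         old_to = socket.getdefaulttimeout()
>         socket.setdefaulttimeout(timeout)
>         try:
>             return urllib2.urlopen(req)
>         finally:
>             socket.setdefaulttimeout(old_to)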
>
> I'll commit something with this shortly, just wanted to dump my
> findings out here to see if anyone can spot something I'm missing.
>
> Ryan
>
>
>
> On Wed, 29 Sep 2004 01:44:47 -0400, seth vidal <skvidal at phy.duke.edu> wrote:
> >
> > > You're not in left field at all. The only issue is that this is only
> > > available in 2.3. In previous versions, you'd need to include a third
> > > party module.
> >
> > Works for me - let's do a check for the function, if it's not there,
> > then it doesn't get set, sucks to be running python 2.2. :)
> >
> > > I agree that this is a good way to go. It doesn't completely solve
> > > timeout problems though because tcp timeouts won't always save you
> > > from a stupid/slow server. A tcp timeout happens when one side sends
> > > a request but doesn't get an acknowledgement. It's quite possible
> > > (and common in ssh or imap connections) to simply have no requests
> > > made for a very long time. In that case, you'd also need higher-level
> > > timeouts, which I was looking into a couple weeks ago before I got
> > > swamped in work.
> >
> > My tests:
> >
> > set the timeout to 30s
> > start downloading something big
> > login to server, set -j DROP on the iptables from the downloading host
> > wait 30s, it times out
> > restart the download
> > login to server, set -j DROP on the iptables from the downloading host
> > wait 20s
> > unset the -j DROP
> > it continues the download
> > set the -j DROP
> > wait 30s, it times out.
> >
> > Repeat the above with -j REJECT, and again with the webserver turned
> > off.
> > Now I know that's not all of the possible situations but I'd be willing
> > to bet it's a good hunk of them.
> >
> > >
> > > In short, I'd have no problem implementing this as an "only >=2.3"
> > > feature. However, we should probably be clear from the start that we
> > > will eventually remove the checks, meaning: if you want to use new
> > > urlgrabbers with old pythons for a long time, you should simply not
> > > use this option.
> >
> > Fine by me.
> >
> > This would make a lot of people happy, I'm certain.
> > -sv