[Yum-devel] urlgrabber caching dns

Michael Stenner mstenner at linux.duke.edu
Sun Mar 28 20:31:15 UTC 2004


On Sun, Mar 28, 2004 at 03:14:04PM -0500, Ryan Tomayko wrote:
> On Sat, 2004-03-27 at 15:22 -0500, seth vidal wrote:
> > Hey,
> >  I was wondering if urlgrabber is caching dns look ups internally
> > anywhere? 
> 
> There's definitely not any name caching at the urlgrabber level today.
> We would have to dig into urllib2 or maybe even python's low level
> socket stuff to see if this is happening at all. 
> 
> I thought name caching was handled at the kernel level in most
> circumstances? (NOTE: I'm out of my realm of real knowledge here, just
> regurgitating stuff I've read.)

No.  I did some checking yesterday, and it does a fresh DNS lookup every
time.  I have some ideas about how to deal with this, but they're a
bit hackish.  They involve pushing wrapper functions that do the
caching into the socket module.

> > I noticed when I had a fairly dumb resolv.conf configuration
> > the other day that the lookup time for each connection was very slow and
> > it repeated for each download. This surprised me so I was wondering
> > about that for the new urlgrabber.
> 
> Hmmm. Are you thinking that it was cycling through the resolv.conf
> "search" list each time or something like that? If you put IPs into
> yum.conf, do you see a significant improvement in name resolution time?

I just turned on iptraf and fetched the same file over and over again :)

> Here's the thing, I can't think of where we would add name caching if we
> wanted to. We pretty much just hand URLs off to urllib2. urllib2 breaks
> the URLs down and hands the info off to httplib or ftplib to perform all
> the heavy lifting wrt resolving names and establishing connections and
> whatnot. The takeaway is that if this is a name resolution caching
> problem, it's probably out of our hands and into python core land. 
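To see where the lookup actually happens, here is a minimal Python 3 sketch. It uses http.client (Python 3's name for httplib) and assumes nothing about urlgrabber itself. It wraps socket.getaddrinfo with a counter, starts a throwaway HTTP server on 127.0.0.1, and fetches the same path over two fresh connections. `QuietHandler` and `fetch_twice` are hypothetical names for illustration. In current Python, each new connection goes through socket.create_connection(), which calls getaddrinfo() again, so nothing above the socket module is caching the answer.

```python
import http.client
import http.server
import socket
import socketserver
import threading


class QuietHandler(http.server.BaseHTTPRequestHandler):
    """Serves a tiny fixed body and suppresses the default stderr logging."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


def fetch_twice():
    """Fetch '/' over two fresh connections; return the recorded lookups."""
    real_getaddrinfo = socket.getaddrinfo
    calls = []

    def counting_getaddrinfo(*args, **kwargs):
        calls.append(args[:2])  # record (host, port) for each lookup
        return real_getaddrinfo(*args, **kwargs)

    server = socketserver.TCPServer(("127.0.0.1", 0), QuietHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    socket.getaddrinfo = counting_getaddrinfo  # same "push" trick as below
    try:
        for _ in range(2):
            conn = http.client.HTTPConnection("127.0.0.1", port)
            conn.request("GET", "/")
            conn.getresponse().read()
            conn.close()
    finally:
        socket.getaddrinfo = real_getaddrinfo  # always restore the original
        server.shutdown()
        server.server_close()
    return calls
```

Calling `len(fetch_twice())` should give 2: one lookup per connection, even though the host never changes.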

We could "push" functions into the socket module.  That is, do
something like this [possibly, exactly like this ;)]:

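import socket
# Keep a handle on the real resolver before replacing it.
socket._getaddrinfo = socket.getaddrinfo
_dns_cache = {}
def _caching_getaddrinfo(*args, **kwargs):
    # kwargs is a dict, which can't be hashed; turn it into a sorted
    # tuple so the full call signature can serve as the cache key.
    key = (args, tuple(sorted(kwargs.items())))
    try:
        res = _dns_cache[key]
    except KeyError:
        res = socket._getaddrinfo(*args, **kwargs)
        _dns_cache[key] = res
    return res
# Callers that look up socket.getaddrinfo at call time now hit the cache.
socket.getaddrinfo = _caching_getaddrinfo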

It's hackish, but would probably work.  Now, it's not at all clear
where we should put this.  Also, should we put it in its own class?
Or should it be module-level so the cache is fairly global?
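One possible answer to the class-versus-module question is to keep the cache in an object with explicit install()/uninstall() methods, so the caller decides whether it becomes global. This is a hypothetical sketch: `DNSCache` and its methods are not part of urlgrabber or the standard library. Keying on a sorted tuple of the keyword arguments matters, because a raw kwargs dict is unhashable.

```python
import socket
import threading


class DNSCache:
    """Hypothetical getaddrinfo() cache that can be swapped into and out of
    the socket module on demand."""

    def __init__(self, resolver=None):
        # Resolver consulted on a cache miss; defaults to whatever
        # socket.getaddrinfo is when the cache is created.
        self._resolver = resolver or socket.getaddrinfo
        self._cache = {}
        self._lock = threading.Lock()
        self._saved = None

    def getaddrinfo(self, *args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        # Resolve outside the lock; two threads may both miss and both
        # resolve, which is harmless since the last one simply wins.
        result = self._resolver(*args, **kwargs)
        with self._lock:
            self._cache[key] = result
        return result

    def clear(self):
        with self._lock:
            self._cache.clear()

    def install(self):
        # Replace the module-level function with our bound method.
        if self._saved is None:
            self._saved = socket.getaddrinfo
            socket.getaddrinfo = self.getaddrinfo

    def uninstall(self):
        if self._saved is not None:
            socket.getaddrinfo = self._saved
            self._saved = None
```

Usage would be `cache = DNSCache(); cache.install()` at startup and `cache.uninstall()` when done. Like the module-level version, it never expires entries, so it ignores DNS TTLs; that is probably tolerable for a short-lived process but worth keeping in mind.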

When I started looking into this, I found a number of other problems
that seem to have cropped up:
  a) keepalive seems to be opening a connection every time
  b) some reget unit tests are failing on 206 (Partial Content), which
     I don't understand.  The range handler is in there, so I'm not
     sure why the 206 isn't being handled.
My point is, I'm trying to figure out these things right now, so I'm
not looking closely at the DNS issue.
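For item a), one generic way to tell whether connections are being reused is to count the TCP connections a local test server accepts. This sketch uses Python 3's http.client rather than urlgrabber's keepalive handler, so it only illustrates the measurement; `KeepAliveHandler` and `count_connections` are hypothetical names.

```python
import http.client
import http.server
import socketserver
import threading


class KeepAliveHandler(http.server.BaseHTTPRequestHandler):
    # HTTP/1.1 lets the server keep a connection open between requests.
    protocol_version = "HTTP/1.1"
    connections = 0
    lock = threading.Lock()

    def handle(self):
        # handle() runs once per accepted TCP connection.
        with KeepAliveHandler.lock:
            KeepAliveHandler.connections += 1
        super().handle()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def count_connections(requests, reuse):
    """Issue `requests` GETs; return how many TCP connections the server saw."""
    KeepAliveHandler.connections = 0
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), KeepAliveHandler)
    server.daemon_threads = True
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = None
        for _ in range(requests):
            if conn is None or not reuse:
                if conn is not None:
                    conn.close()
                conn = http.client.HTTPConnection("127.0.0.1", port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain so the socket can be reused
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return KeepAliveHandler.connections
```

A handler that really keeps connections alive should give a count of 1 for several requests to the same host, as http.client does here with `reuse=True`; a count equal to the number of requests would match the symptom in a).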

					-Michael
-- 
  Michael D. Stenner                            mstenner at ece.arizona.edu
  ECE Department, the University of Arizona                 520-626-1619
  1230 E. Speedway Blvd., Tucson, AZ 85721-0104                 ECE 524G


