[Yum-devel] URLGrabber and escaping characters in http request

Mon Sep 24 04:16:00 UTC 2012

On Sun, Sep 23, 2012 at 09:14:20PM +0100, andrea wrote:
> Hi,
> 
> I'm using URLGrabber in Fedora 17.
> 
> I want to get this url
> 
> http://rai-i.akamaihd.net/i/20120920/unpostoalsole-2009201220.35.00_,600,800,1200,1500,.mp4.csmil/master.m3u8
> 
> But URLGrabber instead tries to get
> 
> http://rai-i.akamaihd.net/i/20120920/unpostoalsole-2009201220.35.00_%2C600%2C800%2C1200%2C1500%2C.mp4.csmil/master.m3u8
> 
> which fails.
> 
> I got that from wireshark.
> 
> If I try to use wget or curl, they pass the url unescaped which then works fine.
> 
> Any idea how to make it work?
> 
According to the RFC for URIs ( http://www.ietf.org/rfc/rfc2396.txt ), the
server should probably unescape %2C back to a comma.  However, you can probably
work around the server's behaviour by overriding the parser's quote() method so
that commas in the path are left unescaped, with something like this:


import urllib
from urlgrabber.grabber import URLParser, URLGrabber

myurl = 'http://rai-i.akamaihd.net/i/20120920/unpostoalsole-2009201220.35.00_,600,800,1200,1500,.mp4.csmil/master.m3u8'


class MyParser(URLParser):
    def quote(self, parts):
        # Re-quote only the path, treating ',' as safe so it is not sent as %2C
        (scheme, host, path, parm, query, frag) = parts
        path = urllib.quote(path, safe='/,')
        return (scheme, host, path, parm, query, frag)


def test(url=myurl):
    # Swap the custom parser into the grabber's options before fetching
    mygrabber = URLGrabber()
    mygrabber.opts.urlparser = MyParser()
    mygrabber.urlgrab(url)
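
To see what the safe argument changes without touching the network, here is a
small sketch that uses only the standard library.  The import fallback and the
variable names are assumptions added so it runs on either Python 2 or 3; it
does not use urlgrabber at all, it just compares the two quoting modes on the
path from the URL above:

```python
try:
    from urllib import quote, unquote          # Python 2
except ImportError:
    from urllib.parse import quote, unquote    # Python 3

# Path component of the URL from the original report
path = ('/i/20120920/unpostoalsole-2009201220.35.00_'
        ',600,800,1200,1500,.mp4.csmil/master.m3u8')

# Default quoting (safe='/') escapes each comma as %2C
default_quoted = quote(path)

# Marking ',' as safe leaves the commas in place
comma_safe = quote(path, safe='/,')

print(default_quoted)
print(comma_safe)

# Decoding the escaped form gives back the original path, which is why a
# server could reasonably treat the two requests as the same resource
print(unquote(default_quoted) == path)
```

The first line printed should contain %2C in place of each comma, matching the
request seen in wireshark, while the second should be identical to the original
path.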

-Toshio