urllib2

The simplest way to use this module is to call the urlopen function, which accepts a string containing a URL or a Request object (described below). It opens the URL and returns the results as file-like object; the returned object has some extra methods described below.

The OpenerDirector manages a collection of Handler objects that do all the actual work. Each Handler implements a particular protocol or option. The OpenerDirector is a composite object that invokes the Handlers needed to open the requested URL. For example, the HTTPHandler performs HTTP GET and POST requests and deals with non-error returns. The HTTPRedirectHandler automatically deals with HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler deals with digest authentication.

urlopen(url, data=None) -- basic usage is the same as original urllib. pass the url and optionally data to post to an HTTP URL, and get a file-like object back. One difference is that you can also pass a Request instance instead of URL. Raises a URLError (subclass of IOError); for HTTP errors, raises an HTTPError, which can also be treated as a valid response.

build_opener -- function that creates a new OpenerDirector instance. will install the default handlers. accepts one or more Handlers as arguments, either instances or Handler classes that it will instantiate. if one of the argument is a subclass of the default handler, the argument will be installed instead of the default.

Request -- an object that encapsulates the state of a request. the state can be a simple as the URL. it can also include extra HTTP headers, e.g. a User-Agent.

exceptions: URLError-- a subclass of IOError, individual protocols have their own specific subclass

HTTPError-- also a valid HTTP response, so you can treat an HTTP error as an exceptional event or valid response

# set up authentication info authinfo = urllib2.HTTPBasicAuthHandler() authinfo.add_password('realm', 'host', 'username', 'password')

# build a new opener that adds authentication and caching FTP handlers opener = urllib2.build_opener(proxy_support, authinfo, urllib2.CacheFTPHandler)

_parse_proxy(proxy)

Return (scheme, user, password, host/port) given a URL or an authority.

If a URL is supplied, it must have an authority (host:port) component. According to RFC 3986, having an authority component means the URL must have two slashes after the scheme:

>>> _parse_proxy('file:/ftp.example.com/')
Traceback (most recent call last):
ValueError: proxy URL with no authority: 'file:/ftp.example.com/'

The first three items of the returned tuple may be None.

Examples of authority parsing:

>>> _parse_proxy('proxy.example.com')
(None, None, None, 'proxy.example.com')
>>> _parse_proxy('proxy.example.com:3128')
(None, None, None, 'proxy.example.com:3128')

The authority component may optionally include userinfo (assumed to be username:password):

>>> _parse_proxy('joe:[email protected]')
(None, 'joe', 'password', 'proxy.example.com')
>>> _parse_proxy('joe:[email protected]:3128')
(None, 'joe', 'password', 'proxy.example.com:3128')

Same examples, but with URLs instead:

>>> _parse_proxy('http://proxy.example.com/')
('http', None, None, 'proxy.example.com')
>>> _parse_proxy('http://proxy.example.com:3128/')
('http', None, None, 'proxy.example.com:3128')
>>> _parse_proxy('http://joe:[email protected]/')
('http', 'joe', 'password', 'proxy.example.com')
>>> _parse_proxy('http://joe:[email protected]:3128')
('http', 'joe', 'password', 'proxy.example.com:3128')

Everything after the authority is ignored:

>>> _parse_proxy('ftp://joe:[email protected]/rubbish:3128')
('ftp', 'joe', 'password', 'proxy.example.com')

Test for no trailing '/' case:

>>> _parse_proxy('http://joe:[email protected]')
('http', 'joe', 'password', 'proxy.example.com')

Module urllib2

request_host(request)

build_opener(*handlers)

_parse_proxy(proxy)

parse_http_list(s)