Module urllib2

An extensible library for opening URLs using a variety of protocols.

The simplest way to use this module is to call the urlopen function, which accepts either a string containing a URL or a Request object (described below). It opens the URL and returns the result as a file-like object; the returned object has some extra methods described below.
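As a rough illustration of the file-like result, the sketch below opens a local file through a file:// URL so that it needs no network access. It assumes Python 2.6 or later (falling back to urllib.request, the Python 3 successor of this module) and a POSIX-style absolute temporary path; geturl() and info() are the extra methods used here.

```python
import os
import tempfile

try:
    import urllib2  # Python 2: the module documented here
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, assumed equivalent for this sketch

# Create a small local file to fetch, so the example needs no network.
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "wb") as handle:
        handle.write(b"hello from a local file\n")

    # Assumption: a POSIX-style absolute path, so "file://" + path is a usable URL.
    response = urllib2.urlopen("file://" + path)
    try:
        body = response.read()          # ordinary file-like read
        final_url = response.geturl()   # extra method: the URL that was opened
        headers = response.info()       # extra method: header-like metadata
        content_length = int(headers["Content-length"])
    finally:
        response.close()
finally:
    os.remove(path)
```

The same read(), geturl() and info() calls apply to responses for other schemes.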

The OpenerDirector manages a collection of Handler objects that do all the actual work. Each Handler implements a particular protocol or option. The OpenerDirector is a composite object that invokes the Handlers needed to open the requested URL. For example, the HTTPHandler performs HTTP GET and POST requests and deals with non-error returns. The HTTPRedirectHandler automatically deals with HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler deals with digest authentication.
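The division of labour between the director and its handlers can be sketched with a made-up scheme. EchoHandler and the "echo" scheme below are hypothetical and not part of urllib2; the sketch assumes Python 2.6 or later (or urllib.request on Python 3) and relies on the convention that a handler method named <scheme>_open is invoked for URLs of that scheme.

```python
import io

try:
    import urllib2  # Python 2: the module documented here
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, assumed equivalent for this sketch


class EchoHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up 'echo' scheme."""

    def echo_open(self, req):
        # The director locates this method by its name: <scheme>_open.
        payload = req.get_full_url().split(":", 1)[1]
        return io.BytesIO(payload.encode("ascii"))


# A bare director with a single handler; no default handlers are installed.
director = urllib2.OpenerDirector()
director.add_handler(EchoHandler())

response = director.open("echo:hello")
body = response.read()
```

Using a bare OpenerDirector keeps the example minimal; build_opener would add the default handlers around the custom one.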

urlopen(url, data=None) -- Basic usage is the same as in the original urllib: pass the URL, and optionally data to POST to an HTTP URL, and get a file-like object back. One difference is that you can also pass a Request instance instead of a URL. Raises a URLError (a subclass of IOError); for HTTP errors, raises an HTTPError, which can also be treated as a valid response.

build_opener -- Function that creates a new OpenerDirector instance with the default handlers installed. Accepts one or more handlers as arguments, either instances or Handler classes that it will instantiate. If one of the arguments is a subclass of a default handler, the argument is installed instead of that default.

install_opener -- Installs a new opener as the default opener used by urlopen.
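A minimal sketch of what installing an opener does, assuming the CPython implementation, where the default opener is kept in the private module variable _opener (listed under Variables on this page). Inspecting a private name is for illustration only.

```python
try:
    import urllib2  # Python 2: the module documented here
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, assumed equivalent for this sketch

# Build an opener with only the default handlers.
opener = urllib2.build_opener()

# Make it the opener that later urlopen() calls will use.
urllib2.install_opener(opener)

# Implementation detail: the installed opener is stored in urllib2._opener.
installed = urllib2._opener
```

Calling opener.open(url) directly has the same effect for a single request without changing the module-wide default.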

objects of interest: OpenerDirector --

Request -- an object that encapsulates the state of a request. The state can be as simple as the URL. It can also include extra HTTP headers, e.g. a User-Agent.
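For instance, a Request can be built and inspected without sending anything. The host www.example.com and the header value are placeholders chosen for this sketch, which assumes Python 2.6 or later (or urllib.request on Python 3).

```python
try:
    import urllib2  # Python 2: the module documented here
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, assumed equivalent for this sketch

# A request whose state is the URL plus one extra header.
req = urllib2.Request(
    "http://www.example.com/",
    headers={"User-Agent": "demo-client/0.1"},
)
method_without_data = req.get_method()

# Supplying data turns the request into a POST.
post = urllib2.Request("http://www.example.com/form", data=b"name=value")
method_with_data = post.get_method()

# Header names are stored capitalized, hence "User-agent" on lookup.
agent = req.get_header("User-agent")
```

Either object can then be passed to urlopen in place of a URL string.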

BaseHandler --

exceptions: URLError -- a subclass of IOError; individual protocols have their own specific subclasses.

HTTPError -- also a valid HTTP response, so you can treat an HTTP error either as an exceptional event or as a valid response.
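The dual nature of HTTPError can be sketched without a network connection by raising one by hand. fake_fetch below is a hypothetical stand-in for urlopen, and the URL and body are invented; the sketch assumes Python 2.6 or later (or urllib.request on Python 3).

```python
import io

try:
    import urllib2  # Python 2: the module documented here
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, assumed equivalent for this sketch


def fake_fetch():
    """Hypothetical stand-in for urlopen that always fails with a 404."""
    raise urllib2.HTTPError(
        "http://www.example.com/missing",   # placeholder URL
        404,
        "Not Found",
        {},                                 # no headers in this sketch
        io.BytesIO(b"no such page"),        # body of the error response
    )


try:
    response = fake_fetch()
except urllib2.HTTPError as err:
    # Treat the error as an ordinary response object.
    response = err

status = response.code
body = response.read()
response.close()
```

Catching URLError or IOError instead would also catch this exception, since HTTPError derives from both.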

internals: BaseHandler and parent _call_chain conventions

Example usage:

import urllib2

# set up authentication info
authinfo = urllib2.HTTPBasicAuthHandler()
authinfo.add_password('realm', 'host', 'username', 'password')

proxy_support = urllib2.ProxyHandler({"http" : "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib2.build_opener(proxy_support, authinfo, urllib2.CacheFTPHandler)

# install it
urllib2.install_opener(opener)

f = urllib2.urlopen('http://www.python.org/')


Version: 2.5

Classes
  URLError
  HTTPError
Raised when an HTTP error occurs, but also acts like a non-error return.
  GopherError
  Request
  OpenerDirector
  BaseHandler
  HTTPErrorProcessor
Process HTTP error responses.
  HTTPDefaultErrorHandler
  HTTPRedirectHandler
  ProxyHandler
  HTTPPasswordMgr
  HTTPPasswordMgrWithDefaultRealm
  AbstractBasicAuthHandler
  HTTPBasicAuthHandler
  ProxyBasicAuthHandler
  AbstractDigestAuthHandler
  HTTPDigestAuthHandler
An authentication protocol defined by RFC 2069
  ProxyDigestAuthHandler
  AbstractHTTPHandler
  HTTPHandler
  HTTPSHandler
  HTTPCookieProcessor
  UnknownHandler
  FileHandler
  FTPHandler
  CacheFTPHandler
  GopherHandler
Functions
 
urlopen(url, data=None)
 
install_opener(opener)
 
request_host(request)
Return request-host, as defined by RFC 2965.
 
build_opener(*handlers)
Create an opener object from a list of handlers.
 
_parse_proxy(proxy)
Return (scheme, user, password, host/port) given a URL or an authority.
 
randombytes(n)
Return n random bytes.
 
parse_keqv_list(l)
Parse list of key=value strings where keys are not duplicated.
 
parse_http_list(s)
Parse lists as described by RFC 2068 Section 2.
Variables
  _opener = None
  _cut_port_re = re.compile(r':\d+$')

Imports: base64, hashlib, httplib, mimetools, os, posixpath, random, re, socket, sys, time, urlparse, bisect, StringIO, unwrap, unquote, splittype, splithost, quote, addinfourl, splitport, splitgophertype, splitquery, splitattr, ftpwrapper, noheaders, splituser, splitpasswd, splitvalue, localhost, url2pathname, getproxies


Function Details

request_host(request)

 

Return request-host, as defined by RFC 2965.

Variation from RFC: returned value is lowercased, for convenient comparison.
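A short sketch of the lowercasing behaviour, using a placeholder host with mixed case and a port; it assumes Python 2.6 or later (or urllib.request on Python 3), where the port is also stripped from the returned value.

```python
try:
    import urllib2  # Python 2: the module documented here
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, assumed equivalent for this sketch

# Placeholder URL with an upper-case host name and an explicit port.
req = urllib2.Request("http://WWW.Example.COM:8080/some/path")

# The request-host comes back lowercased and without the port.
host = urllib2.request_host(req)
```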

build_opener(*handlers)

 

Create an opener object from a list of handlers.

The opener will use several default handlers, including support for HTTP and FTP.

If any of the handlers passed as arguments are subclasses of the default handlers, the corresponding default handlers will not be used.
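The replacement rule can be checked by inspecting the opener after it is built. QuietRedirectHandler is a hypothetical subclass that changes nothing, and the handlers attribute of the opener is an implementation detail of CPython rather than a documented interface; the sketch assumes Python 2.6 or later (or urllib.request on Python 3).

```python
try:
    import urllib2  # Python 2: the module documented here
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, assumed equivalent for this sketch


class QuietRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical subclass; overrides nothing and exists only to be detected."""


# Pass the class itself; build_opener instantiates it.
opener = urllib2.build_opener(QuietRedirectHandler)

# Collect every installed handler that is a redirect handler of any kind.
redirect_handlers = [
    h for h in opener.handlers
    if isinstance(h, urllib2.HTTPRedirectHandler)
]

# Other defaults, such as the plain HTTP handler, are still installed.
has_http_handler = any(isinstance(h, urllib2.HTTPHandler) for h in opener.handlers)
```

Only one redirect handler ends up installed, and it is the subclass rather than the default.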

_parse_proxy(proxy)

 

Return (scheme, user, password, host/port) given a URL or an authority.

If a URL is supplied, it must have an authority (host:port) component. According to RFC 3986, having an authority component means the URL must have two slashes after the scheme:

>>> _parse_proxy('file:/ftp.example.com/')
Traceback (most recent call last):
ValueError: proxy URL with no authority: 'file:/ftp.example.com/'

The first three items of the returned tuple may be None.

Examples of authority parsing:

>>> _parse_proxy('proxy.example.com')
(None, None, None, 'proxy.example.com')
>>> _parse_proxy('proxy.example.com:3128')
(None, None, None, 'proxy.example.com:3128')

The authority component may optionally include userinfo (assumed to be username:password):

>>> _parse_proxy('joe:password@proxy.example.com')
(None, 'joe', 'password', 'proxy.example.com')
>>> _parse_proxy('joe:password@proxy.example.com:3128')
(None, 'joe', 'password', 'proxy.example.com:3128')

Same examples, but with URLs instead:

>>> _parse_proxy('http://proxy.example.com/')
('http', None, None, 'proxy.example.com')
>>> _parse_proxy('http://proxy.example.com:3128/')
('http', None, None, 'proxy.example.com:3128')
>>> _parse_proxy('http://joe:password@proxy.example.com/')
('http', 'joe', 'password', 'proxy.example.com')
>>> _parse_proxy('http://joe:password@proxy.example.com:3128')
('http', 'joe', 'password', 'proxy.example.com:3128')

Everything after the authority is ignored:

>>> _parse_proxy('ftp://joe:password@proxy.example.com/rubbish:3128')
('ftp', 'joe', 'password', 'proxy.example.com')

Test for no trailing '/' case:

>>> _parse_proxy('http://joe:password@proxy.example.com')
('http', 'joe', 'password', 'proxy.example.com')

parse_http_list(s)

 

Parse lists as described by RFC 2068 Section 2.

In particular, parse comma-separated lists where the elements of the list may include quoted-strings. A quoted-string could contain a comma. A non-quoted string could have quotes in the middle. Neither commas nor quotes count if they are escaped. Only double-quotes count, not single-quotes.
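The quoting rule is easiest to see on a small input. The header-like strings below are invented for illustration, and the sketch assumes Python 2.6 or later (or urllib.request on Python 3); the second half feeds the result to parse_keqv_list, which is listed in the function summary on this page.

```python
try:
    import urllib2  # Python 2: the module documented here
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, assumed equivalent for this sketch

# The comma inside the quoted string does not split the element,
# and the surrounding quotes are kept in the result.
items = urllib2.parse_http_list('a, "b, c", d')

# Typical pairing: split a challenge-like string, then parse key=value pairs.
# parse_keqv_list strips the surrounding double quotes from values.
challenge = 'realm="test realm", nonce="abc, def", qop=auth'
params = urllib2.parse_keqv_list(urllib2.parse_http_list(challenge))
```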