18.11. rfc822 — Parse RFC 2822 mail headers
Deprecated since version 2.3: The email package should be used in preference to the rfc822
module. This module is present only to maintain backward compatibility, and
has been removed in 3.0.
This module defines a class, Message, which represents an “email
message” as defined by the Internet standard RFC 2822. Such messages
consist of a collection of message headers, and a message body. This module
also defines a helper class AddressList for parsing RFC 2822
addresses. Please refer to the RFC for information on the specific syntax of
RFC 2822 messages.
The mailbox module provides classes to read mailboxes produced by
various end-user mail programs.
-
class rfc822.Message(file[, seekable])
A Message instance is instantiated with an input object as parameter.
Message relies only on the input object having a readline() method; in
particular, ordinary file objects qualify. Instantiation reads headers from the
input object up to a delimiter line (normally a blank line) and stores them in
the instance. The message body, following the headers, is not consumed.
This class can work with any input object that supports a readline()
method. If the input object has seek and tell capability, the
rewindbody() method will work; also, illegal lines will be pushed back
onto the input stream. If the input object lacks seek but has an unread()
method that can push back a line of input, Message will use that to
push back illegal lines. Thus this class can be used to parse messages coming
from a buffered stream.
The optional seekable argument is provided as a workaround for certain stdio
libraries in which tell() discards buffered data before discovering that
the lseek() system call doesn’t work. For maximum portability, you
should set the seekable argument to zero to prevent that initial tell()
when passing in an unseekable object such as a file object created from a socket
object.
Input lines as read from the file may either be terminated by CR-LF or by a
single linefeed; a terminating CR-LF is replaced by a single linefeed before the
line is stored.
All header matching is done independent of upper or lower case; e.g.
m['From'], m['from'] and m['FROM'] all yield the same result.
-
class rfc822.AddressList(field)
- You may instantiate the AddressList helper class using a single string
parameter, a comma-separated list of RFC 2822 addresses to be parsed. (The
parameter None yields an empty list.)
-
rfc822.quote(str)
- Return a new string with backslashes in str replaced by two backslashes and
double quotes replaced by backslash-double quote.
-
rfc822.unquote(str)
- Return a new string which is an unquoted version of str. If str ends and
begins with double quotes, they are stripped off. Likewise if str ends and
begins with angle brackets, they are stripped off.
-
rfc822.parseaddr(address)
- Parse address, which should be the value of some address-containing field such
as To or Cc, into its constituent “realname” and
“email address” parts. Returns a tuple of that information, unless the parse
fails, in which case a 2-tuple (None, None) is returned.
-
rfc822.dump_address_pair(pair)
- The inverse of parseaddr(), this takes a 2-tuple of the form (realname,
email_address) and returns the string value suitable for a To or
Cc header. If the first element of pair is false, then the
second element is returned unmodified.
-
rfc822.parsedate(date)
- Attempts to parse a date according to the rules in RFC 2822. however, some
mailers don’t follow that format as specified, so parsedate() tries to
guess correctly in such cases. date is a string containing an RFC 2822
date, such as 'Mon, 20 Nov 1995 19:12:08 -0500'. If it succeeds in parsing
the date, parsedate() returns a 9-tuple that can be passed directly to
time.mktime(); otherwise None will be returned. Note that indexes 6,
7, and 8 of the result tuple are not usable.
-
rfc822.parsedate_tz(date)
- Performs the same function as parsedate(), but returns either None or
a 10-tuple; the first 9 elements make up a tuple that can be passed directly to
time.mktime(), and the tenth is the offset of the date’s timezone from UTC
(which is the official term for Greenwich Mean Time). (Note that the sign of
the timezone offset is the opposite of the sign of the time.timezone
variable for the same timezone; the latter variable follows the POSIX standard
while this module follows RFC 2822.) If the input string has no timezone,
the last element of the tuple returned is None. Note that indexes 6, 7, and
8 of the result tuple are not usable.
-
rfc822.mktime_tz(tuple)
- Turn a 10-tuple as returned by parsedate_tz() into a UTC timestamp. If
the timezone item in the tuple is None, assume local time. Minor
deficiency: this first interprets the first 8 elements as a local time and then
compensates for the timezone difference; this may yield a slight error around
daylight savings time switch dates. Not enough to worry about for common use.
See also
- Module email
- Comprehensive email handling package; supersedes the rfc822 module.
- Module mailbox
- Classes to read various mailbox formats produced by end-user mail programs.
- Module mimetools
- Subclass of rfc822.Message that handles MIME encoded messages.
18.11.1. Message Objects
A Message instance has the following methods:
-
Message.rewindbody()
- Seek to the start of the message body. This only works if the file object is
seekable.
- Returns a line’s canonicalized fieldname (the dictionary key that will be used
to index it) if the line is a legal RFC 2822 header; otherwise returns
None (implying that parsing should stop here and the line be pushed back on
the input stream). It is sometimes useful to override this method in a
subclass.
-
Message.islast(line)
- Return true if the given line is a delimiter on which Message should stop. The
delimiter line is consumed, and the file object’s read location positioned
immediately after it. By default this method just checks that the line is
blank, but you can override it in a subclass.
- Return True if the given line should be ignored entirely, just skipped. By
default this is a stub that always returns False, but you can override it in
a subclass.
- Return a list of lines consisting of all headers matching name, if any. Each
physical line, whether it is a continuation line or not, is a separate list
item. Return the empty list if no header matches name.
- Return a list of lines comprising the first header matching name, and its
continuation line(s), if any. Return None if there is no header matching
name.
- Return a single string consisting of the text after the colon in the first
header matching name. This includes leading whitespace, the trailing
linefeed, and internal linefeeds and whitespace if there any continuation
line(s) were present. Return None if there is no header matching name.
- Return a single string consisting of the last header matching name,
but strip leading and trailing whitespace.
Internal whitespace is not stripped. The optional default argument can be
used to specify a different default to be returned when there is no header
matching name; it defaults to None.
This is the preferred way to get parsed headers.
-
Message.get(name[, default])
- An alias for getheader(), to make the interface more compatible with
regular dictionaries.
-
Message.getaddr(name)
Return a pair (full name, email address) parsed from the string returned by
getheader(name). If no header matching name exists, return (None,
None); otherwise both the full name and the address are (possibly empty)
strings.
Example: If m‘s first From header contains the string
'jack@cwi.nl (Jack Jansen)', then m.getaddr('From') will yield the pair
('Jack Jansen', 'jack@cwi.nl'). If the header contained 'Jack Jansen
<jack@cwi.nl>' instead, it would yield the exact same result.
-
Message.getaddrlist(name)
This is similar to getaddr(list), but parses a header containing a list of
email addresses (e.g. a To header) and returns a list of (full
name, email address) pairs (even if there was only one address in the header).
If there is no header matching name, return an empty list.
If multiple headers exist that match the named header (e.g. if there are several
Cc headers), all are parsed for addresses. Any continuation lines
the named headers contain are also parsed.
-
Message.getdate(name)
Retrieve a header using getheader() and parse it into a 9-tuple compatible
with time.mktime(); note that fields 6, 7, and 8 are not usable. If
there is no header matching name, or it is unparsable, return None.
Date parsing appears to be a black art, and not all mailers adhere to the
standard. While it has been tested and found correct on a large collection of
email from many sources, it is still possible that this function may
occasionally yield an incorrect result.
-
Message.getdate_tz(name)
- Retrieve a header using getheader() and parse it into a 10-tuple; the
first 9 elements will make a tuple compatible with time.mktime(), and the
10th is a number giving the offset of the date’s timezone from UTC. Note that
fields 6, 7, and 8 are not usable. Similarly to getdate(), if there is
no header matching name, or it is unparsable, return None.
Message instances also support a limited mapping interface. In
particular: m[name] is like m.getheader(name) but raises KeyError
if there is no matching header; and len(m), m.get(name[, default]),
name in m, m.keys(), m.values() m.items(), and
m.setdefault(name[, default]) act as expected, with the one difference
that setdefault() uses an empty string as the default value.
Message instances also support the mapping writable interface m[name]
= value and del m[name]. Message objects do not support the
clear(), copy(), popitem(), or update() methods of the
mapping interface. (Support for get() and setdefault() was only
added in Python 2.2.)
Finally, Message instances have some public instance variables:
- A list containing the entire set of header lines, in the order in which they
were read (except that setitem calls may disturb this order). Each line contains
a trailing newline. The blank line terminating the headers is not contained in
the list.
-
Message.fp
- The file or file-like object passed at instantiation time. This can be used to
read the message content.
-
Message.unixfrom
- The Unix From line, if the message had one, or an empty string. This is
needed to regenerate the message in some contexts, such as an mbox-style
mailbox file.
18.11.2. AddressList Objects
An AddressList instance has the following methods:
-
AddressList.__len__()
- Return the number of addresses in the address list.
-
AddressList.__str__()
- Return a canonicalized string representation of the address list. Addresses are
rendered in “name” <host@domain> form, comma-separated.
-
AddressList.__add__(alist)
- Return a new AddressList instance that contains all addresses in both
AddressList operands, with duplicates removed (set union).
-
AddressList.__iadd__(alist)
- In-place version of __add__(); turns this AddressList instance
into the union of itself and the right-hand instance, alist.
-
AddressList.__sub__(alist)
- Return a new AddressList instance that contains every address in the
left-hand AddressList operand that is not present in the right-hand
address operand (set difference).
-
AddressList.__isub__(alist)
- In-place version of __sub__(), removing addresses in this list which are
also in alist.
Finally, AddressList instances have one public instance variable:
-
AddressList.addresslist
- A list of tuple string pairs, one per address. In each member, the first is the
canonicalized name part, the second is the actual route-address ('@'-separated username-host.domain pair).
Footnotes