4.9.2 Standard Encodings

Python comes with a number of built-in codecs, implemented either as C functions or with dictionaries as mapping tables. The following table lists the codecs by name, together with a few common aliases and the languages for which the encoding is likely to be used. Neither the list of aliases nor the list of languages is meant to be exhaustive. Spelling alternatives that differ only in case, or that use a hyphen instead of an underscore, are also valid aliases.
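The alias rule can be checked with `codecs.lookup()`. The following is a minimal sketch, assuming a recent Python 3 interpreter (where `str` plays the role of the Unicode string and `bytes` the role of the byte string); the variable names are illustrative only.

```python
import codecs

# Several spellings of the same codec: case differences, hyphen versus
# underscore, and two of the aliases listed in the table.
spellings = ["latin_1", "latin-1", "LATIN_1", "Latin-1", "iso-8859-1", "L1"]

# codecs.lookup() returns a CodecInfo object; its .name attribute is the
# canonical name that the spelling resolved to.
canonical = {s: codecs.lookup(s).name for s in spellings}
distinct = set(canonical.values())

# Every spelling produces the same bytes for the same text.
sample = u"caf\u00e9"
encoded = [sample.encode(s) for s in spellings]

print(distinct)
print(encoded[0])
```

All six spellings should resolve to a single canonical name and yield identical output.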

Many of the character sets support the same languages. They vary in individual characters (e.g. whether the EURO SIGN is supported or not), and in the assignment of characters to code positions. For the European languages in particular, the following variants typically exist:

an ISO 8859 codeset
a Microsoft Windows code page, which is typically derived from an ISO 8859 codeset, but replaces control characters with additional graphic characters
an IBM EBCDIC code page
an IBM PC code page, which is ASCII compatible
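The differences between these variants can be made concrete with a small experiment. This is a sketch assuming a recent Python 3 interpreter; `try_encode` is a hypothetical helper written for this illustration, not part of the library.

```python
euro = u"\u20ac"  # EURO SIGN


def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# Whether the EURO SIGN is supported, and at which code position.
euro_bytes = {
    codec: try_encode(euro, codec)
    for codec in ("latin_1", "iso8859_15", "cp1252", "cp437")
}

# ASCII compatibility: the IBM PC and Windows code pages keep "A" at 0x41,
# while the EBCDIC code page cp037 assigns it a different position.
letter_a = {
    codec: u"A".encode(codec)
    for codec in ("ascii", "cp437", "cp1252", "cp037")
}

print(euro_bytes)
print(letter_a)
```

Here `latin_1` and `cp437` cannot encode the character at all, while `iso8859_15` and `cp1252` both can but place it at different code positions.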

Codec	Aliases	Languages
ascii	646, us-ascii	English
cp037	IBM037, IBM039	English
cp424	EBCDIC-CP-HE, IBM424	Hebrew
cp437	437, IBM437	English
cp500	EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500	Western Europe
cp737		Greek
cp775	IBM775	Baltic languages
cp850	850, IBM850	Western Europe
cp852	852, IBM852	Central and Eastern Europe
cp855	855, IBM855	Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp856		Hebrew
cp857	857, IBM857	Turkish
cp860	860, IBM860	Portuguese
cp861	861, CP-IS, IBM861	Icelandic
cp862	862, IBM862	Hebrew
cp863	863, IBM863	Canadian
cp864	IBM864	Arabic
cp865	865, IBM865	Danish, Norwegian
cp869	869, CP-GR, IBM869	Greek
cp874		Thai
cp875		Greek
cp1006		Urdu
cp1026	ibm1026	Turkish
cp1140	ibm1140	Western Europe
cp1250	windows-1250	Central and Eastern Europe
cp1251	windows-1251	Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp1252	windows-1252	Western Europe
cp1253	windows-1253	Greek
cp1254	windows-1254	Turkish
cp1255	windows-1255	Hebrew
cp1256	windows-1256	Arabic
cp1257	windows-1257	Baltic languages
cp1258	windows-1258	Vietnamese
latin_1	iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1	Western Europe
iso8859_2	iso-8859-2, latin2, L2	Central and Eastern Europe
iso8859_3	iso-8859-3, latin3, L3	Esperanto, Maltese
iso8859_4	iso-8859-4, latin4, L4	Baltic languages
iso8859_5	iso-8859-5, cyrillic	Bulgarian, Byelorussian, Macedonian, Russian, Serbian
iso8859_6	iso-8859-6, arabic	Arabic
iso8859_7	iso-8859-7, greek, greek8	Greek
iso8859_8	iso-8859-8, hebrew	Hebrew
iso8859_9	iso-8859-9, latin5, L5	Turkish
iso8859_10	iso-8859-10, latin6, L6	Nordic languages
iso8859_13	iso-8859-13	Baltic languages
iso8859_14	iso-8859-14, latin8, L8	Celtic languages
iso8859_15	iso-8859-15	Western Europe
koi8_r		Russian
koi8_u		Ukrainian
mac_cyrillic	maccyrillic	Bulgarian, Byelorussian, Macedonian, Russian, Serbian
mac_greek	macgreek	Greek
mac_iceland	maciceland	Icelandic
mac_latin2	maclatin2, maccentraleurope	Central and Eastern Europe
mac_roman	macroman	Western Europe
mac_turkish	macturkish	Turkish
utf_16	U16, utf16	all languages
utf_16_be	UTF-16BE	all languages (BMP only)
utf_16_le	UTF-16LE	all languages (BMP only)
utf_7	U7	all languages
utf_8	U8, UTF, utf8	all languages
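The relationship between the UTF-16 variants in the table can be sketched as follows, assuming a recent Python 3 interpreter. The generic `utf_16` codec prepends a byte order mark (BOM) in the platform's native byte order, whereas `utf_16_le` and `utf_16_be` fix the byte order and emit no BOM.

```python
import codecs

text = u"abc"

little = text.encode("utf_16_le")   # fixed little-endian order, no BOM
big = text.encode("utf_16_be")      # fixed big-endian order, no BOM
generic = text.encode("utf_16")     # native order, preceded by a BOM

# codecs.BOM_UTF16 is the BOM in the platform's native byte order.
has_bom = generic.startswith(codecs.BOM_UTF16)
body = generic[len(codecs.BOM_UTF16):]

# The BOM lets the generic decoder recover the text on any platform.
round_trip = generic.decode("utf_16")

# UTF-8 has no byte order issue; the EURO SIGN takes three bytes.
euro_utf8 = u"\u20ac".encode("utf_8")

print(little, big, generic, euro_utf8)
```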

A number of codecs are specific to Python, so their codec names have no meaning outside Python. Some of them don't convert from Unicode strings to byte strings, but instead use the property of the Python codecs machinery that any bijective function with one argument can be considered as an encoding.

For the codecs listed below, the result in the "encoding" direction is always a byte string. The result of the "decoding" direction is listed as operand type in the table.

Codec	Aliases	Operand type	Purpose
base64_codec	base64, base-64	byte string	Convert operand to MIME base64
hex_codec	hex	byte string	Convert operand to hexadecimal representation, with two digits per byte
idna		Unicode string	Implements RFC 3490. New in version 2.3. See also `encodings.idna`
mbcs	dbcs	Unicode string	Windows only: Encode operand according to the ANSI codepage (CP_ACP)
palmos		Unicode string	Encoding of PalmOS 3.5
punycode		Unicode string	Implements RFC 3492. New in version 2.3.
quopri_codec	quopri, quoted-printable, quotedprintable	byte string	Convert operand to MIME quoted printable
raw_unicode_escape		Unicode string	Produce a string that is suitable as raw Unicode literal in Python source code
rot_13	rot13	byte string	Returns the Caesar cipher encryption of the operand
string_escape		byte string	Produce a string that is suitable as string literal in Python source code
undefined		any	Raise an exception for all conversions. Can be used as the system encoding if no automatic coercion between byte and Unicode strings is desired.
unicode_escape		Unicode string	Produce a string that is suitable as Unicode literal in Python source code
unicode_internal		Unicode string	Return the internal representation of the operand
uu_codec	uu	byte string	Convert the operand using uuencode
zlib_codec	zip, zlib	byte string	Compress the operand using zlib
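A few of these Python-specific codecs can be exercised as shown below. This is a sketch under a stated assumption: it targets a recent Python 3 interpreter, where the byte-string transforms are reached through `codecs.encode()` and `codecs.decode()` rather than through string methods, and where some codecs from the table (for example `string_escape`) are not available. Only codecs still present there are used.

```python
import codecs

payload = b"hello"

# Byte string to byte string transforms.
b64 = codecs.encode(payload, "base64_codec")      # MIME base64
hexed = codecs.encode(payload, "hex_codec")       # two hex digits per byte
packed = codecs.encode(payload * 100, "zlib_codec")
restored = codecs.decode(packed, "zlib_codec")

# Unicode string to byte string codecs.
escaped = u"\u20ac".encode("unicode_escape")      # source-code escape form
puny = u"m\u00fcnchen".encode("punycode")         # RFC 3492
host = u"m\u00fcnchen.de".encode("idna")          # RFC 3490

# Decoding returns the operand type listed in the table.
back = host.decode("idna")

print(b64, hexed, len(packed), escaped, puny, host, back)
```

Each transform is reversible with the matching decode call, which is what allows it to be treated as an encoding.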
