A character encoding is a mapping from a set of characters to their on-disk representation. jEdit can use any encoding supported by the Java platform.
Buffers in memory are always stored in UTF-16
encoding, which means each character is mapped to an integer between 0
and 65535. UTF-16
is the native encoding supported by
Java, and has a large enough range of characters to support most modern
languages.
When a buffer is loaded, it is converted from its on-disk
representation to UTF-16
using a specified
encoding.
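The conversion described above can be sketched in plain Java. This is only an illustration of decoding bytes into a UTF-16 string, not jEdit's own loading code; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jEdit.
final class DecodeSketch {
    // Convert raw on-disk bytes to a Java String, which is UTF-16 in memory.
    static String decode(byte[] onDisk, Charset encoding) {
        return new String(onDisk, encoding);
    }

    public static void main(String[] args) {
        // The two bytes that encode "é" (U+00E9) in UTF-8.
        byte[] onDisk = {(byte) 0xC3, (byte) 0xA9};

        // Decoded with the right encoding: a single character.
        System.out.println(decode(onDisk, StandardCharsets.UTF_8).length());

        // Decoded with the wrong encoding: two unrelated characters.
        System.out.println(decode(onDisk, StandardCharsets.ISO_8859_1).length());
    }
}
```

The same bytes give different text depending on the encoding used to read them, which is why the encoding has to be known when a file is loaded.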
The default encoding, used to load files for which no other
encoding is specified, can be set as described in
the section called “The Encodings Pane”.
Unless you change this setting, it will be your operating system's
native encoding, for example MacRoman
on Mac OS,
windows-1252
on Windows, and
ISO-8859-1
on Unix.
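To see which encoding the Java platform itself treats as the default on a given machine, a short program such as the following can be used. It is a sketch with a hypothetical class name; the reported value depends on the operating system, locale and Java version, and recent Java releases may report UTF-8 regardless of the operating system.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of jEdit.
final class DefaultEncodingSketch {
    // Canonical name of the encoding Java uses when none is specified.
    static String platformDefault() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default encoding: " + platformDefault());
    }
}
```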
An encoding can be explicitly set when opening a file in the file system browser's
> menu.
Note that there is no general way to auto-detect the encoding used by a file; however, in a few cases it is possible:
UTF-16
and UTF-8Y
files are auto-detected, because they begin with a certain fixed
character sequence. Note that plain UTF-8 does not mandate a
specific header, and thus cannot be auto-detected, unless the
file in question is an XML file.
Encodings used in XML files with an XML PI like the following are auto-detected:
<?xml version="1.0" encoding="UTF-8"?>
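The two cases above can be sketched as follows. This is an assumption about how such detection might be written, not jEdit's actual implementation: the class name, the `detect` method and the regular expression are all hypothetical, and the sketch only inspects the first bytes of a file.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jEdit.
final class EncodingSniffSketch {
    // Matches an XML declaration at the very start of the text and
    // captures the value of its encoding attribute.
    private static final Pattern XML_DECL = Pattern.compile(
            "<\\?xml[^>]*?encoding\\s*=\\s*[\"']([^\"']+)[\"']");

    // Returns an encoding name, or null if nothing can be detected.
    static String detect(byte[] head) {
        // UTF-8 signature bytes: EF BB BF.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8Y";
        // UTF-16 byte order mark, big-endian or little-endian.
        if (startsWith(head, 0xFE, 0xFF) || startsWith(head, 0xFF, 0xFE)) return "UTF-16";
        // Read the bytes one-to-one as characters to look for an XML declaration.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = XML_DECL.matcher(text);
        return m.lookingAt() ? m.group(1) : null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(xml));
        System.out.println(detect("plain text".getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

A file with neither a signature nor an XML declaration yields null here, which corresponds to the case where the default encoding has to be used.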
The encoding that will be used to save the current buffer is shown in the status bar, and can be changed in the
> dialog box. Note that changing this setting has no effect on the buffer's contents; if you opened a file with the wrong encoding and got garbage, you will need to reload it. > is an easy way.
If a file is opened without an explicit encoding specified and it appears in the recent file list, jEdit will use the encoding last used when working with that file; otherwise the default encoding will be used.
While the world is slowly converging on UTF-8 and UTF-16 encodings for storing text, a wide range of older encodings are still in widespread use and Java supports most of them.
The simplest character encoding still in use is ASCII, or
“American Standard Code for Information Interchange”.
ASCII encodes Latin letters used in English, in addition to numbers
and a range of punctuation characters. Each ASCII character consists
of 7 bits, so there is a limit of 128 distinct characters, which makes
it unsuitable for anything other than English text. jEdit will load
and save files as ASCII if the US-ASCII
encoding
is used.
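The 128-character limit can be checked from Java before saving text as US-ASCII. The following is a minimal sketch with hypothetical class and method names; it only illustrates the limit and is not taken from jEdit.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jEdit.
final class AsciiCheckSketch {
    // True if every character of the text can be represented in US-ASCII.
    static boolean fitsInAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsInAscii("plain English text"));
        // "naïve" contains U+00EF, which lies outside the ASCII range.
        System.out.println(fitsInAscii("na\u00efve"));
    }
}
```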
Because ASCII is unsuitable for international use, most
operating systems use an 8-bit extension of ASCII, with the first
128 values mapped to the ASCII characters, and the rest used to
encode accents, umlauts, and various more esoteric
typographical marks. The three major operating systems all extend
ASCII in a different way. Files written by Macintosh programs can be
read using the MacRoman encoding; Windows text
files are usually stored as windows-1252. In the
Unix world, the 8859_1 character encoding has
found widespread usage.
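The effect of these differing extensions can be illustrated by decoding the same byte with several encodings. This is a sketch with hypothetical names; it assumes the windows-1252 encoding is available in the Java runtime, and it checks for MacRoman before using it because some runtimes do not include it.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of jEdit.
final class EightBitSketch {
    // Decode a single byte value (0-255) using the named encoding.
    static char decodeByte(int value, String encodingName) {
        byte[] single = {(byte) value};
        return new String(single, Charset.forName(encodingName)).charAt(0);
    }

    static String describe(int value, String encodingName) {
        return encodingName + ": "
                + String.format("U+%04X", (int) decodeByte(value, encodingName));
    }

    public static void main(String[] args) {
        // Byte 0x80 lies outside ASCII, so each extension is free to assign it differently.
        System.out.println(describe(0x80, "windows-1252"));
        System.out.println(describe(0x80, "ISO-8859-1"));
        if (Charset.isSupported("MacRoman")) {
            System.out.println(describe(0x80, "MacRoman"));
        }
    }
}
```

Byte values below 128 decode to the same ASCII character in all three encodings; only the upper half differs.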
On Windows, various other encodings, referred to as
code pages and identified by number, are used
to store non-English text. The corresponding Java encoding name is
windows- followed by the code page number, for
example windows-1251.
Many common cross-platform international character sets are
also supported: KOI8_R
for Russian text,
Big5
and GBK
for Chinese, and
SJIS
for Japanese.
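Whether a particular encoding name is available depends on the Java runtime in use. The following sketch, with hypothetical class and method names, asks the Java platform whether a name is supported and, if so, what its canonical name is.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper, not part of jEdit.
final class EncodingLookupSketch {
    // Returns the canonical name for an encoding name or alias,
    // or null if the runtime does not support it.
    static String canonicalName(String name) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name).name() : null;
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not allowed in encoding names.
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = {"8859_1", "KOI8_R", "Big5", "GBK", "SJIS", "windows-1252"};
        for (String name : names) {
            System.out.println(name + " -> " + canonicalName(name));
        }
    }
}
```

Names such as 8859_1 are aliases, so the canonical name reported by Java may differ from the name that was looked up.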