Scripts

Scripts — Identifying writing systems

Synopsis

            PangoScriptIter;
enum        PangoScript;
#define     PANGO_TYPE_SCRIPT
PangoScript pango_script_for_unichar        (gunichar ch);
PangoLanguage* pango_script_get_sample_language
                                            (PangoScript script);
gboolean    pango_language_includes_script  (PangoLanguage *language,
                                             PangoScript script);
PangoScriptIter* pango_script_iter_new      (const char *text,
                                             int length);
void        pango_script_iter_get_range     (PangoScriptIter *iter,
                                             G_CONST_RETURN char **start,
                                             G_CONST_RETURN char **end,
                                             PangoScript *script);
gboolean    pango_script_iter_next          (PangoScriptIter *iter);
void        pango_script_iter_free          (PangoScriptIter *iter);

Description

The functions in this section are used to identify the writing system, or script, of individual characters and of ranges within a larger text string.

Details

PangoScriptIter

typedef struct _PangoScriptIter PangoScriptIter;

A PangoScriptIter is used to iterate through a string and identify ranges in different scripts.


enum PangoScript

typedef enum {                         /* ISO 15924 code */
      PANGO_SCRIPT_INVALID_CODE = -1,
      PANGO_SCRIPT_COMMON       = 0,   /* Zyyy */
      PANGO_SCRIPT_INHERITED,          /* Qaai */
      PANGO_SCRIPT_ARABIC,             /* Arab */
      PANGO_SCRIPT_ARMENIAN,           /* Armn */
      PANGO_SCRIPT_BENGALI,            /* Beng */
      PANGO_SCRIPT_BOPOMOFO,           /* Bopo */
      PANGO_SCRIPT_CHEROKEE,           /* Cher */
      PANGO_SCRIPT_COPTIC,             /* Qaac */
      PANGO_SCRIPT_CYRILLIC,           /* Cyrl (Cyrs) */
      PANGO_SCRIPT_DESERET,            /* Dsrt */
      PANGO_SCRIPT_DEVANAGARI,         /* Deva */
      PANGO_SCRIPT_ETHIOPIC,           /* Ethi */
      PANGO_SCRIPT_GEORGIAN,           /* Geor (Geon, Geoa) */
      PANGO_SCRIPT_GOTHIC,             /* Goth */
      PANGO_SCRIPT_GREEK,              /* Grek */
      PANGO_SCRIPT_GUJARATI,           /* Gujr */
      PANGO_SCRIPT_GURMUKHI,           /* Guru */
      PANGO_SCRIPT_HAN,                /* Hani */
      PANGO_SCRIPT_HANGUL,             /* Hang */
      PANGO_SCRIPT_HEBREW,             /* Hebr */
      PANGO_SCRIPT_HIRAGANA,           /* Hira */
      PANGO_SCRIPT_KANNADA,            /* Knda */
      PANGO_SCRIPT_KATAKANA,           /* Kana */
      PANGO_SCRIPT_KHMER,              /* Khmr */
      PANGO_SCRIPT_LAO,                /* Laoo */
      PANGO_SCRIPT_LATIN,              /* Latn (Latf, Latg) */
      PANGO_SCRIPT_MALAYALAM,          /* Mlym */
      PANGO_SCRIPT_MONGOLIAN,          /* Mong */
      PANGO_SCRIPT_MYANMAR,            /* Mymr */
      PANGO_SCRIPT_OGHAM,              /* Ogam */
      PANGO_SCRIPT_OLD_ITALIC,         /* Ital */
      PANGO_SCRIPT_ORIYA,              /* Orya */
      PANGO_SCRIPT_RUNIC,              /* Runr */
      PANGO_SCRIPT_SINHALA,            /* Sinh */
      PANGO_SCRIPT_SYRIAC,             /* Syrc (Syrj, Syrn, Syre) */
      PANGO_SCRIPT_TAMIL,              /* Taml */
      PANGO_SCRIPT_TELUGU,             /* Telu */
      PANGO_SCRIPT_THAANA,             /* Thaa */
      PANGO_SCRIPT_THAI,               /* Thai */
      PANGO_SCRIPT_TIBETAN,            /* Tibt */
      PANGO_SCRIPT_CANADIAN_ABORIGINAL, /* Cans */
      PANGO_SCRIPT_YI,                 /* Yiii */
      PANGO_SCRIPT_TAGALOG,            /* Tglg */
      PANGO_SCRIPT_HANUNOO,            /* Hano */
      PANGO_SCRIPT_BUHID,              /* Buhd */
      PANGO_SCRIPT_TAGBANWA,           /* Tagb */

      /* Unicode-4.0 additions */
      PANGO_SCRIPT_BRAILLE,            /* Brai */
      PANGO_SCRIPT_CYPRIOT,            /* Cprt */
      PANGO_SCRIPT_LIMBU,              /* Limb */
      PANGO_SCRIPT_OSMANYA,            /* Osma */
      PANGO_SCRIPT_SHAVIAN,            /* Shaw */
      PANGO_SCRIPT_LINEAR_B,           /* Linb */
      PANGO_SCRIPT_TAI_LE,             /* Tale */
      PANGO_SCRIPT_UGARITIC            /* Ugar */
} PangoScript;

The PangoScript enumeration identifies different writing systems. The values correspond to the names defined in the Unicode standard. (See Unicode Standard Annex 24: Script names).

PANGO_SCRIPT_INVALID_CODE : a value never used for any Unicode character
PANGO_SCRIPT_COMMON : a character used by multiple different scripts
PANGO_SCRIPT_INHERITED : a mark glyph that takes its script from the base glyph to which it is attached
PANGO_SCRIPT_ARABIC
PANGO_SCRIPT_ARMENIAN
PANGO_SCRIPT_BENGALI
PANGO_SCRIPT_BOPOMOFO
PANGO_SCRIPT_CHEROKEE
PANGO_SCRIPT_COPTIC
PANGO_SCRIPT_CYRILLIC
PANGO_SCRIPT_DESERET
PANGO_SCRIPT_DEVANAGARI
PANGO_SCRIPT_ETHIOPIC
PANGO_SCRIPT_GEORGIAN
PANGO_SCRIPT_GOTHIC
PANGO_SCRIPT_GREEK
PANGO_SCRIPT_GUJARATI
PANGO_SCRIPT_GURMUKHI
PANGO_SCRIPT_HAN
PANGO_SCRIPT_HANGUL
PANGO_SCRIPT_HEBREW
PANGO_SCRIPT_HIRAGANA
PANGO_SCRIPT_KANNADA
PANGO_SCRIPT_KATAKANA
PANGO_SCRIPT_KHMER
PANGO_SCRIPT_LAO
PANGO_SCRIPT_LATIN
PANGO_SCRIPT_MALAYALAM
PANGO_SCRIPT_MONGOLIAN
PANGO_SCRIPT_MYANMAR
PANGO_SCRIPT_OGHAM
PANGO_SCRIPT_OLD_ITALIC
PANGO_SCRIPT_ORIYA
PANGO_SCRIPT_RUNIC
PANGO_SCRIPT_SINHALA
PANGO_SCRIPT_SYRIAC
PANGO_SCRIPT_TAMIL
PANGO_SCRIPT_TELUGU
PANGO_SCRIPT_THAANA
PANGO_SCRIPT_THAI
PANGO_SCRIPT_TIBETAN
PANGO_SCRIPT_CANADIAN_ABORIGINAL
PANGO_SCRIPT_YI
PANGO_SCRIPT_TAGALOG
PANGO_SCRIPT_HANUNOO
PANGO_SCRIPT_BUHID
PANGO_SCRIPT_TAGBANWA
PANGO_SCRIPT_BRAILLE
PANGO_SCRIPT_CYPRIOT
PANGO_SCRIPT_LIMBU
PANGO_SCRIPT_OSMANYA
PANGO_SCRIPT_SHAVIAN
PANGO_SCRIPT_LINEAR_B
PANGO_SCRIPT_TAI_LE
PANGO_SCRIPT_UGARITIC

PANGO_TYPE_SCRIPT

#define PANGO_TYPE_SCRIPT (pango_script_get_type())

The GObject type for PangoScript.


pango_script_for_unichar ()

PangoScript pango_script_for_unichar        (gunichar ch);

Looks up the PangoScript for a particular character (as defined by Unicode Technical Report 24). No check is made that ch is a valid Unicode character; if you pass in an invalid character, the result is undefined.

ch : a Unicode character
Returns : the PangoScript for the character.

pango_script_get_sample_language ()

PangoLanguage* pango_script_get_sample_language
                                            (PangoScript script);

Given a script, finds a language tag that is reasonably representative of that script. This will usually be the most widely spoken or used language written in that script: for instance, the sample language for PANGO_SCRIPT_CYRILLIC is ru (Russian), and the sample language for PANGO_SCRIPT_ARABIC is ar (Arabic).

For some scripts, no sample language will be returned because there is no language that is sufficiently representative. The best example of this is PANGO_SCRIPT_HAN, where the variants of written Chinese, Japanese, and Korean all use significantly different sets of Han characters and different forms of shared characters. No sample language can be provided for many historical scripts either.

script : a PangoScript
Returns : a PangoLanguage that is representative of the script, or NULL if no such language exists.

Since 1.4


pango_language_includes_script ()

gboolean    pango_language_includes_script  (PangoLanguage *language,
                                             PangoScript script);

Determines if script is one of the scripts used to write language. The returned value is conservative; if nothing is known about the language tag language, TRUE will be returned, since, as far as Pango knows, script might be used to write language.

This routine is used in Pango's itemization process when determining if a supplied language tag is relevant to a particular section of text. It probably is not useful for applications in most circumstances.

language : a PangoLanguage
script : a PangoScript
Returns : TRUE if script is one of the scripts used to write language, or if nothing is known about language.

Since 1.4


pango_script_iter_new ()

PangoScriptIter* pango_script_iter_new      (const char *text,
                                             int length);

Creates a new PangoScriptIter, used to break a string of Unicode text into runs by script. No copy is made of text, so the caller needs to make sure it remains valid until the iterator is freed with pango_script_iter_free().

text : a UTF-8 string
length : length of text, or -1 if text is NUL-terminated.
Returns : the newly created script iterator, initialized to point at the first range in the text. If the string is empty, it will point at an empty range.

pango_script_iter_get_range ()

void        pango_script_iter_get_range     (PangoScriptIter *iter,
                                             G_CONST_RETURN char **start,
                                             G_CONST_RETURN char **end,
                                             PangoScript *script);

Gets information about the range to which iter currently points. The range is the set of locations p where *start <= p < *end. (That is, it doesn't include the character stored at *end.)

iter : a PangoScriptIter
start : location to store start position of the range, or NULL
end : location to store end position of the range, or NULL
script : location to store script for range, or NULL

pango_script_iter_next ()

gboolean    pango_script_iter_next          (PangoScriptIter *iter);

Advances a PangoScriptIter to the next range. If the iter is already at the end, it is left unchanged and FALSE is returned.

iter : a PangoScriptIter
Returns : TRUE if the iter was successfully advanced.

pango_script_iter_free ()

void        pango_script_iter_free          (PangoScriptIter *iter);

Frees a PangoScriptIter created with pango_script_iter_new().

iter : a PangoScriptIter