A collation sequence determines the sorting order of any given character set. For example, in English, the order of the alphabet is most commonly used to order a list of English words. That is, words beginning with the letter c appear before words beginning with the letter d, which appear before words beginning with the letter e and so on.
A collation sequence is associated with each database when the database is created. This sequence determines in what order sorted data is returned to users and applications, what is returned when queries use pattern matching and, in some instances, how data is stored internally.
In a computer, if no other collation sequence is enforced, the sequence derived from the machine's native character set, either ASCII or EBCDIC, is used. The sorting order of these character sets derives from the internal numeric representation of each character.
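As a rough illustration of ordering by internal numeric representation, the following Python sketch sorts a few made-up words by code point. Python's default string comparison is only a stand-in here (it agrees with ASCII for 7-bit characters) and says nothing about how Ingres sorts internally.

```python
# Sorting by raw code point: uppercase letters (65-90) come before
# lowercase letters (97-122), and accented letters come after both.
words = ["apple", "Zebra", "banana", "Apple", "émigré"]

# Python compares strings code point by code point by default.
by_code_point = sorted(words)

# Pair each word with the numeric value of its first character.
first_codes = [(w, ord(w[0])) for w in by_code_point]
print(first_codes)
```

A reader expecting dictionary order may be surprised that "Zebra" sorts before "apple"; this is the kind of result that the other collation sequences are meant to address.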
In addition to the sequences derived from ASCII and EBCDIC, Ingres supports two other local collation sequences: multi and Spanish.
If none of these sequences adequately fills your needs, you can write your own local collation sequence.
The multi collation sequence is based on the DEC Multinational Character Set. This character set adds several vowels with diacritical marks to the standard 7-bit ASCII character set.
Following are the comparison sequences for the multi sequence that differ from those of ASCII:
A < À < Á < Â < Ã < Ä < B
C < Ç < D < E < È < É < Ê < Ë
I < Ì < Í < Î < Ï < J
N < Ñ < O
O < Ò < Ó < Ô < Õ < Ö < Œ < P
U < Ù < Ú < Û < Ü < V
Y < Ÿ < Z < Æ < Ø < Å
a < à < á < â < ã < ä < b
c < ç < d < e < è < é < ê < ë
i < ì < í < î < ï < j
n < ñ < o
o < ò < ó < ô < õ < ö < œ < p
ss < ß < st
u < ù < ú < û < ü < v
y < ÿ < z < æ < ø < å
For example:
cote < côte < czar < cæsar
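The example above can be reproduced with a small Python sketch. It assumes that each character can be ranked independently by its position in the lowercase lists above, and it omits the "ss < ß < st" rule for brevity. The names MULTI_LOWER and multi_key are invented for this illustration and are not part of Ingres.

```python
# Lowercase portion of the multi sequence, transcribed from the lists above.
# Letters not shown in those lists (f, g, h, ...) keep their ASCII positions.
MULTI_LOWER = (
    "aàáâãä" "b" "cç" "d" "eèéêë" "fgh" "iìíîï" "jklm" "nñ"
    "oòóôõöœ" "pqrst" "uùúûü" "vwx" "yÿ" "z" "æøå"
)

# Rank of each character = its index in the sequence.
RANK = {ch: i for i, ch in enumerate(MULTI_LOWER)}


def multi_key(word):
    """Return a sort key: the rank of every character in the word."""
    return [RANK[ch] for ch in word]


words = ["czar", "cæsar", "côte", "cote"]
print(sorted(words, key=multi_key))
```

Because æ is ranked after z, "cæsar" lands after "czar", which a plain alphabetical intuition would not predict.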
Pattern matching rules:
The Spanish collation sequence is based on the multi sequence but contains additional support for the Spanish letters ll and ch. Listed below are the comparison sequences for the Spanish collation sequence that differ from those of ASCII. Some pattern matching rules are also described.
A < À < Á < Â < Ã < Ä < B
CZ < CÅ < Ç < CH < Ch < D < E < È < É < Ê < Ë
I < Ì < Í < Î < Ï < J
LZ < LÅ < LL < Ll < M
N < Ñ < O
O < Ò < Ó < Ô < Õ < Ö < Œ < P
U < Ù < Ú < Û < Ü < V
Y < Ÿ < Z < Æ < Ø < Å
a < à < á < â < ã < ä < b
cz < cå < ç < cH < ch < d < e < è < é < ê < ë
i < ì < í < î < ï < j
lz < lå < lL < ll < m
n < ñ < o
o < ò < ó < ô < õ < ö < œ < p
ss < ß < st
u < ù < ú < û < ü < v
y < ÿ < z < æ < ø < å
Examples:
loop < llama
cote < côte < czar < cæsar < chair
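One way to picture the Spanish sequence is to treat ch and ll as single collation units. The Python sketch below does that for lowercase input only; it ignores the mixed-case forms (cH, lL) and the "ss < ß < st" rule. SPANISH_UNITS and spanish_key are hypothetical names used only for this illustration.

```python
# Lowercase collation units for the Spanish sequence, transcribed from the
# lists above. "ch" follows "ç" and "ll" follows "l", each as a single unit.
SPANISH_UNITS = (
    list("aàáâãä") + ["b", "c", "ç", "ch", "d"] + list("eèéêë")
    + list("fgh") + list("iìíîï") + ["j", "k", "l", "ll", "m"]
    + list("nñ") + list("oòóôõöœ") + list("pqrst") + list("uùúûü")
    + list("vwx") + list("yÿ") + ["z"] + list("æøå")
)
RANK = {unit: i for i, unit in enumerate(SPANISH_UNITS)}


def spanish_key(word):
    """Split the word into collation units, preferring the digraphs ch and ll."""
    key, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        unit = pair if pair in ("ch", "ll") else word[i]
        key.append(RANK[unit])
        i += len(unit)
    return key


words = ["chair", "llama", "cæsar", "loop", "czar", "côte", "cote"]
print(sorted(words, key=spanish_key))
```

Under these assumptions the output agrees with both examples above: "chair" follows every other word beginning with c, and "llama" follows "loop".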
The pattern matching rules are:
If you have special needs that are not met by the available collation sequences, you can write your own. Ingres allows you to write a collation sequence that has any of the following characteristics:
Keep the following points in mind as you design and test your custom collation file:
Because of these problems, we suggest that you do not use information loss sequences and the hash storage structure together.
To create a customized collation sequence, follow these steps:
To define a custom collation sequence, you must create a description file, which consists of a list of "instructions" that, taken as a whole, describe the collation sequence. Each instruction must appear on a separate line in the file.
The format of each instruction is:
value:string
where:
value determines the numerical weight assigned to string. (The internal numerical weight of each character determines where a character appears in the sort order.)
The value can have any of the following formats:
Instructs sorting of the specified string after the specified character and before the next higher-weighted character in the character set. For example, in the following instruction, string1 is mapped as a single character that is ordered immediately after the letter H and before I in a sorted sequence:
H+1:string1
In the following instruction, string2 sorts after string1 and before the letter I.
H+2:string2
You can specify H+1:string or Hz+1:string and both sort in the same manner, that is, after H and before I. However, the two examples do not behave the same when pattern matching is applied. To illustrate using an example from the Spanish language, the following instruction maps CH as a single character that exists between C and D:
C+1:CH
If you ask for a pattern match using the format C%, instances of CH are not returned. The alternative, Cz+1:CH, maps CH into two characters, C and a virtual character just after z. This causes CH to match as two characters. A pattern match using the format C% finds the instances of CH.
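The difference between the two mappings can be pictured with a toy model in Python. The element names "<C+1>" and "<z+1>" and both functions are invented for this sketch; it models only the behavior described above and is not Ingres's internal representation.

```python
def to_elements(text, ch_mapping):
    """Convert text to a list of collation elements.

    Every "CH" is replaced by the elements in ch_mapping;
    every other character maps to itself.
    """
    elements, i = [], 0
    while i < len(text):
        if text.startswith("CH", i):
            elements.extend(ch_mapping)
            i += 2
        else:
            elements.append(text[i])
            i += 1
    return elements


def matches_c_percent(text, ch_mapping):
    """Emulate the pattern C%: the first element must be the element for C."""
    elements = to_elements(text, ch_mapping)
    return bool(elements) and elements[0] == "C"


SINGLE = ["<C+1>"]        # C+1:CH  -> one element between C and D
DOUBLE = ["C", "<z+1>"]   # Cz+1:CH -> C followed by a virtual element after z

for name in ("CHILE", "CUBA"):
    print(name, matches_c_percent(name, SINGLE), matches_c_percent(name, DOUBLE))
```

In this model "CHILE" fails the C% test under the single-element mapping and passes under the two-element mapping, while "CUBA" passes in both cases.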
Sorts the specified string as the equivalent of the specified charstring. For example, in the following instruction, the word tax sorts as if it were the word revenue:
revenue:tax
Gives the specified string the internal numerical weight specified by the given number. The number must be between 0 and 32766. The weighting of a character in this manner is less portable than giving the character a relative weight.
Causes the specified string to be ignored when collation is performed. For example, in the following instruction, the "?" is ignored whenever collation takes place:
+*:?
When no value is specified (the instruction takes the form :string), the collation compiler ignores the instruction. Use this format to insert comments into your collation sequence. For example:
:This is a comment
string is any character or character string. An empty string causes a syntax error.
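The instruction formats described above can be told apart mechanically. The Python sketch below is a hypothetical checker, not the logic used by the collation compiler. It assumes that the first colon separates value from string, that an all-digit value is always an absolute weight, and that comment lines are accepted whatever follows the colon. The category labels are invented for this illustration.

```python
import re


def classify(line):
    """Split one instruction at its first colon and name the value format.

    Returns a (kind, value, string) tuple; raises ValueError on input
    that the rules above describe as invalid.
    """
    value, sep, string = line.partition(":")
    if not sep:
        raise ValueError("instruction has no ':' separator")
    if value == "":
        return ("comment", value, string)        # :This is a comment
    if string == "":
        raise ValueError("empty string causes a syntax error")
    if value == "+*":
        return ("ignore", value, string)         # +*:?
    if re.fullmatch(r"[0-9]+", value):
        if int(value) > 32766:
            raise ValueError("weight must be between 0 and 32766")
        return ("absolute weight", value, string)
    if re.fullmatch(r".+\+[0-9]+", value):
        return ("relative weight", value, string)  # H+1:string1
    return ("equivalent", value, string)           # revenue:tax


for sample in ("H+1:string1", "Cz+1:CH", "revenue:tax", "+*:?", ":This is a comment"):
    print(classify(sample))
```

A check like this only catches malformed lines early. Whether a description file produces the intended ordering still has to be confirmed by compiling and testing it.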
The aducompile utility compiles your description file into a binary file and installs that file as a collation sequence that can be used. You must be the installation owner to use this utility. Be sure to give your resulting collation file a unique name so that you do not overwrite any existing collation files.
Your new collation sequence is located at $II_SYSTEM/ingres/files/collation/collation_name.
Note: In UNIX, all system users must have rights to read the new collation file.
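On UNIX, the read permission mentioned in the note can be checked with a short script. The Python sketch below is an assumption-laden helper: it inspects only the "other" read bit of the file itself and does not check the directories above it. The function names and the collation name my_collation are hypothetical.

```python
import os
import stat


def collation_path(ii_system, collation_name):
    """Build the collation file location given above from its two parts."""
    return os.path.join(ii_system, "ingres", "files", "collation", collation_name)


def is_world_readable(path):
    """Return True if the 'other' read permission bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


if __name__ == "__main__":
    root = os.environ.get("II_SYSTEM")
    if root:
        # "my_collation" is a placeholder, not a name shipped with Ingres.
        path = collation_path(root, "my_collation")
        if os.path.exists(path):
            print(path, "world-readable:", is_world_readable(path))
```

If the check reports False, the installation owner can grant read access with the usual file permission tools.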