Compressed Unicode resource format

This page describes the compressed resource file format introduced in Symbian OS v7.0.

This format compresses Unicode text-strings in the resource data using the Standard Compression Scheme for Unicode, described in http://www.unicode.org/unicode/reports/tr6/tr6-3.2.html. A text-string is compressed only where doing so actually makes it smaller.

Resource files in this format are generated by the resource compiler (rcomp) from Symbian OS v7.0 onwards.

Number of bytes

Description

16

These bytes store the resource file’s UIDs.

The first twelve bytes hold the three UIDs as four-byte integers (in little-endian byte order). The last four bytes hold a CRC checksum generated from those three integers.

The first UID is always 0x101f4a6b.

The second and third UIDs can be specified on rcomp's command-line. By default, the second UID is zero and the third UID is the resource file’s “offset”, i.e. the twenty-bit integer generated from the resource file’s name. These twenty bits are stored in the least significant twenty bits of the third UID; the most significant twelve bits are all zero.

1

This byte stores flags. Currently only one flag is defined: the lowest bit of the byte (0x01). This flag indicates whether the third UID is actually the resource file’s “offset” (see the row above). The bit is set if it is, and clear if it is not.

2

This two-byte integer (in little-endian byte order) stores the size in bytes of the largest resource in the file (that is, the size when uncompressed).
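The three fixed-size fields described so far (UIDs with checksum, flags byte, largest-resource size) occupy the first 19 bytes of the file. The following is a minimal Python sketch of reading them, not the Symbian implementation. The names `Header` and `parse_header` are hypothetical, and treating the checksum as a little-endian four-byte integer is an assumption; the checksum is read but not verified, because the CRC algorithm is not specified on this page.

```python
import struct
from typing import NamedTuple

# First UID of every file in this format, as stated in the table.
COMPRESSED_UNICODE_UID1 = 0x101F4A6B


class Header(NamedTuple):  # hypothetical name, for illustration only
    uid1: int
    uid2: int
    uid3: int
    uid_checksum: int          # read but not verified here
    third_uid_is_offset: bool  # flag bit 0x01
    largest_resource_size: int # size in bytes when uncompressed


def parse_header(data: bytes) -> Header:
    # "<" selects little-endian with no alignment padding:
    # 4 x uint32 (three UIDs + checksum), 1 x uint8 (flags), 1 x uint16 (size).
    # Assumption: the checksum is stored little-endian like the UIDs.
    uid1, uid2, uid3, checksum, flags, largest = struct.unpack_from("<4IBH", data, 0)
    if uid1 != COMPRESSED_UNICODE_UID1:
        raise ValueError("not a compressed Unicode resource file")
    return Header(uid1, uid2, uid3, checksum, bool(flags & 0x01), largest)


def resource_file_offset(header: Header) -> int:
    # The "offset" lives in the least significant twenty bits of the third UID,
    # and is only meaningful when the flag says the third UID is the offset.
    if not header.third_uid_is_offset:
        raise ValueError("third UID is not the resource file's offset")
    return header.uid3 & 0xFFFFF
```

The bit-array described in the next row starts immediately after these 19 bytes.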

number_of_resources/8 (rounded up to the nearest whole number)

This is a bit-array (one bit for each resource) storing which resources contain compressed Unicode. The least significant bit of the first byte corresponds to the first resource, the next-to-least significant bit of the first byte corresponds to the second resource, and so on. A set bit indicates that the corresponding resource contains compressed Unicode; a zero bit indicates that it does not.
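The bit ordering above can be sketched in Python as follows. The function names are hypothetical, and `resource_index` is zero-based here (0 is the first resource).

```python
def bit_array_size(number_of_resources: int) -> int:
    # number_of_resources / 8, rounded up to the nearest whole number.
    return (number_of_resources + 7) // 8


def contains_compressed_unicode(bit_array: bytes, resource_index: int) -> bool:
    # Byte resource_index // 8 holds the bit; within that byte the least
    # significant bit belongs to the lowest-numbered resource.
    byte = bit_array[resource_index // 8]
    return bool(byte & (1 << (resource_index % 8)))
```

For example, with the two-byte bit-array `0x05 0x02`, the first, third and tenth resources are marked as containing compressed Unicode.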

[any]

This contains the data for all the resources, stored in order, one after another, with no padding bytes between them. Each resource’s data is in one of two formats, depending on whether the resource contains compressed Unicode; the bit-array described in the row above indicates which resources do.

Note that resources in either format may contain uncompressed Unicode. The Standard Compression Scheme for Unicode can, for some input, yield output larger than the input, and such text-strings are left uncompressed. Extra padding bytes (arbitrarily 0xab) are inserted in front of any uncompressed Unicode text-string that would otherwise not be aligned on a two-byte boundary relative to the start of that resource’s data, once the resource has been uncompressed.

Resources not containing compressed Unicode:

The binary data of these resources is laid out exactly as specified in the resource definition (although note the comment about padding bytes above).

Resources containing compressed Unicode:

The binary data of these resources is split up into one or more sequences, or “runs”, alternating between compressed Unicode and other material.

Each run is preceded by an integer containing the length in bytes of the run (not including the byte(s) it occupies itself). The run-length occupies a single byte if it is less than 128, otherwise it occupies two bytes (in little-endian byte order), with the most significant bit of the first byte set to non-zero to indicate that the run-length occupies two bytes. Only the length of the first run may be zero (which would be the case if the resource does not start with compressed Unicode).

(number_of_resources+1)*2

This is the resource index, which is a series of two-byte integers (in little-endian byte order), one for each resource in the resource file, each storing the file-position of that resource’s data (see row immediately above).

This is followed by a two-byte integer (in little-endian byte order), which is the file-position one byte past the end of the last resource’s data. Because of this final entry, the length of any resource’s data can be found by subtracting the file-position stored in that resource’s index-entry from the file-position stored in the next index-entry.

This last entry in the resource index, which stores the file-position one byte past the end of the last resource’s data, can also be thought of as storing the file-position of the start of the resource index.
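The index layout can be sketched in Python as follows. This is an illustration, not the Symbian implementation: the function names are hypothetical, and it assumes the resource index is the last thing in the file, so that its final two bytes are the entry that points back at the start of the index.

```python
import struct
from typing import List, Tuple


def read_resource_index(data: bytes) -> List[int]:
    # Assumption: the index ends the file, so the last two bytes are the
    # final index entry, which equals the file-position of the index itself.
    if len(data) < 2:
        raise ValueError("file too short to hold a resource index")
    (index_start,) = struct.unpack_from("<H", data, len(data) - 2)
    index_bytes = len(data) - index_start
    if index_bytes < 2 or index_bytes % 2 != 0:
        raise ValueError("inconsistent resource index")
    entry_count = index_bytes // 2  # number_of_resources + 1
    return list(struct.unpack_from(f"<{entry_count}H", data, index_start))


def resource_extent(index: List[int], resource_index: int) -> Tuple[int, int]:
    # Returns (file-position, length in bytes) of one resource's stored data.
    # resource_index is zero-based; the length is the difference between
    # this entry and the next one.
    start = index[resource_index]
    return start, index[resource_index + 1] - start
```

The number of resources in the file is then `len(index) - 1`. Note that the length obtained this way is the size of the data as stored in the file, which for a resource containing compressed Unicode differs from its uncompressed size.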