engine/core/unicode.h File Reference
#include "platform/types.h"
|
Functions |
UTF16 * | convertUTF8toUTF16 (const UTF8 *unistring) |
| Functions that convert buffers of unicode code points, allocating a buffer.
|
UTF32 * | convertUTF8toUTF32 (const UTF8 *unistring) |
UTF8 * | convertUTF16toUTF8 (const UTF16 *unistring) |
UTF32 * | convertUTF16toUTF32 (const UTF16 *unistring) |
UTF8 * | convertUTF32toUTF8 (const UTF32 *unistring) |
UTF16 * | convertUTF32toUTF16 (const UTF32 *unistring) |
const U32 | convertUTF8toUTF16 (const UTF8 *unistring, UTF16 *outbuffer, U32 len) |
| Functions that convert buffers of unicode code points, into a provided buffer.
|
const U32 | convertUTF8toUTF32 (const UTF8 *unistring, UTF32 *outbuffer, U32 len) |
const U32 | convertUTF16toUTF8 (const UTF16 *unistring, UTF8 *outbuffer, U32 len) |
const U32 | convertUTF16toUTF32 (const UTF16 *unistring, UTF32 *outbuffer, U32 len) |
const U32 | convertUTF32toUTF8 (const UTF32 *unistring, UTF8 *outbuffer, U32 len) |
const U32 | convertUTF32toUTF16 (const UTF32 *unistring, UTF16 *outbuffer, U32 len) |
const UTF32 | oneUTF8toUTF32 (const UTF8 *codepoint, U32 *unitsWalked=NULL) |
| Functions that converts one unicode codepoint at a time
- Since these functions are designed to be used in tight loops, they do not allocate buffers.
|
const UTF32 | oneUTF16toUTF32 (const UTF16 *codepoint, U32 *unitsWalked=NULL) |
const UTF16 | oneUTF32toUTF16 (const UTF32 codepoint) |
const U32 | oneUTF32toUTF8 (const UTF32 codepoint, UTF8 *threeByteCodeunitBuf) |
const U32 | dStrlen (const UTF16 *unistring) |
| Functions that calculate the length of unicode strings.
|
const U32 | dStrlen (const UTF32 *unistring) |
const UTF8 * | getNthCodepoint (const UTF8 *unistring, const U32 n) |
| Functions that scan for characters in a utf8 string.
|
Function Documentation
UTF16* convertUTF8toUTF16 |
( |
const UTF8 * |
unistring |
) |
|
Functions that convert buffers of unicode code points, allocating a buffer.
- These functions allocate their own return buffers. You are responsible for calling delete[] on these buffers.
- Because they allocate memory, do not use these functions in a tight loop.
- These are usefull when you need a new long term copy of a string.
UTF32* convertUTF8toUTF32 |
( |
const UTF8 * |
unistring |
) |
|
UTF8* convertUTF16toUTF8 |
( |
const UTF16 * |
unistring |
) |
|
UTF32* convertUTF16toUTF32 |
( |
const UTF16 * |
unistring |
) |
|
UTF8* convertUTF32toUTF8 |
( |
const UTF32 * |
unistring |
) |
|
UTF16* convertUTF32toUTF16 |
( |
const UTF32 * |
unistring |
) |
|
const U32 convertUTF8toUTF16 |
( |
const UTF8 * |
unistring, |
|
|
UTF16 * |
outbuffer, |
|
|
U32 |
len | |
|
) |
| | |
Functions that convert buffers of unicode code points, into a provided buffer.
- These functions are useful for working on existing buffers.
- These cannot convert a buffer in place. If unistring is the same memory as outbuffer, the behavior is undefined.
- The converter clamps output to the BMP (Basic Multilingual Plane) .
- Conversion to UTF-8 requires a buffer of 3 bytes (U8's) per character, + 1.
- Conversion to UTF-16 requires a buffer of 1 U16 (2 bytes) per character, + 1.
- Conversion to UTF-32 requires a buffer of 1 U32 (4 bytes) per character, + 1.
- UTF-8 only requires 3 bytes per character in the worst case.
- Output is null terminated. Be sure to provide 1 extra byte, U16 or U32 for the null terminator, or you will see truncated output.
- If the provided buffer is too small, the output will be truncated.
const U32 convertUTF8toUTF32 |
( |
const UTF8 * |
unistring, |
|
|
UTF32 * |
outbuffer, |
|
|
U32 |
len | |
|
) |
| | |
const U32 convertUTF16toUTF8 |
( |
const UTF16 * |
unistring, |
|
|
UTF8 * |
outbuffer, |
|
|
U32 |
len | |
|
) |
| | |
const U32 convertUTF16toUTF32 |
( |
const UTF16 * |
unistring, |
|
|
UTF32 * |
outbuffer, |
|
|
U32 |
len | |
|
) |
| | |
const U32 convertUTF32toUTF8 |
( |
const UTF32 * |
unistring, |
|
|
UTF8 * |
outbuffer, |
|
|
U32 |
len | |
|
) |
| | |
const U32 convertUTF32toUTF16 |
( |
const UTF32 * |
unistring, |
|
|
UTF16 * |
outbuffer, |
|
|
U32 |
len | |
|
) |
| | |
Functions that converts one unicode codepoint at a time
- Since these functions are designed to be used in tight loops, they do not allocate buffers.
- oneUTF8toUTF32() and oneUTF16toUTF32() return the converted Unicode code point in *codepoint, and set *unitsWalked to the # of code units *codepoint took up. The next Unicode code point should start at *(codepoint + *unitsWalked).
- oneUTF32toUTF8() requires a 3 byte buffer, and returns the # of bytes used.
const UTF16 oneUTF32toUTF16 |
( |
const UTF32 |
codepoint |
) |
|
const U32 oneUTF32toUTF8 |
( |
const UTF32 |
codepoint, |
|
|
UTF8 * |
threeByteCodeunitBuf | |
|
) |
| | |
const U32 dStrlen |
( |
const UTF16 * |
unistring |
) |
|
Functions that calculate the length of unicode strings.
- Since calculating the length of a UTF8 string is nearly as expensive as converting it to another format, a dStrlen for UTF8 is not provided here.
- If *unistring does not point to a null terminated string of the correct type, the behavior is undefined.
const U32 dStrlen |
( |
const UTF32 * |
unistring |
) |
|
const UTF8* getNthCodepoint |
( |
const UTF8 * |
unistring, |
|
|
const U32 |
n | |
|
) |
| | |
Functions that scan for characters in a utf8 string.
- this is useful for getting a character-wise offset into a UTF8 string, as opposed to a byte-wise offset into a UTF8 string: foo[i]
|