URIs use some characters for special purposes in defining their syntax; these are called reserved characters. Examples are ; / ? : & = . When these characters are not used in their special role inside a URI, they need to be encoded.
The following lists the reserved characters for different URI components as defined in TEscapeMode:
Some characters can be misunderstood within URIs for various reasons. These are called unsafe characters and must always be encoded. For example, the '#' character is used in URIs to indicate where a fragment identifier (bookmarks/anchors in HTML) begins.
Data characters that are allowed in a URI but do not have a reserved purpose are called "unreserved" characters. These include upper and lower case letters, decimal digits, and a limited set of punctuation marks and symbols. ASCII control characters, which are not printable, are not unreserved and must be encoded: these are the ISO-8859-1 (ISO-Latin) character ranges 00-1F hex (0-31 decimal) and 7F (127 decimal).
EscapeUtils escape encodes and decodes unsafe data in a URI. It also supports converting Unicode data (16-bit descriptor) into UTF8 data (8-bit descriptor) and vice versa.
EscapeUtils provides the following functionality.
EscapeUtils::EscapeDecodeL() escape decodes the data.
_LIT(KEscapeEncoded, "%20%3C%3E%23%25%22%7B%7D%7C%5C%5E%60"); // data to decode
HBufC16* decode = EscapeUtils::EscapeDecodeL(KEscapeEncoded); // contains " <>#%\"{}|\\^`" (shown as a C++ string literal)
CleanupStack::PushL(decode);
... // use decode here
CleanupStack::PopAndDestroy(decode);
This code escape decodes the data '%20%3C%3E%23%25%22%7B%7D%7C%5C%5E%60' to a space followed by the characters <>#%"{}|\^`.
The URI must be split into its components before the escaped characters within the components are safely decoded.
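The reason for splitting first can be illustrated with a small standard C++ sketch. It does not use EscapeUtils; DecodeSketch and SplitPath are hypothetical helpers introduced only for this illustration. An escaped '/' (%2F) inside a path segment is data, so decoding before splitting would turn it into a delimiter.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of EscapeUtils): decodes every %XX triple.
std::string DecodeSketch(const std::string& text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const bool isTriple = text[i] == '%' && i + 2 < text.size()
            && std::isxdigit(static_cast<unsigned char>(text[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(text[i + 2]));
        if (isTriple) {
            // Convert the two hex digits to the octet they represent.
            out += static_cast<char>(std::stoi(text.substr(i + 1, 2), nullptr, 16));
            i += 2; // skip the two hex digits just consumed
        } else {
            out += text[i];
        }
    }
    return out;
}

// Hypothetical helper: splits a path into segments on the '/' delimiter.
std::vector<std::string> SplitPath(const std::string& path)
{
    std::vector<std::string> segments;
    std::size_t start = 0;
    while (true) {
        const std::size_t slash = path.find('/', start);
        if (slash == std::string::npos) {
            segments.push_back(path.substr(start));
            break;
        }
        segments.push_back(path.substr(start, slash - start));
        start = slash + 1;
    }
    return segments;
}
```

With the input "a%2Fb/c", splitting first gives two segments, and decoding the first one yields "a/b". Decoding first gives "a/b/c", which then splits into three segments and loses the original structure.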
URI encoding of a character consists of a "%" symbol followed by two hexadecimal digits representing the octet code. For example, the space character has decimal code point 32 in the ISO-Latin set, and 32 decimal is 20 in hexadecimal, so its URI encoded representation is "%20".
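The octet-to-triple rule can be sketched in standard C++ as follows. EscapeOctet is a hypothetical helper written for illustration, not an EscapeUtils function, and the choice of upper case hex digits is an assumption made to match the examples in this section.

```cpp
#include <string>

// Hypothetical helper (not part of EscapeUtils): builds the escape triple
// for a single octet, for example 0x20 becomes "%20".
std::string EscapeOctet(unsigned char octet)
{
    static const char kHexDigits[] = "0123456789ABCDEF";
    std::string triple = "%";
    triple += kHexDigits[octet >> 4];   // high nibble
    triple += kHexDigits[octet & 0x0F]; // low nibble
    return triple;
}
```

For instance, EscapeOctet(' ') gives "%20" and EscapeOctet('#') gives "%23".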
EscapeUtils::EscapeEncodeL() escape encodes the invalid and reserved characters in the data as escape triples. The reserved characters and the set of excluded characters specified by RFC 2396 (refer to the table above) form the entire set of excluded data.
The code fragment checks for the invalid and reserved characters in the authority component of a URI and returns the string with these characters escape encoded. For other modes as defined in TEscapeMode, refer to the table above.
HBufC16* encode = EscapeUtils::EscapeEncodeL(*decode, EscapeUtils::EEscapeAuth);
CleanupStack::PushL(encode); // encode contains %20%3C%3E%23%25%22%7B%7D%7C%5C%5E%60
... // use encode here
CleanupStack::PopAndDestroy(encode);
Escape encoding is best applied when creating a URI from its components.
EscapeUtils::ConvertFromUnicodeToUtf8L() converts the Unicode data into UTF8 format.
_LIT16(KUnicode, "Unicode string"); //data to be converted
HBufC8* utf8 = EscapeUtils::ConvertFromUnicodeToUtf8L(KUnicode);
utf8 contains the UTF8 form of the string.
EscapeUtils::ConvertToUnicodeFromUtf8L() converts the data from UTF8 format to Unicode.
_LIT8(KUtf8, "UTF-8 string"); // UTF8 string to be converted
HBufC16* unicode = EscapeUtils::ConvertToUnicodeFromUtf8L(KUtf8); // convert the string to Unicode
unicode contains the Unicode form of the string.
Call EscapeUtils::IsEscapeTriple() to check if the input data contains an escape triple, for example %2a. If there is a triple, its value is calculated and returned through the output argument HexVal. If there is no escape triple, then this argument is left unchanged.
_LIT(KEscapeTriple1, "%2a"); // input data containing escape triple
TInt KEscapeTriple1_value = 0x2a; // expected value of the escape triple
TInt HexVal = 0;
EscapeUtils::IsEscapeTriple(KEscapeTriple1, HexVal); // sets HexVal if a triple is found
// HexVal now contains 0x2a
The code above sets HexVal to 0x2a (42 decimal), the value of the escape triple.
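The check can be approximated in standard C++ as shown below. This is a minimal sketch, not the EscapeUtils implementation: IsEscapeTripleSketch is a hypothetical name, and it assumes the triple is expected at the start of the input. As described above, the output argument is left unchanged when no triple is found.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the check described above (not EscapeUtils code).
// Assumption: only the first three characters of data are examined.
bool IsEscapeTripleSketch(const std::string& data, int& hexValue)
{
    if (data.size() < 3 || data[0] != '%')
        return false; // hexValue is left unchanged
    const unsigned char hi = static_cast<unsigned char>(data[1]);
    const unsigned char lo = static_cast<unsigned char>(data[2]);
    if (!std::isxdigit(hi) || !std::isxdigit(lo))
        return false; // hexValue is left unchanged
    hexValue = std::stoi(data.substr(1, 2), nullptr, 16);
    return true;
}
```

Called with "%2a", the sketch returns true and stores 42 (0x2a) in hexValue. Called with "2a%", it returns false and hexValue keeps its previous value.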