Namespaces | |
namespace | Internal |
Enumerations | |
enum | category { UNASSIGNED, UPPERCASE_LETTER, LOWERCASE_LETTER, TITLECASE_LETTER, MODIFIER_LETTER, OTHER_LETTER, NON_SPACING_MARK, ENCLOSING_MARK, COMBINING_SPACING_MARK, DECIMAL_DIGIT_NUMBER, LETTER_NUMBER, OTHER_NUMBER, SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL, FORMAT, PRIVATE_USE, SURROGATE, CONNECTOR_PUNCTUATION, DASH_PUNCTUATION, OPEN_PUNCTUATION, CLOSE_PUNCTUATION, INITIAL_QUOTE_PUNCTUATION, FINAL_QUOTE_PUNCTUATION, OTHER_PUNCTUATION, MATH_SYMBOL, CURRENCY_SYMBOL, MODIFIER_SYMBOL, OTHER_SYMBOL } |
Each Unicode character is in one of these categories. | |
Functions | |
unsigned | nonascii_to_utf8 (unsigned ch, char *buf) |
Convert a single non-ASCII Unicode character to UTF-8. | |
unsigned | to_utf8 (unsigned ch, char *buf) |
Convert a single Unicode character to UTF-8. | |
void | append_utf8 (std::string &s, unsigned ch) |
Append the UTF-8 representation of a single Unicode character to a std::string. | |
category | get_category (unsigned ch) |
Return the category which a given Unicode character falls into. | |
bool | is_wordchar (unsigned ch) |
Test if a given Unicode character is a letter or number. | |
bool | is_whitespace (unsigned ch) |
Test if a given Unicode character is a whitespace character. | |
bool | is_currency (unsigned ch) |
Test if a given Unicode character is a currency symbol. | |
unsigned | tolower (unsigned ch) |
Convert a Unicode character to lowercase. | |
unsigned | toupper (unsigned ch) |
Convert a Unicode character to uppercase. | |
std::string | tolower (const std::string &term) |
Convert a UTF-8 std::string to lowercase. | |
std::string | toupper (const std::string &term) |
Convert a UTF-8 std::string to uppercase. |
enum Xapian::Unicode::category
Each Unicode character is in one of these categories.
unsigned Xapian::Unicode::nonascii_to_utf8 (unsigned ch, char *buf)
Convert a single non-ASCII Unicode character to UTF-8.
This is intended mainly as a helper function for to_utf8().
The character ch (which must be >= 128) is written to the buffer buf, and the length of the resulting UTF-8 sequence is returned.
Note: buf must have space for at least 4 bytes.
Definition at line 34 of file utf8itor.cc.
Referenced by to_utf8().
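As a rough illustration of the encoding step described above, the following sketch writes a non-ASCII code point into a caller-supplied 4-byte buffer and returns the number of bytes used. The name `sketch_nonascii_to_utf8` is hypothetical, and the body simply follows the standard UTF-8 bit layout; it is not taken from utf8itor.cc. Returning 0 for values above 0x10FFFF is an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for the helper described above (not Xapian's code).
// Encodes a code point >= 0x80 as UTF-8 into buf and returns the byte count.
// buf must have room for at least 4 bytes.
unsigned sketch_nonascii_to_utf8(unsigned ch, char *buf) {
    assert(ch >= 0x80);  // ASCII is expected to be handled by the caller
    if (ch < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        buf[0] = static_cast<char>(0xC0 | (ch >> 6));
        buf[1] = static_cast<char>(0x80 | (ch & 0x3F));
        return 2;
    }
    if (ch < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        buf[0] = static_cast<char>(0xE0 | (ch >> 12));
        buf[1] = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
        buf[2] = static_cast<char>(0x80 | (ch & 0x3F));
        return 3;
    }
    if (ch <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        buf[0] = static_cast<char>(0xF0 | (ch >> 18));
        buf[1] = static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
        buf[2] = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
        buf[3] = static_cast<char>(0x80 | (ch & 0x3F));
        return 4;
    }
    // Assumption of this sketch: values beyond the Unicode range emit nothing.
    return 0;
}
```

The 4-byte buffer requirement follows from the longest branch: no code point up to 0x10FFFF needs more than four bytes.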
unsigned Xapian::Unicode::to_utf8 (unsigned ch, char *buf) [inline]
Convert a single Unicode character to UTF-8.
The character ch is written to the buffer buf, and the length of the resulting UTF-8 sequence is returned.
Note: buf must have space for at least 4 bytes.
Definition at line 267 of file unicode.h.
References nonascii_to_utf8().
Referenced by append_utf8().
void Xapian::Unicode::append_utf8 (std::string &s, unsigned ch) [inline]
Append the UTF-8 representation of a single Unicode character to a std::string.
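The relationship between to_utf8() and append_utf8() can be sketched as follows: encode into a 4-byte stack buffer, then append exactly the number of bytes reported. Both function names below are hypothetical, and the encoder is a compact loop-based variant written for this example rather than the library's implementation. It assumes ch is at most 0x10FFFF.

```cpp
#include <string>

// Hypothetical helper (not Xapian's implementation): encode a code point
// up to 0x10FFFF into buf (at least 4 bytes) and return the length.
unsigned sketch_to_utf8(unsigned ch, char *buf) {
    if (ch < 0x80) {
        // ASCII fast path: a single byte, stored unchanged.
        buf[0] = static_cast<char>(ch);
        return 1;
    }
    const unsigned len = ch < 0x800 ? 2 : ch < 0x10000 ? 3 : 4;
    // Fill continuation bytes from the end, 6 payload bits each.
    for (unsigned i = len - 1; i > 0; --i) {
        buf[i] = static_cast<char>(0x80 | (ch & 0x3F));
        ch >>= 6;
    }
    // Lead byte marker for each sequence length, plus the remaining bits.
    static const unsigned char lead[] = {0, 0, 0xC0, 0xE0, 0xF0};
    buf[0] = static_cast<char>(lead[len] | ch);
    return len;
}

// Hypothetical helper: append the encoded form of ch to s.
void sketch_append_utf8(std::string &s, unsigned ch) {
    char buf[4];
    const unsigned len = sketch_to_utf8(ch, buf);
    s.append(buf, len);
}
```

Appending by explicit length matters because the buffer is not NUL-terminated.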
category Xapian::Unicode::get_category (unsigned ch) [inline]
Return the category which a given Unicode character falls into.
Definition at line 284 of file unicode.h.
References Xapian::Unicode::Internal::get_character_info(), and UNASSIGNED.
Referenced by DEFINE_TESTCASE(), is_currency(), Xapian::is_digit(), is_digit(), is_whitespace(), is_wordchar(), Xapian::should_stem(), and should_stem().
bool Xapian::Unicode::is_wordchar (unsigned ch) [inline]
Test if a given Unicode character is a letter or number.
Definition at line 291 of file unicode.h.
References CONNECTOR_PUNCTUATION, DECIMAL_DIGIT_NUMBER, get_category(), LETTER_NUMBER, LOWERCASE_LETTER, MODIFIER_LETTER, OTHER_LETTER, OTHER_NUMBER, TITLECASE_LETTER, and UPPERCASE_LETTER.
Referenced by Xapian::check_wordchar(), is_not_wordchar(), and Xapian::QueryParser::Internal::parse_term().
bool Xapian::Unicode::is_whitespace (unsigned ch) [inline]
Test if a given Unicode character is a whitespace character.
Definition at line 306 of file unicode.h.
References CONTROL, get_category(), LINE_SEPARATOR, PARAGRAPH_SEPARATOR, and SPACE_SEPARATOR.
Referenced by is_not_whitespace().
bool Xapian::Unicode::is_currency (unsigned ch) [inline]
Test if a given Unicode character is a currency symbol.
Definition at line 316 of file unicode.h.
References CURRENCY_SYMBOL, and get_category().
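The three predicates above all reduce to a membership test on the result of get_category(). The sketch below shows one way such a test could be written with a bitmask. It operates on a category value directly, because the character tables behind get_category() are not reproduced here. It assumes that the categories listed under "References" for each function are exactly the ones that function accepts; the `sketch_` names and the `bit` helper are hypothetical.

```cpp
// Local copy of the category enumeration listed at the top of this page.
enum category {
    UNASSIGNED, UPPERCASE_LETTER, LOWERCASE_LETTER, TITLECASE_LETTER,
    MODIFIER_LETTER, OTHER_LETTER, NON_SPACING_MARK, ENCLOSING_MARK,
    COMBINING_SPACING_MARK, DECIMAL_DIGIT_NUMBER, LETTER_NUMBER,
    OTHER_NUMBER, SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR,
    CONTROL, FORMAT, PRIVATE_USE, SURROGATE, CONNECTOR_PUNCTUATION,
    DASH_PUNCTUATION, OPEN_PUNCTUATION, CLOSE_PUNCTUATION,
    INITIAL_QUOTE_PUNCTUATION, FINAL_QUOTE_PUNCTUATION, OTHER_PUNCTUATION,
    MATH_SYMBOL, CURRENCY_SYMBOL, MODIFIER_SYMBOL, OTHER_SYMBOL
};

// One bit per category; the 30 enumerators fit in a 32-bit mask.
constexpr unsigned bit(category c) { return 1u << c; }

// Assumed word-character set: the categories referenced by is_wordchar().
bool sketch_is_wordchar(category c) {
    const unsigned mask =
        bit(UPPERCASE_LETTER) | bit(LOWERCASE_LETTER) |
        bit(TITLECASE_LETTER) | bit(MODIFIER_LETTER) | bit(OTHER_LETTER) |
        bit(DECIMAL_DIGIT_NUMBER) | bit(LETTER_NUMBER) | bit(OTHER_NUMBER) |
        bit(CONNECTOR_PUNCTUATION);
    return ((mask >> c) & 1u) != 0;
}

// Assumed whitespace set: the categories referenced by is_whitespace().
bool sketch_is_whitespace(category c) {
    const unsigned mask =
        bit(CONTROL) | bit(SPACE_SEPARATOR) |
        bit(LINE_SEPARATOR) | bit(PARAGRAPH_SEPARATOR);
    return ((mask >> c) & 1u) != 0;
}

// A single-category test needs no mask.
bool sketch_is_currency(category c) {
    return c == CURRENCY_SYMBOL;
}
```

If the assumption holds, a connector such as an underscore would count as a word character even though the brief description mentions only letters and numbers.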
unsigned Xapian::Unicode::tolower (unsigned ch) [inline]
Convert a Unicode character to lowercase.
Definition at line 321 of file unicode.h.
References Xapian::Unicode::Internal::get_case_type(), Xapian::Unicode::Internal::get_character_info(), and Xapian::Unicode::Internal::get_delta().
Referenced by AuthorValueRangeProcessor::operator()(), and tolower().
unsigned Xapian::Unicode::toupper (unsigned ch) [inline]
Convert a Unicode character to uppercase.
Definition at line 330 of file unicode.h.
References Xapian::Unicode::Internal::get_case_type(), Xapian::Unicode::Internal::get_character_info(), and Xapian::Unicode::Internal::get_delta().
Referenced by toupper().
std::string Xapian::Unicode::tolower (const std::string &term) [inline]
Convert a UTF-8 std::string to lowercase.
Definition at line 340 of file unicode.h.
References append_utf8(), and tolower().
Referenced by check_table(), Xapian::check_wordchar(), convert_numeric_string(), DEFINE_TESTCASE(), and Xapian::QueryParser::Internal::parse_term().
std::string Xapian::Unicode::toupper (const std::string &term) [inline]
Convert a UTF-8 std::string to uppercase.
Definition at line 352 of file unicode.h.
References append_utf8(), and toupper().
Referenced by DEFINE_TESTCASE().
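Since the string overloads reference append_utf8() and the per-character conversion, their overall shape is presumably decode, map, re-encode. The sketch below illustrates that loop for lowercasing under stated assumptions: the input is well-formed UTF-8, all `sketch_` and `toy_` names are hypothetical, and `toy_tolower` covers only ASCII and Latin-1 capitals instead of the table-driven mapping the library uses.

```cpp
#include <cstddef>
#include <string>

// Toy per-character mapping (assumption: ASCII and Latin-1 capitals only).
unsigned toy_tolower(unsigned ch) {
    if (ch >= 0x41 && ch <= 0x5A) return ch + 0x20;                // A-Z
    if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7) return ch + 0x20;  // skip U+00D7
    return ch;
}

// Decode one code point starting at s[i] and advance i past it.
// Assumes well-formed UTF-8; no validation is attempted.
unsigned sketch_decode_utf8(const std::string &s, std::size_t &i) {
    const unsigned char b = static_cast<unsigned char>(s[i++]);
    if (b < 0x80) return b;
    unsigned extra = b >= 0xF0 ? 3 : b >= 0xE0 ? 2 : 1;
    unsigned ch = b & (0x3Fu >> extra);  // payload bits of the lead byte
    for (; extra > 0 && i < s.size(); --extra)
        ch = (ch << 6) | (static_cast<unsigned char>(s[i++]) & 0x3Fu);
    return ch;
}

// Append the UTF-8 form of ch (assumed <= 0x10FFFF) to s.
void sketch_append_utf8(std::string &s, unsigned ch) {
    if (ch < 0x80) {
        s += static_cast<char>(ch);
        return;
    }
    const unsigned len = ch < 0x800 ? 2 : ch < 0x10000 ? 3 : 4;
    static const unsigned char lead[] = {0, 0, 0xC0, 0xE0, 0xF0};
    char buf[4];
    for (unsigned k = len - 1; k > 0; --k) {
        buf[k] = static_cast<char>(0x80 | (ch & 0x3F));
        ch >>= 6;
    }
    buf[0] = static_cast<char>(lead[len] | ch);
    s.append(buf, len);
}

// Lowercase a UTF-8 string one code point at a time.
std::string sketch_tolower(const std::string &term) {
    std::string result;
    result.reserve(term.size());
    for (std::size_t i = 0; i < term.size(); )
        sketch_append_utf8(result, toy_tolower(sketch_decode_utf8(term, i)));
    return result;
}
```

Working per code point rather than per byte is what keeps multi-byte sequences intact; an uppercase variant would differ only in the mapping function.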