Xapian::Unicode Namespace Reference


Namespaces

namespace  Internal

Enumerations

enum  category {
  UNASSIGNED, UPPERCASE_LETTER, LOWERCASE_LETTER, TITLECASE_LETTER,
  MODIFIER_LETTER, OTHER_LETTER, NON_SPACING_MARK, ENCLOSING_MARK,
  COMBINING_SPACING_MARK, DECIMAL_DIGIT_NUMBER, LETTER_NUMBER, OTHER_NUMBER,
  SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL,
  FORMAT, PRIVATE_USE, SURROGATE, CONNECTOR_PUNCTUATION,
  DASH_PUNCTUATION, OPEN_PUNCTUATION, CLOSE_PUNCTUATION, INITIAL_QUOTE_PUNCTUATION,
  FINAL_QUOTE_PUNCTUATION, OTHER_PUNCTUATION, MATH_SYMBOL, CURRENCY_SYMBOL,
  MODIFIER_SYMBOL, OTHER_SYMBOL
}
 Each Unicode character is in one of these categories.

Functions

unsigned nonascii_to_utf8 (unsigned ch, char *buf)
 Convert a single non-ASCII Unicode character to UTF-8.
unsigned to_utf8 (unsigned ch, char *buf)
 Convert a single Unicode character to UTF-8.
void append_utf8 (std::string &s, unsigned ch)
 Append the UTF-8 representation of a single Unicode character to a std::string.
category get_category (unsigned ch)
 Return the category which a given Unicode character falls into.
bool is_wordchar (unsigned ch)
 Test if a given Unicode character is a letter or number.
bool is_whitespace (unsigned ch)
 Test if a given Unicode character is a whitespace character.
bool is_currency (unsigned ch)
 Test if a given Unicode character is a currency symbol.
unsigned tolower (unsigned ch)
 Convert a Unicode character to lowercase.
unsigned toupper (unsigned ch)
 Convert a Unicode character to uppercase.
std::string tolower (const std::string &term)
 Convert a UTF-8 std::string to lowercase.
std::string toupper (const std::string &term)
 Convert a UTF-8 std::string to uppercase.


Enumeration Type Documentation

enum Xapian::Unicode::category

Each Unicode character is in one of these categories.

Enumerator:
UNASSIGNED 
UPPERCASE_LETTER 
LOWERCASE_LETTER 
TITLECASE_LETTER 
MODIFIER_LETTER 
OTHER_LETTER 
NON_SPACING_MARK 
ENCLOSING_MARK 
COMBINING_SPACING_MARK 
DECIMAL_DIGIT_NUMBER 
LETTER_NUMBER 
OTHER_NUMBER 
SPACE_SEPARATOR 
LINE_SEPARATOR 
PARAGRAPH_SEPARATOR 
CONTROL 
FORMAT 
PRIVATE_USE 
SURROGATE 
CONNECTOR_PUNCTUATION 
DASH_PUNCTUATION 
OPEN_PUNCTUATION 
CLOSE_PUNCTUATION 
INITIAL_QUOTE_PUNCTUATION 
FINAL_QUOTE_PUNCTUATION 
OTHER_PUNCTUATION 
MATH_SYMBOL 
CURRENCY_SYMBOL 
MODIFIER_SYMBOL 
OTHER_SYMBOL 

Definition at line 185 of file unicode.h.


Function Documentation

unsigned Xapian::Unicode::nonascii_to_utf8 ( unsigned  ch,
char *  buf 
)

Convert a single non-ASCII Unicode character to UTF-8.

This is intended mainly as a helper function for to_utf8().

The character ch (which must be >= 128, i.e. non-ASCII) is written to the buffer buf, and the length in bytes of the resulting UTF-8 sequence is returned.

NB buf must have space for (at least) 4 bytes.

Definition at line 34 of file utf8itor.cc.

Referenced by to_utf8().

unsigned Xapian::Unicode::to_utf8 ( unsigned  ch,
char *  buf 
) [inline]

Convert a single Unicode character to UTF-8.

The character ch is written to the buffer buf, and the length in bytes of the resulting UTF-8 sequence is returned.

NB buf must have space for (at least) 4 bytes.

Definition at line 267 of file unicode.h.

References nonascii_to_utf8().

Referenced by append_utf8().

void Xapian::Unicode::append_utf8 ( std::string &  s,
unsigned  ch 
) [inline]

Append the UTF-8 representation of a single Unicode character to a std::string.

Definition at line 278 of file unicode.h.

References to_utf8().

Referenced by Xapian::QueryParser::Internal::parse_term(), tolower(), and toupper().
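
The three functions above layer on one another: append_utf8() calls to_utf8(), which in turn refers to nonascii_to_utf8(). The following is a minimal, Xapian-independent sketch of how such a layering could look; it is not the library's implementation. All sketch_ names are hypothetical, and the code assumes ch is a valid code point no greater than 0x10FFFF.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for nonascii_to_utf8(): encodes ch into buf and
// returns the number of bytes written (2, 3 or 4).
// Assumption: 0x80 <= ch <= 0x10FFFF and buf has room for 4 bytes.
unsigned sketch_nonascii_to_utf8(unsigned ch, char* buf) {
    assert(ch >= 0x80);
    if (ch < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        buf[0] = static_cast<char>(0xC0 | (ch >> 6));
        buf[1] = static_cast<char>(0x80 | (ch & 0x3F));
        return 2;
    }
    if (ch < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        buf[0] = static_cast<char>(0xE0 | (ch >> 12));
        buf[1] = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
        buf[2] = static_cast<char>(0x80 | (ch & 0x3F));
        return 3;
    }
    // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    buf[0] = static_cast<char>(0xF0 | (ch >> 18));
    buf[1] = static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
    buf[2] = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    buf[3] = static_cast<char>(0x80 | (ch & 0x3F));
    return 4;
}

// Hypothetical stand-in for to_utf8(): ASCII is copied through as a single
// byte; everything else is delegated to the non-ASCII helper.
unsigned sketch_to_utf8(unsigned ch, char* buf) {
    if (ch < 128) {
        buf[0] = static_cast<char>(ch);
        return 1;
    }
    return sketch_nonascii_to_utf8(ch, buf);
}

// Hypothetical stand-in for append_utf8(): encodes into a 4-byte stack
// buffer, then appends exactly the bytes that were written.
void sketch_append_utf8(std::string& s, unsigned ch) {
    char buf[4];
    const unsigned len = sketch_to_utf8(ch, buf);
    s.append(buf, len);
}
```

Appending U+0041, U+00E9, U+20AC and U+1F600 in turn with sketch_append_utf8() produces sequences of 1, 2, 3 and 4 bytes respectively, which is why the buffer needs space for at least 4 bytes.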

category Xapian::Unicode::get_category ( unsigned  ch  )  [inline]

Return the category which a given Unicode character falls into.

Definition at line 284 of file unicode.h.

References Xapian::Unicode::Internal::get_character_info(), and UNASSIGNED.

Referenced by DEFINE_TESTCASE(), is_currency(), Xapian::is_digit(), is_digit(), is_whitespace(), is_wordchar(), Xapian::should_stem(), and should_stem().

bool Xapian::Unicode::is_wordchar ( unsigned  ch  )  [inline]

Test if a given Unicode character is a letter or number.

Definition at line 291 of file unicode.h.

References CONNECTOR_PUNCTUATION, DECIMAL_DIGIT_NUMBER, get_category(), LETTER_NUMBER, LOWERCASE_LETTER, MODIFIER_LETTER, OTHER_LETTER, OTHER_NUMBER, TITLECASE_LETTER, and UPPERCASE_LETTER.

Referenced by Xapian::check_wordchar(), is_not_wordchar(), and Xapian::QueryParser::Internal::parse_term().
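
Judging by the References line above, the test accepts the letter and number categories and also CONNECTOR_PUNCTUATION (the category of the underscore, for example). One compact way to express such a set-membership test is a bitmask indexed by category. The sketch below illustrates that technique under stated assumptions and is not taken from unicode.h: the enum is a local copy in the order listed in this document so that the example compiles without Xapian, and bit() and is_wordchar_category() are hypothetical names.

```cpp
// Local copy of the category enumerators, in the order documented above.
enum category {
    UNASSIGNED, UPPERCASE_LETTER, LOWERCASE_LETTER, TITLECASE_LETTER,
    MODIFIER_LETTER, OTHER_LETTER, NON_SPACING_MARK, ENCLOSING_MARK,
    COMBINING_SPACING_MARK, DECIMAL_DIGIT_NUMBER, LETTER_NUMBER, OTHER_NUMBER,
    SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL,
    FORMAT, PRIVATE_USE, SURROGATE, CONNECTOR_PUNCTUATION,
    DASH_PUNCTUATION, OPEN_PUNCTUATION, CLOSE_PUNCTUATION,
    INITIAL_QUOTE_PUNCTUATION, FINAL_QUOTE_PUNCTUATION, OTHER_PUNCTUATION,
    MATH_SYMBOL, CURRENCY_SYMBOL, MODIFIER_SYMBOL, OTHER_SYMBOL
};

// The mask below only works if every enumerator fits in a 32-bit word.
static_assert(OTHER_SYMBOL < 32, "category values must fit in 32 bits");

// Hypothetical helper: the single bit that represents category c.
constexpr unsigned bit(category c) { return 1u << c; }

// The categories named in the References line for is_wordchar().
constexpr unsigned WORDCHAR_MASK =
    bit(UPPERCASE_LETTER) | bit(LOWERCASE_LETTER) | bit(TITLECASE_LETTER) |
    bit(MODIFIER_LETTER) | bit(OTHER_LETTER) |
    bit(DECIMAL_DIGIT_NUMBER) | bit(LETTER_NUMBER) | bit(OTHER_NUMBER) |
    bit(CONNECTOR_PUNCTUATION);

// Hypothetical helper: true if category c is one of the word categories.
constexpr bool is_wordchar_category(category c) {
    return ((WORDCHAR_MASK >> c) & 1u) != 0;
}
```

With a real category lookup in place, a caller would pass the result of get_category(ch) to such a helper; the same pattern would fit the smaller category sets used by the other tests.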

bool Xapian::Unicode::is_whitespace ( unsigned  ch  )  [inline]

Test if a given Unicode character is a whitespace character.

Definition at line 306 of file unicode.h.

References CONTROL, get_category(), LINE_SEPARATOR, PARAGRAPH_SEPARATOR, and SPACE_SEPARATOR.

Referenced by is_not_whitespace().

bool Xapian::Unicode::is_currency ( unsigned  ch  )  [inline]

Test if a given Unicode character is a currency symbol.

Definition at line 316 of file unicode.h.

References CURRENCY_SYMBOL, and get_category().

unsigned Xapian::Unicode::tolower ( unsigned  ch  )  [inline]

Convert a Unicode character to lowercase.

Definition at line 321 of file unicode.h.

References Xapian::Unicode::Internal::get_case_type(), Xapian::Unicode::Internal::get_character_info(), and Xapian::Unicode::Internal::get_delta().

Referenced by AuthorValueRangeProcessor::operator()(), and tolower().

unsigned Xapian::Unicode::toupper ( unsigned  ch  )  [inline]

Convert a Unicode character to uppercase.

Definition at line 330 of file unicode.h.

References Xapian::Unicode::Internal::get_case_type(), Xapian::Unicode::Internal::get_character_info(), and Xapian::Unicode::Internal::get_delta().

Referenced by toupper().

std::string Xapian::Unicode::tolower ( const std::string &  term  )  [inline]

Convert a UTF-8 std::string to lowercase.

Definition at line 340 of file unicode.h.

References append_utf8(), and tolower().

Referenced by check_table(), Xapian::check_wordchar(), convert_numeric_string(), DEFINE_TESTCASE(), and Xapian::QueryParser::Internal::parse_term().

std::string Xapian::Unicode::toupper ( const std::string &  term  )  [inline]

Convert a UTF-8 std::string to uppercase.

Definition at line 352 of file unicode.h.

References append_utf8(), and toupper().

Referenced by DEFINE_TESTCASE().


Documentation for Xapian (version 1.0.10).
Generated on 24 Dec 2008 by Doxygen 1.5.2.