include/xapian/unicode.h File Reference

Unicode and UTF-8 related classes and functions.

#include <xapian/visibility.h>
#include <string>

Namespaces

namespace  Xapian
namespace  Xapian::Unicode
namespace  Xapian::Unicode::Internal

Classes

class  Xapian::Utf8Iterator
 An iterator which returns unicode character values from a UTF-8 encoded string.
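
To illustrate what iterating over a UTF-8 string as character values involves, here is a minimal, self-contained sketch of UTF-8 decoding. It is not Xapian's implementation of Utf8Iterator: the function name decode_utf8_sketch is hypothetical, and the sketch assumes well-formed input (continuation bytes are not validated, and a stray or truncated lead byte is simply passed through as its own value).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xapian): decode a UTF-8 string into
// a sequence of Unicode character values.
// Assumption: input is well-formed; continuation bytes are not checked.
std::vector<unsigned> decode_utf8_sketch(const std::string &s) {
    std::vector<unsigned> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len = 1;   // bytes in this sequence
        unsigned ch = b;       // default: pass the byte through unchanged
        if ((b & 0xE0) == 0xC0) {
            len = 2; ch = b & 0x1F;   // 110xxxxx
        } else if ((b & 0xF0) == 0xE0) {
            len = 3; ch = b & 0x0F;   // 1110xxxx
        } else if ((b & 0xF8) == 0xF0) {
            len = 4; ch = b & 0x07;   // 11110xxx
        }
        if (i + len > s.size()) {
            // Truncated sequence: fall back to a single byte value.
            len = 1;
            ch = b;
        }
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            ch = (ch << 6) | (c & 0x3F);
        }
        out.push_back(ch);
        i += len;
    }
    return out;
}
```

An iterator class would typically perform the same per-sequence decoding lazily, one character per increment, instead of building a vector up front.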

Enumerations

enum  Xapian::Unicode::category {
  Xapian::Unicode::UNASSIGNED, Xapian::Unicode::UPPERCASE_LETTER, Xapian::Unicode::LOWERCASE_LETTER, Xapian::Unicode::TITLECASE_LETTER,
  Xapian::Unicode::MODIFIER_LETTER, Xapian::Unicode::OTHER_LETTER, Xapian::Unicode::NON_SPACING_MARK, Xapian::Unicode::ENCLOSING_MARK,
  Xapian::Unicode::COMBINING_SPACING_MARK, Xapian::Unicode::DECIMAL_DIGIT_NUMBER, Xapian::Unicode::LETTER_NUMBER, Xapian::Unicode::OTHER_NUMBER,
  Xapian::Unicode::SPACE_SEPARATOR, Xapian::Unicode::LINE_SEPARATOR, Xapian::Unicode::PARAGRAPH_SEPARATOR, Xapian::Unicode::CONTROL,
  Xapian::Unicode::FORMAT, Xapian::Unicode::PRIVATE_USE, Xapian::Unicode::SURROGATE, Xapian::Unicode::CONNECTOR_PUNCTUATION,
  Xapian::Unicode::DASH_PUNCTUATION, Xapian::Unicode::OPEN_PUNCTUATION, Xapian::Unicode::CLOSE_PUNCTUATION, Xapian::Unicode::INITIAL_QUOTE_PUNCTUATION,
  Xapian::Unicode::FINAL_QUOTE_PUNCTUATION, Xapian::Unicode::OTHER_PUNCTUATION, Xapian::Unicode::MATH_SYMBOL, Xapian::Unicode::CURRENCY_SYMBOL,
  Xapian::Unicode::MODIFIER_SYMBOL, Xapian::Unicode::OTHER_SYMBOL
}
 Each unicode character is in one of these categories.

Functions

int Xapian::Unicode::Internal::get_character_info (unsigned ch)
 For internal use only. Extract the information about a character from the Unicode character tables.
int Xapian::Unicode::Internal::get_case_type (int info)
 For internal use only. Extract how to convert the case of a unicode character from its info.
category Xapian::Unicode::Internal::get_category (int info)
 For internal use only. Extract the category of a unicode character from its info.
int Xapian::Unicode::Internal::get_delta (int info)
 For internal use only. Extract the delta to use for case conversion of a character from its info.
unsigned Xapian::Unicode::nonascii_to_utf8 (unsigned ch, char *buf)
 Convert a single non-ASCII unicode character to UTF-8.
unsigned Xapian::Unicode::to_utf8 (unsigned ch, char *buf)
 Convert a single unicode character to UTF-8.
void Xapian::Unicode::append_utf8 (std::string &s, unsigned ch)
 Append the UTF-8 representation of a single unicode character to a std::string.
category Xapian::Unicode::get_category (unsigned ch)
 Return the category which a given unicode character falls into.
bool Xapian::Unicode::is_wordchar (unsigned ch)
 Test if a given unicode character is a letter or number.
bool Xapian::Unicode::is_whitespace (unsigned ch)
 Test if a given unicode character is a whitespace character.
bool Xapian::Unicode::is_currency (unsigned ch)
 Test if a given unicode character is a currency symbol.
unsigned Xapian::Unicode::tolower (unsigned ch)
 Convert a unicode character to lowercase.
unsigned Xapian::Unicode::toupper (unsigned ch)
 Convert a unicode character to uppercase.
std::string Xapian::Unicode::tolower (const std::string &term)
 Convert a UTF-8 std::string to lowercase.
std::string Xapian::Unicode::toupper (const std::string &term)
 Convert a UTF-8 std::string to uppercase.
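
As a rough illustration of the conversion that functions such as to_utf8 and append_utf8 describe, the following self-contained sketch encodes a single character value as UTF-8. It is not Xapian's code: the names encode_utf8_sketch and append_utf8_sketch are hypothetical, the buffer is assumed to hold at least 4 bytes, and values above U+10FFFF are rejected by returning 0, which is a choice made for this sketch only.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Xapian): write the UTF-8 encoding of
// ch into buf and return the number of bytes written (0 if ch is out of
// range). Assumption: buf has room for at least 4 bytes.
unsigned encode_utf8_sketch(unsigned ch, char *buf) {
    assert(buf != nullptr);
    if (ch < 0x80) {
        // ASCII: one byte, unchanged.
        buf[0] = static_cast<char>(ch);
        return 1;
    }
    if (ch < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        buf[0] = static_cast<char>(0xC0 | (ch >> 6));
        buf[1] = static_cast<char>(0x80 | (ch & 0x3F));
        return 2;
    }
    if (ch < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        buf[0] = static_cast<char>(0xE0 | (ch >> 12));
        buf[1] = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
        buf[2] = static_cast<char>(0x80 | (ch & 0x3F));
        return 3;
    }
    if (ch > 0x10FFFF) {
        // Outside the Unicode range: rejected in this sketch.
        return 0;
    }
    // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    buf[0] = static_cast<char>(0xF0 | (ch >> 18));
    buf[1] = static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
    buf[2] = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    buf[3] = static_cast<char>(0x80 | (ch & 0x3F));
    return 4;
}

// Hypothetical helper: append the UTF-8 form of ch to s.
void append_utf8_sketch(std::string &s, unsigned ch) {
    char buf[4];
    unsigned len = encode_utf8_sketch(ch, buf);
    s.append(buf, len);
}
```

Splitting the work into a buffer-writing function and a string-appending wrapper mirrors the split between the buffer-based and std::string-based signatures listed above; whether the library structures its code this way is not stated on this page.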


Detailed Description

Unicode and UTF-8 related classes and functions.

Definition in file unicode.h.


Documentation for Xapian (version 1.0.10).
Generated on 24 Dec 2008 by Doxygen 1.5.2.