
Charset conversion
[LibTDS API]

Convert between different charsets.

Defines

#define CHUNK_ALLOC   4

Functions

static void _iconv_close (iconv_t *cd)
static const char * collate2charset (int sql_collate, int lcid)
static int lookup_canonic (const CHARACTER_SET_ALIAS aliases[], const char *charset_name)
static int skip_one_input_sequence (iconv_t cd, const TDS_ENCODING *charset, const char **input, size_t *input_size)
 Move the input sequence pointer to the next valid position.
void tds7_srv_charset_changed (TDSSOCKET *tds, int sql_collate, int lcid)
static int tds_canonical_charset (const char *charset_name)
 Determine canonical iconv character set.
const char * tds_canonical_charset_name (const char *charset_name)
 Determine canonical iconv character set name.
size_t tds_iconv (TDSSOCKET *tds, const TDSICONV *conv, TDS_ICONV_DIRECTION io, const char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft)
 Wrapper around iconv(3).
void tds_iconv_close (TDSSOCKET *tds)
size_t tds_iconv_fread (iconv_t cd, FILE *stream, size_t field_len, size_t term_len, char *outbuf, size_t *outbytesleft)
 Read a data file, passing the data through iconv().
void tds_iconv_free (TDSSOCKET *tds)
TDSICONV * tds_iconv_from_collate (TDSSOCKET *tds, int sql_collate, int lcid)
 Get iconv information from an LCID (to support different column encodings under MSSQL2K).
static TDSICONV * tds_iconv_get_info (TDSSOCKET *tds, const char *canonic_charset)
 Get an iconv info structure, allocate and initialize if needed.
static void tds_iconv_info_close (TDSICONV *char_conv)
static int tds_iconv_info_init (TDSICONV *char_conv, const char *client_name, const char *server_name)
 Open iconv descriptors to convert between character sets (both directions).
void tds_iconv_open (TDSSOCKET *tds, const char *charset)
void tds_srv_charset_changed (TDSSOCKET *tds, const char *charset)
const char * tds_sybase_charset_name (const char *charset_name)
 Determine the name Sybase uses for a character set, given a canonical iconv name.
size_t tds_sys_iconv (iconv_t cd, const char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft)
int tds_sys_iconv_close (iconv_t cd)
iconv_t tds_sys_iconv_open (const char *tocode, const char *fromcode)
 Inputs are FreeTDS canonical names, no other.

Detailed Description

Convert between different charsets.

Set up the initial iconv conversion descriptors. When the socket is allocated, three TDSICONV structures are attached to iconv. They have fixed meanings:

Other designs that use less data are possible, but these three conversions are needed very often. By reserving them, we avoid searching the array for our most common purposes.

To work around differing iconv names and portability problems, FreeTDS maintains a list of aliases for each charset.

First we discover the names of our minimum required charsets (UTF-8, ISO8859-1 and UCS2). Later, as and when it's needed, we try to discover others.

There is one list of canonic names (GNU iconv names) and two sets of aliases (one for other iconv implementations and another for Sybase). For every canonic charset name we cache the iconv name found during discovery.
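
The alias scheme described above can be sketched as a small table lookup. This is a minimal illustration, not the FreeTDS tables: the `charset_alias` type, the `lookup_alias()` and `canonical_name()` helpers, and the handful of names listed are all hypothetical, and the caching of discovered iconv names is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias entry: maps a name to an index into canonic_names[]. */
typedef struct {
    const char *alias;
    int canonic;
} charset_alias;

/* Illustrative canonical (GNU iconv style) names; not the real FreeTDS list. */
static const char *const canonic_names[] = { "ISO-8859-1", "UTF-8", "UCS-2LE" };

/* Illustrative aliases, terminated by an entry whose alias is NULL. */
static const charset_alias iconv_aliases[] = {
    { "ISO-8859-1", 0 }, { "ISO8859-1", 0 }, { "LATIN1", 0 },
    { "UTF-8", 1 }, { "UTF8", 1 },
    { "UCS-2LE", 2 },
    { NULL, -1 }
};

/* Return the canonical position for charset_name, or -1 if lookup failed. */
static int lookup_alias(const charset_alias aliases[], const char *charset_name)
{
    assert(aliases != NULL);
    if (charset_name == NULL || *charset_name == '\0')
        return -1;
    for (size_t i = 0; aliases[i].alias != NULL; ++i) {
        if (strcmp(charset_name, aliases[i].alias) == 0)
            return aliases[i].canonic;
    }
    return -1;
}

/* Return the canonical name for charset_name, or NULL if lookup failed. */
static const char *canonical_name(const char *charset_name)
{
    int pos = lookup_alias(iconv_aliases, charset_name);
    return pos < 0 ? NULL : canonic_names[pos];
}
```

With this table, `canonical_name("ISO8859-1")` yields `"ISO-8859-1"`, and an unknown name yields `NULL`.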


Function Documentation

static int skip_one_input_sequence (iconv_t cd, const TDS_ENCODING *charset, const char **input, size_t *input_size) [static]
 

Move the input sequence pointer to the next valid position.

Used when an input character cannot be converted.

Returns:
number of bytes to skip.
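
As a rough illustration of skipping one input sequence, the sketch below handles only UTF-8 input, where the lead byte gives the sequence length. The helper names are hypothetical, and the real function also takes the conversion descriptor and a TDS_ENCODING so that it can handle other encodings; none of that is reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte c.
 * Stray continuation bytes and invalid lead bytes count as 1 so the
 * caller always makes progress. */
static size_t utf8_sequence_length(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/* Advance *input past one UTF-8 sequence and shrink *input_size to match.
 * Returns the number of bytes skipped (0 if the input is empty). */
static size_t skip_one_utf8_sequence(const char **input, size_t *input_size)
{
    size_t n;

    assert(input != NULL && *input != NULL && input_size != NULL);
    if (*input_size == 0)
        return 0;
    n = utf8_sequence_length((unsigned char) **input);
    if (n > *input_size)        /* truncated sequence at end of buffer */
        n = *input_size;
    *input += n;
    *input_size -= n;
    return n;
}
```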

static int tds_canonical_charset (const char *charset_name) [static]
 

Determine canonical iconv character set.

Returns:
canonical position, or -1 if lookup failed.
Remarks:
Returned name can be used in bytes_per_char(), above.

const char * tds_canonical_charset_name (const char *charset_name)
 

Determine canonical iconv character set name.

Returns:
canonical name, or NULL if lookup failed.
Remarks:
Returned name can be used in bytes_per_char(), above.

size_t tds_iconv (TDSSOCKET *tds, const TDSICONV *conv, TDS_ICONV_DIRECTION io, const char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft)
 

Wrapper around iconv(3).

Same parameters, with slightly different behavior.

Parameters:
tds state information for the socket and the TDS protocol
io Enumerated value indicating whether the data are being sent to or received from the server.
conv information about the encodings involved, including the iconv(3) conversion descriptors.
inbuf address of pointer to the input buffer of data to be converted.
inbytesleft address of count of bytes in inbuf.
outbuf address of pointer to the output buffer.
outbytesleft address of count of bytes in outbuf.
Return values:
number of irreversible conversions performed. -1 on error, see iconv(3) documentation for a description of the possible values of errno.
Remarks:
Unlike iconv(3), none of the arguments may be NULL or point to NULL. Like iconv(3), all pointers will be updated. Success is signified by a nonnegative return code and *inbytesleft == 0. If the conversion descriptor in conv is -1 or NULL, inbuf is copied to outbuf, and all parameters are updated accordingly.
If a character in inbuf cannot be converted because no such character exists in the outbuf character set, we emit messages similar to the ones Sybase emits when it fails such a conversion. The message varies depending on the direction of the data. On a read error, we emit Msg 2403, Severity 16 (EX_INFO): "WARNING! Some character(s) could not be converted into client's character set. Unconverted bytes were changed to question marks ('?')." On a write error, we emit Msg 2402, Severity 16 (EX_USER): "Error converting client characters into server's character set. Some character(s) could not be converted." and return an error code. Client libraries relying on this routine should reflect an error back to the application.

Todo:
Check for variable multibyte non-UTF-8 input character set.

Todo:
Use more robust error message generation.

Todo:
For reads, cope with outbuf encodings that don't have the equivalent of an ASCII '?'.

Todo:
Support alternative to '?' for the replacement character.
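
The read-side behavior described in the remarks (unconvertible characters become '?') can be approximated with plain iconv(3). The sketch below is not the tds_iconv() implementation: `convert_with_replacement()` is a hypothetical helper, it assumes UTF-8 input and an ASCII-compatible output charset, it emits no messages, and it assumes an iconv that reports unconvertible characters with EILSEQ (as GNU iconv does). Invalid input sequences also raise EILSEQ and are treated the same way.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Convert *inbuf into *outbuf, writing '?' for each input character that
 * iconv rejects with EILSEQ. Returns the number of irreversible conversions
 * (replacements plus any reported by iconv), or (size_t) -1 on other errors,
 * with errno set as for iconv(3). All pointers and counts are updated. */
static size_t convert_with_replacement(iconv_t cd,
                                       const char **inbuf, size_t *inbytesleft,
                                       char **outbuf, size_t *outbytesleft)
{
    size_t irreversible = 0;

    assert(inbuf && *inbuf && inbytesleft && outbuf && *outbuf && outbytesleft);

    while (*inbytesleft > 0) {
        unsigned char c;
        size_t n;
        /* iconv(3) declares inbuf as char **; the cast is the usual workaround. */
        size_t rc = iconv(cd, (char **) inbuf, inbytesleft, outbuf, outbytesleft);

        if (rc != (size_t) -1) {
            irreversible += rc;     /* everything was consumed */
            break;
        }
        if (errno != EILSEQ)
            return (size_t) -1;     /* E2BIG or EINVAL: let the caller decide */
        if (*outbytesleft == 0) {
            errno = E2BIG;          /* no room for the replacement character */
            return (size_t) -1;
        }

        /* Skip one UTF-8 sequence, judged by its lead byte. */
        c = (unsigned char) **inbuf;
        n = c < 0x80 ? 1 : (c & 0xE0) == 0xC0 ? 2 : (c & 0xF0) == 0xE0 ? 3
          : (c & 0xF8) == 0xF0 ? 4 : 1;
        if (n > *inbytesleft)
            n = *inbytesleft;
        *inbuf += n;
        *inbytesleft -= n;

        *(*outbuf)++ = '?';
        --*outbytesleft;
        ++irreversible;
    }
    return irreversible;
}
```

As with tds_iconv(), a caller would treat a nonnegative return value together with `*inbytesleft == 0` as success.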

size_t tds_iconv_fread (iconv_t cd, FILE *stream, size_t field_len, size_t term_len, char *outbuf, size_t *outbytesleft)
 

Read a data file, passing the data through iconv().

Returns:
Count of bytes either not read, or read but not converted. Returns zero on success.

static int tds_iconv_info_init (TDSICONV *char_conv, const char *client_name, const char *server_name) [static]
 

Open iconv descriptors to convert between character sets (both directions).

1. Look up the canonical names of the character sets.
2. Look up their widths.
3. Ask iconv to open a conversion descriptor.
4. Fail if any of the above offer any resistance.

Remarks:
The charset names written to iconv will be the canonical names, not necessarily the names passed in.

const char * tds_sybase_charset_name (const char *charset_name)
 

Determine the name Sybase uses for a character set, given a canonical iconv name.

Returns:
Sybase name, or NULL if lookup failed.
Remarks:
Returned name can be sent to a Sybase server.

iconv_t tds_sys_iconv_open (const char *tocode, const char *fromcode)
 

Inputs are FreeTDS canonical names, no other.

No alias list is consulted.


Generated on Wed May 7 19:22:11 2008 for FreeTDS API by  doxygen 1.4.1