Abstract
This chapter provides a general overview of essential MySQL Cluster, NDB API, and MGM API concepts, terminology, and programming constructs.
Abstract
This section introduces the NDB Transaction and Scanning APIs as well as the NDB Management (MGM) API for use in building applications to run on MySQL Cluster. It also discusses the general theory and principles involved in developing such applications.
The NDB API is an object-oriented application programming interface for MySQL Cluster that implements indexes, scans, transactions, and event handling. NDB transactions are ACID-compliant in that they provide a means to group together operations in such a way that they succeed (commit) or fail as a unit (rollback). It is also possible to perform operations in a "no-commit" or deferred mode, to be committed at a later time.
NDB scans are conceptually similar to the SQL cursors implemented in MySQL 5.0 and other common enterprise-level database management systems. These allow for high-speed row processing for record retrieval purposes. (MySQL Cluster naturally supports set processing just as MySQL does in its non-Cluster distributions; this can be accomplished through the usual MySQL APIs discussed in the MySQL Manual and elsewhere.) The NDB API supports both table scans and row scans; the latter can be performed using either unique or ordered indexes. Event detection and handling are discussed in Section 3.5, “The NdbEventOperation Class”, as well as Section 6.5, “NDB API Event Handling Example”.
In addition, the NDB API provides object-oriented error-handling facilities in order to provide a means of recovering gracefully from failed operations and other problems. See Section 6.2, “Handling Errors and Retrying Transactions”, for a detailed example.
The NDB API provides a number of classes implementing the functionality described above. The most important of these include the Ndb, Ndb_cluster_connection, NdbTransaction, and NdbOperation classes. These model (respectively) database connections, cluster connections, transactions, and operations. These classes and their subclasses are listed in Chapter 3, NDB API CLASSES. Error conditions in the NDB API are handled using NdbError, a structure which is described in Section 4.1, “The NdbError Structure”.
The MySQL Cluster Management API, also known as the MGM API, is a C-language programming interface intended to provide administrative services for the cluster. These include starting and stopping Cluster nodes, handling Cluster logging, backups, and restoration from backups, as well as various other management tasks. A conceptual overview of MGM and its uses can be found in Chapter 5, THE MGM API.
The MGM API's principal structures model the states of individual nodes (ndb_mgm_node_state), the state of the Cluster as a whole (ndb_mgm_cluster_state), and management server response messages (ndb_mgm_reply). See Section 5.4, “MGM Structures”, for detailed descriptions of these.
Abstract
Provides a glossary of terms which are unique to the NDB and MGM APIs, or have a specialised meaning when applied therein.
The following terms are useful to an understanding of MySQL Cluster and the NDB API, or have a specialised meaning when used in one of these contexts. You may also find the MySQL Manual's Cluster Glossary useful.
Backup: A complete copy of all cluster data, transactions and logs, saved to disk.
Restore: Returning the cluster to a previous state as stored in a backup.
Checkpoint: Generally speaking, when data is saved to disk, it is said that a checkpoint has been reached. When working with the NDB storage engine, there are two sorts of checkpoints which work together to ensure that a consistent view of the cluster's data is maintained:
Local Checkpoint (LCP): This is a checkpoint that is specific to a single node; however, LCPs take place for all nodes in the cluster more or less concurrently. An LCP involves saving all of a node's data to disk, and so usually occurs every few minutes, depending upon the amount of data stored by the node.
More detailed information about LCPs and their behaviour can be found in the MySQL Manual, in the sections Defining MySQL Cluster Data Nodes and Configuring Parameters for Local Checkpoints.
Global Checkpoint (GCP): A GCP occurs every few seconds, when transactions for all nodes are synchronised and the REDO log is flushed to disk.
A related term is GCI, which stands for “Global Check ID”. This marks the point in the REDO log where a GCP took place.
Node: A component of MySQL Cluster. Three node types are supported:
Management (MGM) nodes
SQL nodes (also known as API nodes, particularly when speaking of applications other than mysqld that access cluster data)
data nodes (sometimes referred to as “storage nodes”, although this usage is now discouraged)
For more information about these node types, please refer to Section 1.3.3, “Review of MySQL Cluster Concepts”, or to Cluster Process Management in the MySQL Manual.
Node Failure: MySQL Cluster is not solely dependent upon the functioning of any single node making up the cluster; the cluster can continue to run even when one node fails.
Node Restart: The process of restarting a cluster node which has stopped on its own or been stopped deliberately. This can be done for several different reasons, including the following:
Restarting a node which has shut down on its own (when this has occurred, it is known as forced shutdown or node failure; the other cases discussed here involve manually shutting down the node and restarting it)
To update the node's configuration
As part of a software or hardware upgrade
In order to defragment the node's DataMemory
Initial Node Restart: The process of starting a cluster node with its filesystem removed. This is sometimes used in the course of software upgrades and in other special circumstances.
System Crash (or System Failure): This can occur when so many cluster nodes have failed that the cluster's state can no longer be guaranteed.
System Restart: The process of restarting the cluster and reinitialising its state from disk logs and checkpoints. This is required after either a planned or an unplanned shutdown of the cluster.
Fragment: Contains a portion of a database table; in other words, in the NDB storage engine, a table is broken up into and stored as a number of subsets, usually referred to as fragments. A fragment is sometimes also called a partition.
Replica: Under the NDB storage engine, each table fragment has a number of replicas in order to provide redundancy.
Transporter: A protocol providing data transfer across a network. The NDB API supports 4 different types of transporter connections: TCP/IP (local), TCP/IP (remote), SCI, and SHM. TCP/IP is, of course, the familiar network protocol that underlies HTTP, FTP, and so forth, on the Internet. SCI (Scalable Coherent Interface) is a high-speed protocol used in building multiprocessor systems and parallel-processing applications. SHM stands for Unix-style shared memory segments. For an informal introduction to SCI, see this essay at dolphinics.com.
NDB: This originally stood for “Network Database”. It now refers to the storage engine used by MySQL AB to enable its MySQL Cluster distributed database.
ACC: Access Manager. Handles hash indexes of primary keys providing speedy access to the records.
TUP: Tuple Manager. This handles storage of tuples (records) and contains the filtering engine used to filter out records and attributes when performing reads and/or updates.
TC: Transaction Coordinator. Handles co-ordination of transactions and timeouts; serves as the interface to the NDB API for indexes and scan operations.
Abstract
This section defines and discusses the high-level architecture of the NDB API, and introduces the NDB classes which are of greatest use and interest to the developer. It also covers most important NDB API concepts, including a review of MySQL Cluster Concepts.
The NDB API is a MySQL Cluster application interface that implements transactions. It consists of the following fundamental classes:
Ndb_cluster_connection represents a connection to a cluster.
Ndb is the main class, and represents a connection to a database.
NdbDictionary provides meta-information about tables and attributes.
NdbTransaction represents a transaction.
NdbOperation represents an operation using a primary key.
NdbScanOperation represents an operation performing a full table scan.
NdbIndexOperation represents an operation using a unique hash index.
NdbIndexScanOperation represents an operation performing a scan using an ordered index.
NdbRecAttr represents an attribute value.
In addition, the NDB API defines an NdbError structure, which contains the specification for an error.
It is also possible to receive events triggered when data in the database is changed. This is accomplished through the NdbEventOperation class.
The NDB event notification API is not supported prior to MySQL 5.1. (Bug#19719)
For more information about these classes as well as some additional auxiliary classes not listed here, see Chapter 3, NDB API CLASSES.
The main structure of an application program is as follows:
Connect to a cluster using the Ndb_cluster_connection object.
Initiate a database connection by constructing and initialising one or more Ndb objects.
Identify the tables, columns, and indexes on which you wish to operate, using NdbDictionary and one or more of its subclasses.
Define and execute transactions using the NdbTransaction class.
Delete Ndb objects.
Terminate the connection to the cluster (terminate an instance of Ndb_cluster_connection).
The procedure for using transactions is as follows:
Start a transaction (instantiate an NdbTransaction object).
Add and define operations associated with the transaction using instances of one or more of the NdbOperation, NdbScanOperation, NdbIndexOperation, and NdbIndexScanOperation classes.
Execute the transaction (call NdbTransaction::execute()).
The operation can be of two different types — Commit or NoCommit:
If the operation is of type NoCommit, then the application program requests that the operation portion of a transaction be executed, but without actually committing the transaction. Following the execution of a NoCommit operation, the program can continue to define additional transaction operations for later execution. NoCommit operations can also be rolled back by the application.
If the operation is of type Commit, then the transaction is immediately committed. The transaction must be closed after it has been committed (even if the commit fails), and no further operations can be added to or defined for this transaction.
Synchronous transactions are defined and executed as follows:
Begin (create) the transaction, which is referenced by an
NdbTransaction
object typically created
using Ndb::startTransaction()
. At this
point, the transaction is merely being defined; it is not
yet sent to the NDB kernel.
Define operations and add them to the transaction, using one or more of the following:
NdbTransaction::getNdbOperation()
NdbTransaction::getNdbScanOperation()
NdbTransaction::getNdbIndexOperation()
NdbTransaction::getNdbIndexScanOperation()
along with the appropriate methods of the
respectiveNdbOperation
class (or
possibly one or more of its subclasses). Note that, at
this point, the transaction has still not yet been sent to
the NDB kernel.
Execute the transaction, using the
NdbTransaction::execute()
method.
Close the transaction by calling
Ndb::closeTransaction()
.
For an example of this process, see Section 6.1, “Using Synchronous Transactions”.
To execute several synchronous transactions in parallel, you can either use multiple Ndb objects in several threads, or start multiple application programs.
An NdbTransaction consists of a list of operations, each of which is represented by an instance of NdbOperation, NdbScanOperation, NdbIndexOperation, or NdbIndexScanOperation (that is, of NdbOperation or one of its child classes).
After the operation is created using NdbTransaction::getNdbOperation() or NdbTransaction::getNdbIndexOperation(), it is defined in the following three steps:
Specify the standard operation type using NdbOperation::readTuple().
Specify search conditions using NdbOperation::equal().
Specify attribute actions using NdbOperation::getValue().
Here are two brief examples illustrating this process. For the sake of brevity, we omit error handling.
This first example uses an NdbOperation:
// 1. Retrieve table object
myTable= myDict->getTable("MYTABLENAME");

// 2. Create an NdbOperation on this table
myOperation= myTransaction->getNdbOperation(myTable);

// 3. Define the operation's type and lock mode
myOperation->readTuple(NdbOperation::LM_Read);

// 4. Specify search conditions
myOperation->equal("ATTR1", i);

// 5. Perform attribute retrieval
myRecAttr= myOperation->getValue("ATTR2", NULL);
For additional examples of this sort, see Section 6.1, “Using Synchronous Transactions”.
The second example uses an NdbIndexOperation:
// 1. Retrieve index object
myIndex= myDict->getIndex("MYINDEX", "MYTABLENAME");

// 2. Create an NdbIndexOperation on this index
myOperation= myTransaction->getNdbIndexOperation(myIndex);

// 3. Define the operation's type and lock mode
myOperation->readTuple(NdbOperation::LM_Read);

// 4. Specify search conditions
myOperation->equal("ATTR1", i);

// 5. Perform attribute retrieval
myRecAttr = myOperation->getValue("ATTR2", NULL);
Another example of this second type can be found in Section 6.4, “Using Secondary Indexes in Scans”.
We now discuss in somewhat greater detail each step involved in the creation and use of synchronous transactions.
Define single row operation type. The following operation types are supported:
NdbOperation::insertTuple(): Inserts a non-existing tuple.
NdbOperation::writeTuple(): Updates a tuple if one exists, otherwise inserts a new tuple.
NdbOperation::updateTuple(): Updates an existing tuple.
NdbOperation::deleteTuple(): Deletes an existing tuple.
NdbOperation::readTuple(): Reads an existing tuple using the specified lock mode.
All of these operations operate on the unique tuple key. When NdbIndexOperation is used, each of these operations operates on a defined unique hash index.
If you want to define multiple operations within the same transaction, then you need to call NdbTransaction::getNdbOperation() or NdbTransaction::getNdbIndexOperation() for each operation.
Specify Search Conditions. The search condition is used to select tuples. Search conditions are set using NdbOperation::equal().
Specify Attribute Actions. Next, it is necessary to determine which attributes should be read or updated. It is important to remember that:
Deletes can neither read nor set values, but only delete them.
Reads can only read values.
Updates can only set values.
Normally the attribute is identified by name, but it is also possible to use the attribute's identity to determine the attribute.
NdbOperation::getValue() returns an NdbRecAttr object containing the value as read. To obtain the actual value, one of two methods can be used; the application can either use its own memory (passed through a pointer aValue) to NdbOperation::getValue(), or receive the attribute value in an NdbRecAttr object allocated by the NDB API.
The NdbRecAttr object is released when Ndb::closeTransaction() is called. For this reason, the application cannot reference this object following any subsequent call to Ndb::closeTransaction(). Attempting to read data from an NdbRecAttr object before calling NdbTransaction::execute() yields an undefined result.
Scans are roughly the equivalent of SQL cursors, providing a means to perform high-speed row processing. A scan can be performed on either a table (using an NdbScanOperation) or an ordered index (by means of an NdbIndexScanOperation).
Scan operations have the following characteristics:
They can perform read operations which may be shared, exclusive, or dirty.
They can potentially work with multiple rows.
They can be used to update or delete multiple rows.
They can operate on several nodes in parallel.
After the operation is created using NdbTransaction::getNdbScanOperation() or NdbTransaction::getNdbIndexScanOperation(), it is carried out as follows:
Define the standard operation type, using NdbScanOperation::readTuples().
Specify search conditions, using NdbScanFilter, NdbIndexScanOperation::setBound(), or both.
Specify attribute actions using NdbOperation::getValue().
Execute the transaction using NdbTransaction::execute().
Traverse the result set by means of successive calls to NdbScanOperation::nextResult().
Here are two brief examples illustrating this process. Once again, in order to keep things relatively short and simple, we forego any error handling.
This first example performs a table scan using an NdbScanOperation:
// 1. Retrieve a table object
myTable= myDict->getTable("MYTABLENAME");

// 2. Create a scan operation (NdbScanOperation) on this table
myOperation= myTransaction->getNdbScanOperation(myTable);

// 3. Define the operation's type and lock mode
myOperation->readTuples(NdbOperation::LM_Read);

// 4. Specify search conditions
NdbScanFilter sf(myOperation);
sf.begin(NdbScanFilter::OR);
sf.eq(0, i);    // Return rows with column 0 equal to i or
sf.eq(1, i+1);  // column 1 equal to (i+1)
sf.end();

// 5. Retrieve attributes
myRecAttr= myOperation->getValue("ATTR2", NULL);
The second example uses an NdbIndexScanOperation to perform an index scan:
// 1. Retrieve index object
myIndex= myDict->getIndex("MYORDEREDINDEX", "MYTABLENAME");

// 2. Create an operation (NdbIndexScanOperation object)
myOperation= myTransaction->getNdbIndexScanOperation(myIndex);

// 3. Define type of operation and lock mode
myOperation->readTuples(NdbOperation::LM_Read);

// 4. Specify search conditions
// All rows with ATTR1 between i and (i+1)
myOperation->setBound("ATTR1", NdbIndexScanOperation::BoundGE, i);
myOperation->setBound("ATTR1", NdbIndexScanOperation::BoundLE, i+1);

// 5. Retrieve attributes
myRecAttr = myOperation->getValue("ATTR2", NULL);
Some additional discussion of each step required to perform a scan follows:
Define Scan Operation Type. It is important to remember that only a single operation is supported for each scan operation (NdbScanOperation::readTuples() or NdbIndexScanOperation::readTuples()).
If you want to define multiple scan operations within the same transaction, then you need to call NdbTransaction::getNdbScanOperation() or NdbTransaction::getNdbIndexScanOperation() separately for each operation.
Specify Search Conditions. The search condition is used to select tuples. If no search condition is specified, the scan returns all rows in the table. The search condition can be an NdbScanFilter (which can be used on both NdbScanOperation and NdbIndexScanOperation) or bounds (which can be used only on index scans; see NdbIndexScanOperation::setBound()). An index scan can use both NdbScanFilter and bounds.
When NdbScanFilter is used, each row is examined, whether or not it is actually returned. However, when using bounds, only rows within the bounds will be examined.
Specify Attribute Actions. Next, it is necessary to define which attributes should be read. As with transaction attributes, scan attributes are defined by name, but it is also possible to use the attributes' identities to define them. As discussed elsewhere in this document (see Section 1.3.2.2, “Synchronous Transactions”), the value read is returned by the NdbOperation::getValue() method as an NdbRecAttr object.
Scanning can also be used to update or delete rows. This is performed by
Scanning with exclusive locks using NdbOperation::LM_Exclusive.
(When iterating through the result set:) For each row, optionally calling either NdbScanOperation::updateCurrentTuple() or NdbScanOperation::deleteCurrentTuple().
(If performing NdbScanOperation::updateCurrentTuple():) Setting new values for records simply by using NdbOperation::setValue(). NdbOperation::equal() should not be called in such cases, as the primary key is retrieved from the scan.
The update or delete is not actually performed until the next call to NdbTransaction::execute() is made, just as with single row operations. NdbTransaction::execute() also must be called before any locks are released; for more information, see Section 1.3.2.3.4, “Lock Handling with Scans”.
Features Specific to Index Scans. When performing an index scan, it is possible to scan only a subset of a table using NdbIndexScanOperation::setBound(). In addition, result sets can be sorted in either ascending or descending order, using NdbIndexScanOperation::readTuples(). Note that rows are returned unordered by default unless sorted is set to true. It is also important to note that, when using NdbIndexScanOperation::BoundEQ on a partition key, only fragments containing rows will actually be scanned. Finally, when performing a sorted scan, any value passed as the NdbIndexScanOperation::readTuples() method's parallel argument will be ignored and maximum parallelism will be used instead. In other words, all fragments which it is possible to scan are scanned simultaneously and in parallel in such cases.
Performing scans on either a table or an index has the potential to return a great many records; however, Ndb locks only a predetermined number of rows per fragment at a time. The number of rows locked per fragment is controlled by the batch parameter passed to NdbScanOperation::readTuples().
In order to allow the application to handle how locks are released, NdbScanOperation::nextResult() has a Boolean parameter fetchAllowed. If NdbScanOperation::nextResult() is called with fetchAllowed equal to false, then no locks may be released as a result of the function call. Otherwise, the locks for the current batch may be released.
This next example shows a scan delete that handles locks in an efficient manner. For the sake of brevity, we omit error-handling.
int check;

// Outer loop for each batch of rows
while((check = MyScanOperation->nextResult(true)) == 0)
{
  do
  {
    // Inner loop for each row within the batch
    MyScanOperation->deleteCurrentTuple();
  }
  while((check = MyScanOperation->nextResult(false)) == 0);

  // When there are no more rows in the batch, execute all defined deletes
  MyTransaction->execute(NoCommit);
}
For a more complete example of a scan, see Section 6.3, “Basic Scanning Example”.
Errors can occur either when operations making up a transaction are being defined, or when the transaction is actually being executed. Catching and handling either sort of error requires testing the value returned by NdbTransaction::execute(), and then, if an error is indicated (that is, if this value is equal to -1), using the following two methods in order to identify the error's type and location:
NdbTransaction::getNdbErrorOperation() returns a reference to the operation causing the most recent error.
NdbTransaction::getNdbErrorLine() yields the method number of the erroneous method in the operation, starting with 1.
This short example illustrates how to detect an error and to use these two methods to identify it:
theTransaction = theNdb->startTransaction();
theOperation = theTransaction->getNdbOperation("TEST_TABLE");
if(theOperation == NULL)
  goto error;

theOperation->readTuple(NdbOperation::LM_Read);
theOperation->setValue("ATTR_1", at1);
theOperation->setValue("ATTR_2", at1); // Error occurs here
theOperation->setValue("ATTR_3", at1);
theOperation->setValue("ATTR_4", at1);

if(theTransaction->execute(Commit) == -1)
{
  errorLine = theTransaction->getNdbErrorLine();
  errorOperation = theTransaction->getNdbErrorOperation();
}
Here, errorLine is 3, as the error occurred in the third method called on the NdbOperation object (in this case, theOperation). If the result of NdbTransaction::getNdbErrorLine() is 0, then the error occurred when the operations were executed. In this example, errorOperation is a pointer to the object theOperation. The NdbTransaction::getNdbError() method returns an NdbError object providing information about the error.
Transactions are not automatically closed when an error occurs. You must call Ndb::closeTransaction() or NdbTransaction::close() to close the transaction.
See Section 3.1.1.8, “Ndb::closeTransaction()”, and Section 3.9.2.7, “NdbTransaction::close()”.
One recommended way to handle a transaction failure (that is, when an error is reported) is as shown here:
Roll back the transaction by calling NdbTransaction::execute() with a special ExecType value for the type parameter. See Section 3.9.2.5, “NdbTransaction::execute()”, and Section 3.9.1.3, “The NdbTransaction::ExecType Type”, for more information about how this is done.
Close the transaction by calling NdbTransaction::close().
If the error was temporary, attempt to restart the transaction.
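The rollback, close, and retry sequence above can be sketched generically. The following is an illustrative C++ sketch, not actual NDB API code: TxResult, runWithRetries, and the runTransaction callable are hypothetical stand-ins for the define/execute/close cycle performed with the real API.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result codes standing in for NdbTransaction::execute() outcomes.
enum class TxResult { Success, TemporaryError, PermanentError };

// Retry a transaction up to maxRetries times, retrying only on temporary
// errors, as recommended above. The runTransaction callable stands in for
// one complete define/execute/close cycle.
bool runWithRetries(const std::function<TxResult()>& runTransaction,
                    int maxRetries)
{
    for (int attempt = 0; attempt < maxRetries; ++attempt)
    {
        TxResult result = runTransaction();
        if (result == TxResult::Success)
            return true;                  // committed
        if (result == TxResult::PermanentError)
            return false;                 // retrying cannot help
        // TemporaryError: rollback and close happen inside runTransaction;
        // simply try again (a real application might also back off here).
    }
    return false;                         // retries exhausted
}
```

A real application would typically also insert a short, possibly growing, delay between attempts so that the condition causing the temporary error (such as a timeout under load) has a chance to clear.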
Several errors can occur when a transaction contains
multiple operations which are simultaneously executed. In
this case the application must go through all operations and
query each of their NdbError
objects to
find out what really happened.
Errors can occur even when a commit is reported as
successful. In order to handle such situations, the NDB
API provides an additional
NdbTransaction::commitStatus()
method
to check the transaction's commit status.
Abstract
This section covers the NDB Kernel, and discusses MySQL Cluster transaction handling and transaction coordinators. It also describes NDB record structures and concurrency issues.
The NDB Kernel is the collection of data nodes belonging to a MySQL Cluster. The application programmer can for most purposes view the set of all data nodes as a single entity. Each data node is made up of three main components: the transaction coordinator (TC), the access manager (ACC), and the tuple manager (TUP), all of which are described in the glossary above.
When an application executes a transaction, it connects to one transaction coordinator on one data node. Usually, the programmer does not need to specify which TC should be used, but in some cases where performance is important, the programmer can provide “hints” to use a certain TC. (If the node with the desired transaction coordinator is down, then another TC will automatically take its place.)
Each data node has an ACC and a TUP which store the indexes and data portions of the database table fragment. Even though a single TC is responsible for the transaction, several ACCs and TUPs on other data nodes might be involved in that transaction's execution.
The default method is to select the transaction coordinator (TC) determined to be the "nearest" data node, using a heuristic for proximity based on the type of transporter connection. In order of nearest to most distant, these are:
SCI
SHM
TCP/IP (localhost)
TCP/IP (remote host)
If there are several connections available with the same proximity, one is selected for each transaction in a round-robin fashion. Optionally, you may set the method for TC selection to round-robin mode, where each new set of transactions is placed on the next data node. The pool of connections from which this selection is made consists of all available connections.
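The proximity-plus-round-robin selection just described can be modelled in a few lines. This is an illustrative sketch only; Transporter and TcSelector are invented names, not actual NDB API structures.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Proximity ranks mirror the ordering given above: lower rank = nearer.
enum class Transporter { SCI = 0, SHM = 1, TCP_LOCAL = 2, TCP_REMOTE = 3 };

class TcSelector
{
public:
    explicit TcSelector(std::vector<Transporter> connections)
        : conns(std::move(connections)) {}

    // Return the index of the connection chosen for the next transaction:
    // the nearest transporter type wins; ties rotate round-robin.
    std::size_t next()
    {
        // Find the best (lowest) proximity rank present.
        Transporter best = conns[0];
        for (Transporter t : conns)
            if (t < best) best = t;

        // Collect candidates with that rank and rotate among them.
        std::vector<std::size_t> candidates;
        for (std::size_t i = 0; i < conns.size(); ++i)
            if (conns[i] == best) candidates.push_back(i);

        return candidates[rr++ % candidates.size()];
    }

private:
    std::vector<Transporter> conns;
    std::size_t rr = 0;
};
```

For example, given one remote TCP/IP connection and two shared-memory connections, the two SHM connections are always preferred and alternate between transactions.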
As noted in Section 1.3.3, “Review of MySQL Cluster Concepts”, the application programmer can provide hints to the NDB API as to which transaction coordinator should be used. This is done by providing a table and a partition key (usually the primary key). If the primary key is used as the partition key, then the transaction is placed on the node where the primary replica of that record resides. Note that this is only a hint; the system can be reconfigured at any time, in which case the NDB API chooses a transaction coordinator without using the hint. For more information, see Section 3.4.2.2.16, “Column::getPartitionKey()”, and Section 3.1.1.7, “Ndb::startTransaction()”. The application programmer can specify the partition key from SQL by using this construct:

CREATE TABLE ... ENGINE=NDB PARTITION BY KEY (attribute_list);

For additional information, see Partitioning, and in particular KEY Partitioning, in the MySQL Manual.
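The effect of a partition key can be illustrated with a simplified sketch: hashing the key selects a fragment, and the node holding that fragment's primary replica is the natural home for the transaction. This is only an illustration; the NDB kernel uses its own internal hash function, for which std::hash stands in here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Simplified illustration of how a partition key determines which fragment
// (and therefore which data node's primary replica) holds a row. std::hash
// stands in for NDB's internal hash function.
std::uint32_t fragmentForKey(std::uint32_t partitionKey,
                             std::uint32_t numFragments)
{
    std::size_t h = std::hash<std::uint32_t>{}(partitionKey);
    return static_cast<std::uint32_t>(h % numFragments);
}
```

The important property is determinism: every use of the same partition key maps to the same fragment, which is what makes the transaction-coordinator hint effective.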
The NDB Cluster storage engine used by MySQL Cluster is a relational database engine storing records in tables, as with other relational database systems. Table rows represent records as tuples of relational data. When a new table is created, its attribute schema is specified for the table as a whole, and thus each table row has the same structure. Again, this is typical of relational databases, and NDB is no different in this regard.
Primary Keys. Each record has from 1 up to 32 attributes which belong to the primary key of the table.
Transactions. Transactions are committed first to main memory, and then to disk, after a global checkpoint (GCP) is issued. Since all data are (in most NDB Cluster configurations) synchronously replicated and stored on multiple data nodes, the system can handle processor failures without loss of data. However, in the case of a system-wide failure, all transactions (committed or not) occurring since the most recent GCP are lost.
Concurrency Control. NDB Cluster uses pessimistic concurrency control based on locking. If a requested lock (implicit and depending on database operation) cannot be attained within a specified time, then a timeout error results.
Concurrent transactions as requested by parallel application programs and thread-based applications can sometimes deadlock when they try to access the same information simultaneously. Thus, applications need to be written in a manner such that timeout errors occurring due to such deadlocks are handled gracefully. This generally means that the transaction encountering a timeout should be rolled back and restarted.
Hints and Performance. Placing the transaction coordinator in close proximity to the actual data used in the transaction can in many cases improve performance significantly. This is particularly true for systems using TCP/IP. For example, a Solaris system using a single 500 MHz processor has a cost model for TCP/IP communication which can be represented by the formula
[30 microseconds] + ([100 nanoseconds] * [number of bytes])
This means that if we can ensure that we use “popular” links we increase buffering and thus drastically reduce the costs of communication. The same system using SCI has a different cost model:
[5 microseconds] + ([10 nanoseconds] * [number of bytes])
This means that the efficiency of an SCI system is much less dependent on selection of transaction coordinators. Typically, TCP/IP systems spend 30 to 60% of their working time on communication, whereas for SCI systems this figure is in the range of 5 to 10%. Thus, employing SCI for data transport means that less effort from the NDB API programmer is required and greater scalability can be achieved, even for applications using data from many different parts of the database.
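Expressed in nanoseconds per message, the two cost models quoted above can be compared directly. This small sketch simply encodes the formulas; the function names are mine, not part of any API.

```cpp
#include <cassert>

// Communication cost in nanoseconds for one message of `bytes` bytes,
// per the models quoted above:
//   TCP/IP: 30 us fixed + 100 ns per byte
//   SCI:     5 us fixed +  10 ns per byte
long tcpCostNs(long bytes) { return 30000 + 100 * bytes; }
long sciCostNs(long bytes) { return  5000 +  10 * bytes; }
```

For a 1000-byte message this gives 130 microseconds over TCP/IP versus 15 microseconds over SCI, which is why concentrating traffic onto well-chosen ("popular") links, and thereby amortising the large fixed cost over bigger buffers, matters so much more for TCP/IP systems.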
A simple example would be an application that performs many simple updates, where each transaction updates one record, and this record has a 32-bit primary key which also serves as the partitioning key. In this case, keyData is used as the address of the integer primary key, and keyLen is 4.
Abstract
Discusses the mechanics of transaction handling and transmission in MySQL Cluster and the NDB API, and the objects used to implement these.
When transactions are sent using NdbTransaction::execute(), they are not immediately transferred to the NDB Kernel. Instead, transactions are kept in a special send list (buffer) in the Ndb object to which they belong. The adaptive send algorithm decides when transactions should actually be transferred to the NDB kernel.
The NDB API is designed as a multi-threaded interface, and so it is often desirable to transfer database operations from more than one thread at a time. The NDB API keeps track of which Ndb objects are active in transferring information to the NDB kernel and the expected number of threads to interact with the NDB kernel. Note that a given instance of Ndb should be used in at most one thread; different threads should not share the same Ndb object.
There are four conditions leading to the transfer of database operations from Ndb object buffers to the NDB kernel:
The NDB Transporter (TCP/IP, SCI, or shared memory) decides that a buffer is full and sends it off. The buffer size is implementation-dependent and may change between MySQL Cluster releases. When TCP/IP is the transporter, the buffer size is usually around 64 KB. Since each Ndb object provides a single buffer per data node, the notion of a “full” buffer is local to each data node.
The accumulation of statistical data on transferred information may force sending of buffers to all storage nodes (that is, when all the buffers become full).
Every 10 ms, a special transmission thread checks whether or not any send activity has occurred. If not, then the thread will force transmission to all nodes. This means that 20 ms is the maximum amount of time that database operations are kept waiting before being dispatched. A 10-millisecond limit is likely in future releases of MySQL Cluster; checks more frequent than this require additional support from the operating system.
For methods that are affected by the adaptive send algorithm (such as NdbTransaction::execute()), there is a force parameter that overrides its default behaviour in this regard and forces immediate transmission to all nodes. See the individual NDB API class listings for more information.
The conditions listed above are subject to change in future releases of MySQL Cluster.
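Two of the conditions above, flushing on a full buffer and forcing immediate transmission, can be illustrated with a toy send buffer. This is a deliberately simplified model: the SendBuffer class, its capacity, and its interface are invented for illustration, and the periodic 10 ms transmission thread is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model of the per-node send buffer described above: operations
// accumulate and are flushed (transmitted to the data node) either when the
// buffer would overflow or when the caller forces transmission, as the
// `force` parameter to NdbTransaction::execute() does.
class SendBuffer
{
public:
    explicit SendBuffer(std::size_t capacityBytes)
        : capacity(capacityBytes) {}

    // Queue an operation; flush first if it would overflow the buffer.
    void send(const std::string& op, bool force = false)
    {
        if (pending.size() + op.size() > capacity)
            flush();
        pending += op;
        if (force)
            flush();          // immediate transmission, bypassing batching
    }

    void flush()
    {
        if (!pending.empty())
        {
            ++flushCount;     // a real implementation would transmit here
            pending.clear();
        }
    }

    int flushCount = 0;       // number of transmissions so far

private:
    std::size_t capacity;
    std::string pending;      // accumulated, not-yet-sent operations
};
```

The batching behaviour is the point: many small operations ride in one transmission, trading a bounded amount of latency for far fewer (and larger) messages on the wire.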