2.4. Transactions

Transactions are a fundamental concept of all database systems. The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation. The intermediate states between the steps are not visible to other concurrent transactions, and if some failure occurs that prevents the transaction from completing, then none of the steps affect the database at all.

For example, suppose that we want to do the following:

Promote Smith to analyst with a new salary of 3550.00.
Update his last jobhist record (presumed to be the one where enddate is still null), such that the enddate is set to the day prior to his becoming an analyst.
Record his promotion to analyst with a new salary in the jobhist table.

The SQL commands for this might look like

BEGIN TRANSACTION;
UPDATE emp SET
    job = 'ANALYST',
    sal = '3550.00'
    WHERE empno = 7369;
UPDATE jobhist SET
    enddate = '19-MAR-85'
    WHERE empno = 7369
      AND enddate IS NULL;
INSERT INTO jobhist (empno, startdate, job, sal, deptno, chgdesc)
    VALUES (7369, '20-MAR-85', 'ANALYST', 3550.00, 20,
    'Promoted to Analyst');
COMMIT;

The details of these commands are not important here; the important point is that there are several separate operations involved in accomplishing this rather simple operation. We need a guarantee that if something goes wrong partway through any operation, none of the steps executed so far will take effect. Grouping the updates into a transaction gives us this guarantee. A transaction is said to be atomic : from the point of view of other transactions, it either happens completely or not at all.

We also want a guarantee that once a transaction is completed and acknowledged by the database system, it has indeed been permanently recorded and won't be lost even if a crash ensues shortly thereafter. For example, if we are changing an employee's designation and salary, we do not want any chance that the changes are partially reflected by these changes only being made to the emp table, but by not being present in the jobhist table. A transactional database guarantees that all the updates made by a transaction are logged in permanent storage (i.e., on disk) before the transaction is reported complete.

Another important property of transactional databases is closely related to the notion of atomic updates: when multiple transactions are running concurrently, each one should not be able to see the incomplete changes made by others. So transactions must be all-or-nothing not only in terms of their permanent effect on the database, but also in terms of their visibility as they happen. The updates made so far by an open transaction are invisible to other transactions until the transaction completes, whereupon all the updates become visible simultaneously.

In EnterpriseDB, a transaction is set up by surrounding the SQL commands of the transaction with BEGIN and COMMIT commands. So our salaries transaction would actually look like

BEGIN TRANSACTION;
UPDATE emp SET
    job = 'ANALYST',
    sal = '3550.00'
    WHERE empno = 7369;
-- etc etc
COMMIT;

If, partway through the transaction, we decide we do not want to commit, we can issue the command ROLLBACK instead of COMMIT, and all our updates so far will be canceled.

EnterpriseDB actually treats every SQL statement as being executed within a transaction. If you do not issue a BEGIN command, then each individual statement has an implicit BEGIN and (if successful) COMMIT wrapped around it. A group of statements surrounded by BEGIN and COMMIT is called a transaction block.

Note: Some client libraries issue BEGIN and COMMIT commands automatically, so that you may get the effect of transaction blocks without asking. Check the documentation for the interface you are using.

It's possible to control the statements in a transaction in a more granular fashion through the use of savepoints. Savepoints allow you to selectively discard parts of the transaction, while committing the rest. After defining a savepoint with SAVEPOINT, you can if needed roll back to the savepoint with ROLLBACK TO. All the transaction's database changes between defining the savepoint and rolling back to it are discarded, but changes earlier than the savepoint are kept.

After rolling back to a savepoint, it continues to be defined, so you can roll back to it several times. Conversely, if you are sure you won't need to roll back to a particular savepoint again, it can be released, so the system can free some resources. Keep in mind that either releasing or rolling back to a savepoint will automatically release all savepoints that were defined after it.

All this is happening within the transaction block, so none of it is visible to other database sessions. When and if you commit the transaction block, the committed actions become visible as a unit to other sessions, while the rolled-back actions never become visible at all.

Remembering Smith's example, and the possible problems we can encounter with that situation, we can resolve those issues by making use of savepoints as shown below:

BEGIN TRANSACTION;
UPDATE emp SET
    job = 'ANALYST',
    sal = '3550.00'
    WHERE empno = 7369;
SAVEPOINT update_emp;
UPDATE jobhist SET
    enddate = '19-MAR-85'
    WHERE empno = 7469
      AND enddate IS NULL;
-- oops ... we updated the wrong employee ... it was supposed to be empno 7369 not 7469
ROLLBACK TO update_emp;
UPDATE jobhist SET
    enddate = '19-MAR-85'
    WHERE empno = 7369
      AND enddate IS NULL;
INSERT INTO jobhist (empno, startdate, job, sal, deptno, chgdesc)
    VALUES (7369, '20-MAR-85', 'ANALYST', 3550.00, 20,
    'Promoted to Analyst');
COMMIT;
END;

This example is oversimplified, but there's a lot of control to be had over a transaction block through the use of savepoints. Moreover, ROLLBACK TO is the only way to regain control of a transaction block that was put in aborted state by the system due to an error, short of rolling it back completely and starting again.