Howto: How to employ recovery in JOnAS

The content of this guide is the following:

  1. Transaction Recovery Overview
  2. JOnAS Transaction Recovery
  3. Implications of a Non-Conforming Resource Manager
  4. JOnAS-JOTM Transaction Recovery Configuration
  5. XA DataSource Configuration
  6. Monitoring Executing Transactions
  7. JOnAS Administrative Recovery
  8. JOnAS Administrative Heuristic Recovery

Transaction Recovery Overview

Applications and application platforms, such as JOnAS, employ transactions to guarantee the atomicity of operations. When an application interacts with a single data source (e.g., a database), explicit transaction management often appears unnecessary, since the success of the unit of work depends only on that single database successfully persisting the work.

However, whenever a single unit of work employs multiple data sources, the management of transactional behavior becomes non-trivial. It is imperative that none of the data sources commits updates unless all of the data sources commit. Solving this problem has resulted in the adoption of the 2-Phase commit protocol. More specifically, J2EE has adopted a variation of the XA 2-Phase commit protocol from the X/Open Group.

Since any database may elect to rollback when asked to commit, it is necessary to perform commitment in multiple phases. First, all databases are asked to "prepare" to commit. Each database verifies that it is capable of persisting the data and either votes to commit or votes to rollback. After a vote to commit, the database delays actual commitment until all of the other databases have similarly voted. For this database, the transaction is in an in-doubt state. If all databases involved with the transaction agree to commit, then a second "phase" is performed where each is requested to commit.
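The two phases described above can be sketched in a few lines of Java. This is a minimal illustration only, not JOTM code: the Participant interface, the FakeDb class, and the run() method are hypothetical names introduced for this example, and a real transaction manager would also log its decision and handle communication failures.

```java
import java.util.Arrays;
import java.util.List;

public class TwoPhaseCommitSketch {

    /** Hypothetical participant; a stand-in for a database, not the JTA XAResource API. */
    public interface Participant {
        boolean prepare();   // true = vote to commit, false = vote to roll back
        void commit();
        void rollback();
    }

    /** Runs both phases; returns true only if every participant was asked to commit. */
    public static boolean run(List<? extends Participant> participants) {
        // Phase 1: ask every participant to prepare and collect the votes.
        for (Participant p : participants) {
            if (!p.prepare()) {
                // A single vote to roll back aborts the transaction for everyone else.
                for (Participant q : participants) {
                    if (q != p) {
                        q.rollback();
                    }
                }
                return false;
            }
        }
        // Phase 2: all votes were "commit", so request the actual commit.
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }

    /** In-memory participant used only to exercise the sketch. */
    public static class FakeDb implements Participant {
        private final boolean vote;
        public String state = "active";
        public FakeDb(boolean vote) { this.vote = vote; }
        public boolean prepare() { state = vote ? "prepared" : "rolledback"; return vote; }
        public void commit() { state = "committed"; }
        public void rollback() { state = "rolledback"; }
    }

    public static void main(String[] args) {
        FakeDb a = new FakeDb(true);
        FakeDb b = new FakeDb(false);   // this one votes to roll back
        boolean committed = run(Arrays.asList(a, b));
        System.out.println(committed + " " + a.state + " " + b.state);
    }
}
```

Between a successful prepare() and the later commit() or rollback(), a participant is in the in-doubt state that the rest of this guide is concerned with.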

The role of the transaction manager is to keep track of all data sources involved with a transaction. If multiple databases are involved, then the transaction manager will perform the 2-Phase commitment protocol.

Many databases have not completely implemented XA 2-Phase commitment. These typically do not provide a prepare() method or, if they do, the method does nothing but return an affirmative status. Many either do not provide a recover() method or simply return a NULL list of Xids no matter what the circumstances.

A fully compliant resource manager will only resolve the in-doubt state when informed of the results of the voting by the transaction manager. Resource managers may implement "heuristic decisions" and elect to either COMMIT or ROLLBACK in-doubt transactions based on local information. However, when they do, the protocol requires that they report the use of the heuristic and its result to the controlling transaction manager during the next attempt to recover. Few resource managers are fully compliant.

If there is a loss of communication between the resource manager and the controlling transaction manager, then the protocol is completed by re-establishing the communication and having the transaction manager query the resource manager with the recover() call. The resource manager is to respond with the entire list of transactions that are in doubt. Each transaction identifier on this list contains information that allows the transaction manager to determine whether it is the manager of the associated transaction branch.

Transactions are either ignored, committed, or rolled back, based on the information in the transaction identifier and the transaction log of the transaction manager. During COMMIT and ROLLBACK calls, the resource manager must report any heuristic decisions taken on behalf of the associated transactions to the transaction manager.
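The ignore/commit/rollback decision can be illustrated with a small Java sketch. Everything in it is an assumption made for illustration: the identifier is reduced to an owner name plus a global transaction id, the transaction log is reduced to the set of transactions whose commit decision was recorded, and a branch with no logged commit decision is rolled back. The source does not say that JOTM applies exactly this policy.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RecoveryDecisionSketch {

    public enum Action { IGNORE, COMMIT, ROLLBACK }

    /**
     * Decides what to do with one in-doubt branch reported by recover().
     *
     * @param branchOwner   transaction manager named in the identifier (hypothetical field)
     * @param globalTxId    global transaction id carried by the identifier
     * @param thisManager   name of the transaction manager performing recovery
     * @param loggedCommits global transaction ids whose commit decision is in the log
     */
    public static Action decide(String branchOwner, String globalTxId,
                                String thisManager, Set<String> loggedCommits) {
        // Branches that belong to another transaction manager are left alone.
        if (!thisManager.equals(branchOwner)) {
            return Action.IGNORE;
        }
        // Assumption: a logged commit decision means commit, anything else means roll back.
        return loggedCommits.contains(globalTxId) ? Action.COMMIT : Action.ROLLBACK;
    }

    public static void main(String[] args) {
        Set<String> log = new HashSet<>(Arrays.asList("tx-1"));
        System.out.println(decide("tm-A", "tx-1", "tm-A", log));
        System.out.println(decide("tm-A", "tx-2", "tm-A", log));
        System.out.println(decide("tm-B", "tx-1", "tm-A", log));
    }
}
```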

If there is a platform or software failure of the resource manager, then it is expected to rebuild the in-doubt state and be prepared to complete the same recovery protocol as described earlier in this document. This will usually require a database manager to reacquire record and page locks to guarantee that resources required for a rollback will be untouched and available for backdates.

JOnAS Transaction Recovery

Transaction recovery has been available since JOnAS 4_4_0. JOnAS uses the Java Open Transaction Manager (JOTM, jotm.objectweb.org) from Objectweb (www.objectweb.org) as its transaction manager. JOTM guarantees that all resources are either committed or rolled back together by using a 2-Phase commit protocol when multiple XA resources are involved. Since JOTM_2_0_9, JOTM has supported the recovery of transactions across resource or system crashes.

At JOnAS startup, the Resource Adapters, as defined by the Java Connector Architecture 1.5, register all the resource managers with JOTM. For any resource in an in-doubt state, JOTM requests that the resource complete (commit or abort) its work and release anything it still holds (e.g., database pages).

For a resource manager to participate in transaction recovery, it must fully support the XAResource interface as described in the X/Open Specification. This includes implementing the recover() method of the XAResource. The xaresource.recover() method returns a list of XIDs (global transaction identifiers) that are in a prepared or heuristically completed state. JOTM interrogates its internal recovery tables to determine whether the XIDs were participating in a transaction when the system failed.
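As an illustration of the recover() call, the following Java sketch asks one XAResource for its in-doubt XIDs using the standard javax.transaction.xa API. The class and method names (InDoubtScanSketch, inDoubt) are made up for this example, and the sketch does not show how JOTM itself drives recovery. The null check reflects the behavior of incomplete implementations mentioned earlier, which may return a NULL list.

```java
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class InDoubtScanSketch {

    /** Asks one resource manager for its in-doubt branches in a single scan. */
    public static List<Xid> inDoubt(XAResource resource) {
        List<Xid> result = new ArrayList<>();
        try {
            // TMSTARTRSCAN | TMENDRSCAN starts and ends the recovery scan in one call.
            Xid[] xids = resource.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
            if (xids != null) {   // tolerate resources that return null instead of an empty array
                for (Xid xid : xids) {
                    result.add(xid);
                }
            }
        } catch (XAException e) {
            // Wrapped only to keep the sketch short; real code should inspect e.errorCode.
            throw new IllegalStateException("recover() failed", e);
        }
        return result;
    }
}
```

An empty result does not by itself prove that nothing is in doubt: a resource that does not implement recover() also returns no XIDs.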

If a resource implements the 2-Phase commit protocol as described in the X/Open Specification but does not implement the recover() method, we refer to it as a 'non-conforming' XAResource, and it will not properly complete transaction recovery. Currently Oracle RDBMS, IBM DB2, and Bull IDS/II are the only resource managers known to us to support the 2-Phase commit protocol and XA Recovery. Recovery testing has been performed only with Oracle RDBMS, but plans are to test with PostgreSQL when support for XA recovery is available.

Implications of a Non-Conforming Resource Manager

If a resource manager does not implement the prepare() method, its prepare vote cannot be trusted when it is used together with conforming resource managers. A vote to commit from prepare() is a promise that the data source will commit successfully when the commit() method is called. Since the vote of a non-conforming resource manager carries no such promise, that resource manager can still return a failure on the commit() call. This situation can compromise the guarantee of atomicity among the data sources (e.g., databases) and leave invalid results: some resources have committed while the non-conforming resource has failed.

If a resource manager does not implement the recover() method, it will return no XIDs when requested to recover. This implies that the data source has already committed its resources, even though the conforming resource managers may return XIDs that must be recovered with an abort (rollback). Again, the guarantee of atomicity among the data sources is compromised.

JOnAS-JOTM Transaction Recovery Configuration

JOTM provides the JOnAS user with a properties file, jotm.properties, located in the <jonas-base>/dist/conf directory. The jotm.properties file contains an entry that tells JOTM whether recovery is enabled or disabled (the default).

The jotm.properties file also contains the configuration of the JOTM recovery log files. These log files contain the transaction history required to resolve in-doubt transactions in case of system failures. JOTM uses the High Performance Objectweb Logger (HOWL, howl.objectweb.org) to manage the in-doubt records. All entries except 'jotm.recovery.Enabled' are HOWL configuration properties.

The provided values for HOWL specified in the jotm.properties files should suffice for most installations. A short description of each jotm.properties entry follows.


Property                         Value                 Description
jotm.recovery.Enabled            true/false (default)  recovery enabled
howl.log.ListConfiguation        true/false (default)  list Howl configuration on trace file
howl.log.BufferSize              4                     value is multiplied by 1024 to establish the actual buffer size used by the Howl
howl.log.MinimumBuffers          16                    minimum number of buffers
howl.log.MaximumBuffers          16                    maximum number of buffers
howl.log.MaximumBlocksPerFile    200                   maximum number of blocks per log file
howl.log.FileDirectory           c:/logs               pathname of recovery log files
howl.log.FileName                howl                  name of log file, e.g., howl1, howl2
howl.log.MaximumFiles            2                     number of recovery log files


The values specified cannot be changed while JOnAS is running. JOnAS must be stopped, the jotm.properties values changed, and JOnAS restarted for any new values to take effect.
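For illustration, a jotm.properties fragment that enables recovery might look like the sketch below. It assumes the usual key=value properties syntax and reuses the values listed above for most entries; compare the property names with the jotm.properties file shipped with your installation before relying on them.

```properties
# Sketch of a jotm.properties fragment; values are the ones listed in this guide.
# Recovery is disabled by default, so it has to be switched on explicitly.
jotm.recovery.Enabled=true

# HOWL log buffers: 4 is multiplied by 1024, giving a buffer size of 4096.
howl.log.BufferSize=4
howl.log.MinimumBuffers=16
howl.log.MaximumBuffers=16

# Recovery log files: 2 files (howl1, howl2) of at most 200 blocks each.
howl.log.MaximumBlocksPerFile=200
howl.log.FileDirectory=c:/logs
howl.log.FileName=howl
howl.log.MaximumFiles=2
```

The directory c:/logs is only the example value used in this guide; it should point to a location that survives a system restart, since the log is what JOTM reads to resolve in-doubt transactions.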

XA DataSource Configuration

In a future release of JOnAS and JOTM, additional properties will be added to the XML files used to define the characteristics of a JOnAS Resource Adapter. The additional properties will allow the Resource Adapter implementer to state the conformance level of the resource adapter, that is, whether the resource adapter fully implements the xaresource.recover() method. Of course, if the resource adapter implementer inadvertently states an incorrect conformance level, the guarantee of atomicity among the data sources may be compromised.

Monitoring Executing Transactions

A Transaction Monitoring facility is available in the JOnAS Administration (jonasadmin) web tool. With this tool, a JOnAS administrator can monitor the life cycle of executing transactions.



Transaction Monitor Frame

JOnAS Administrative Recovery

In most instances, JOnAS with JOTM will be able to recover all transactions in an in-doubt state after a system failure for Resource Managers and XAResources that implement the recover() method. In those few instances where recovery could not be completed, the JOnAS Administration (jonasadmin) Recovery web tool can be employed.

The administrative recovery feature allows the administrator to view any transactions that require administrative intervention. The examples that follow describe the sequence of actions the JOnAS Administrator performs to complete a transaction whose commit is pending.

Recovery Frame 1 displays the recovery view, in which one transaction has not completed. For example, the XAResource (database) replied to the prepare request, but then threw an XAER_xxxx exception during the commit request (e.g., the database went offline). In this frame, the transaction has two resources (shown as the Xid Count) waiting for administrative action.



Recovery Frame 1

By selecting the transaction specified under Transaction, Recovery Frame 2 displays the XAResource view. In this view, the XAResources are shown with their respective Resource Manager, Xid, and Xid State. The resource manager jdbc_xa1 has a STATUS_COMMITTED state that indicates that it has committed its XAResource. The resource manager jdbc_xa2 has a STATUS_COMMITTING state that indicates that the XAResource is in an in-doubt state (i.e., a commit request was sent to the XAResource but the XAResource replied with an XAER_xxxx exception). If the jdbc_xa2 resource manager is a database, its pages will be locked until a commit or rollback is issued on its XAResource.



Recovery Frame 2

Since the jdbc_xa2 resource was not registered during JOnAS startup, it is not selectable. If any of the Heuristic buttons is selected, Recovery Frame 3 displays the Confirm view, which states that no XAResource has been selected.



Recovery Frame 3

By using the JOnAS administrative deploy option to dynamically add a resource adapter, 'jonas admin -a jdbc_xa2.rar', the resource manager jdbc_xa2 is deployed and JOnAS requests that JOTM recover any transactions that may be in the in-doubt state. Recovery Frame 4 shows that no transactions require recovery, since JOTM successfully recovered the transaction.

If a resource manager fails and requires recovery while JOnAS is executing, then dynamic deployment may be used to effect recovery. The administrator would first undeploy the resource adapter with 'jonas admin -d jdbc_xa2.rar'. A subsequent deployment, 'jonas admin -a jdbc_xa2.rar', as in the prior paragraph, would cause JOTM and the resource manager to attempt recovery without needing to restart JOnAS.



Recovery Frame 4

JOnAS Administrative Heuristic Recovery

Heuristic Frame 1 displays the recovery view, in which one transaction has not completed. In this example, the transaction has two resources (shown as the Xid Count) waiting for administrative action.



Heuristic Frame 1

By selecting the transaction specified under Transaction, Heuristic Frame 2 displays the XAResource view. In this view, the XAResources are shown with their respective Resource Manager, Xid, and Xid State. The resource manager jdbc_xa1 has a STATUS_COMMITTED state that indicates that it has committed its XAResource. The resource manager jdbc_xa2 has a STATUS_UNKNOWN state that indicates that the XAResource returned an XA_HEURxxx exception for the commit request. If the jdbc_xa2 resource manager is a database, its pages will be locked until a commit or rollback is issued on its XAResource. Since the jdbc_xa2 resource has not completed, it is the only resource selectable.



Heuristic Frame 2

By selecting jdbc_xa2 and invoking the 'Heuristic Commit' button, Heuristic Frame 3 displays the Confirm view requesting a confirmation of the commit action. The Heuristic Commit button was selected because the jdbc_xa1 resource has already been committed; the jdbc_xa2 resource must therefore also be committed to keep the transaction atomic.
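The reasoning behind that choice can be written down as a small Java sketch. It is purely illustrative and not part of jonasadmin: the BranchState and Choice enums and the choose() method are hypothetical, and the rollback case is an assumption that mirrors the commit case described above.

```java
import java.util.Arrays;
import java.util.List;

public class HeuristicChoiceSketch {

    /** Simplified states of the other branches of the same transaction (hypothetical). */
    public enum BranchState { COMMITTED, ROLLED_BACK, UNKNOWN }

    public enum Choice { HEURISTIC_COMMIT, HEURISTIC_ROLLBACK, NEEDS_INVESTIGATION }

    /** Suggests how to complete an unresolved branch, given the states of its siblings. */
    public static Choice choose(List<BranchState> siblingStates) {
        boolean committed = siblingStates.contains(BranchState.COMMITTED);
        boolean rolledBack = siblingStates.contains(BranchState.ROLLED_BACK);
        if (committed && !rolledBack) {
            // A sibling already committed: committing keeps the transaction atomic.
            return Choice.HEURISTIC_COMMIT;
        }
        if (rolledBack && !committed) {
            // Assumption: the symmetric case is resolved with a rollback.
            return Choice.HEURISTIC_ROLLBACK;
        }
        // Mixed or unknown outcomes cannot be resolved mechanically.
        return Choice.NEEDS_INVESTIGATION;
    }

    public static void main(String[] args) {
        // jdbc_xa1 is STATUS_COMMITTED in the example, so jdbc_xa2 should be committed.
        System.out.println(choose(Arrays.asList(BranchState.COMMITTED)));
    }
}
```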



Heuristic Frame 3

When the 'Confirm' button is invoked, JOnAS requests through JOTM that the XAResource of jdbc_xa2 be completed (committed). Since all resources of the transaction have now completed the 2-Phase commit protocol successfully, the resources (database pages) can be released and JOTM can remove the transaction information from its internal tables.

Heuristic Frame 4 displays the Recovery view, which shows that there are no longer any transactions requiring recovery.



Heuristic Frame 4