The most important point about Hibernate Entity Manager and concurrency control is that it is very easy to understand. Hibernate Entity Manager directly uses JDBC connections and JTA resources without adding any additional locking behavior. We highly recommend you spend some time with the JDBC, ANSI, and transaction isolation specification of your database management system. Hibernate Entity Manager only adds automatic versioning but does not lock objects in memory or change the isolation level of your database transactions. Basically, use Hibernate Entity Manager like you would use direct JDBC (or JTA/CMT) with your database resources.
We start the discussion of concurrency control in Hibernate with the granularity of EntityManagerFactory, and EntityManager, as well as database transactions and long units of work..
In this chapter, and unless explicitly expressed, we will mix and match the concept of entity manager and persistence context. One is an API and programming object, the other a definition of scope. However, keep in mind the essential difference. A persistence context is usually bound to a JTA transaction in Java EE, and a persistence context starts and ends at transaction boundaries (transaction-scoped) unless you use an extended entity manager. Please refer to Section 1.2.3, “Persistence context scope” for more information.
A EntityManagerFactory is an expensive-to-create, threadsafe object intended to be shared by all application threads. It is created once, usually on application startup.
An EntityManager is an inexpensive, non-threadsafe object that should be used once, for a single business process, a single unit of work, and then discarded. An EntityManager will not obtain a JDBC Connection (or a Datasource) unless it is needed, so you may safely open and close an EntityManager even if you are not sure that data access will be needed to serve a particular request. (This becomes important as soon as you are implementing some of the following patterns using request interception.)
To complete this picture you also have to think about database transactions. A database transaction has to be as short as possible, to reduce lock contention in the database. Long database transactions will prevent your application from scaling to highly concurrent load.
What is the scope of a unit of work? Can a single Hibernate EntityManager span several database transactions or is this a one-to-one relationship of scopes? When should you open and close a Session and how do you demarcate the database transaction boundaries?
First, don't use the entitymanager-per-operation antipattern, that is, don't open and close an EntityManager for every simple database call in a single thread! Of course, the same is true for database transactions. Database calls in an application are made using a planned sequence, they are grouped into atomic units of work. (Note that this also means that auto-commit after every single SQL statement is useless in an application, this mode is intended for ad-hoc SQL console work.)
The most common pattern in a multi-user client/server application is entitymanager-per-request. In this model, a request from the client is send to the server (where the EJB3 persistence layer runs), a new EntityManager is opened, and all database operations are executed in this unit of work. Once the work has been completed (and the response for the client has been prepared), the persistence context is flushed and closed, as well as the entity manager object. You would also use a single database transaction to serve the clients request. The relationship between the two is one-to-one and this model is a perfect fit for many applications.
This is the default EJB3 persistence model in a Java EE environment (JTA bounded, transaction-scoped persistence context); injected (or looked up) entity managers share the same persistence context for a particular JTA transaction. The beauty of EJB3 is that you don't have to care about that anymore and just see data access through entity manager and demaraction of transaction scope on session beans as completely orthogonal.
The challenge is the implementation of this (and other) behavior outside an EJB3 container: not only has the EntityManager and resource-local transaction to be started and ended correctly, but they also have to be accessible for data access operations. The demarcation of a unit of work is ideally implemented using an interceptor that runs when a request hits the non-EJB3 container server and before the response will be send (i.e. a ServletFilter if you are using a standalone servlet container). We recommend to bind the EntityManager to the thread that serves the request, using a ThreadLocal variable. This allows easy access (like accessing a static variable) in all code that runs in this thread. Depending on the database transaction demarcation mechanism you chose, you might also keep the transaction context in a ThreadLocal variable. The implementation patterns for this are known as ThreadLocal Session and Open Session in View in the Hibernate community. You can easily extend the HibernateUtil shown in the Hibernate reference documentation to implement this pattern, you don't need any external software (it's in fact very trivial). Of course, you'd have to find a way to implement an interceptor and set it up in your environment. See the Hibernate website for tips and examples. Once again, remember that your first choice is naturally an EJB3 container - preferably a light and modular one such as JBoss application server.
The entitymanager-per-request pattern is not the only useful concept you can use to design units of work. Many business processes require a whole series of interactions with the user interleaved with database accesses. In web and enterprise applications it is not acceptable for a database transaction to span a user interaction with possibly long waiting time between requests. Consider the following example:
The first screen of a dialog opens, the data seen by the user has been loaded in a particular EntityManager and resource-local transaction. The user is free to modify the detached objects.
The user clicks "Save" after 5 minutes and expects his modifications to be made persistent; he also expects that he was the only person editing this information and that no conflicting modification can occur.
We call this unit of work, from the point of view of the user, a long running application transaction. There are many ways how you can implement this in your application.
A first naive implementation might keep the EntityManager and database transaction open during user think time, with locks held in the database to prevent concurrent modification, and to guarantee isolation and atomicity. This is of course an anti-pattern, a pessimistic approach, since lock contention would not allow the application to scale with the number of concurrent users.
Clearly, we have to use several database transactions to implement the application transaction. In this case, maintaining isolation of business processes becomes the partial responsibility of the application tier. A single application transaction usually spans several database transactions. It will be atomic if only one of these database transactions (the last one) stores the updated data, all others simply read data (e.g. in a wizard-style dialog spanning several request/response cycles). This is easier to implement than it might sound, especially if you use EJB3 entity manager and persistence context features:
Automatic Versioning - An entity manager can do automatic optimistic concurrency control for you, it can automatically detect if a concurrent modification occured during user think time (usually by comparing version numbers or timestamps when updating the data in the final resource-local transaction).
Detached Entities - If you decide to use the already discussed entity-per-request pattern, all loaded instances will be in detached state during user think time. The entity manager allows you to merge the detached (modified) state and persist the modifications, the pattern is called entitymanager-per-request-with-detached-entities. Automatic versioning is used to isolate concurrent modifications.
Extended Entity Manager - The Hibernate Entity Manager may be disconnected from the underlying JDBC connection between two client calls and reconnected when a new client request occurs. This pattern is known as entitymanager-per-application-transaction and makes even merging unnecessary. An extend persistence context is responsible to collect and retain any modification (persist, merge, remove) made outside a transaction. The next client call made inside an active transaction (typically the last operation of a user conversation) will execute all queued modifications. Automatic versioning is used to isolate concurrent modifications.
Both entitymanager-per-request-with-detached-objects and entitymanager-per-application-transaction have advantages and disadvantages, we discuss them later in this chapter in the context of optimistic concurrency control.
TODO: This note should probably come later.
An application may concurrently access the same persistent state in two different persistence contexts. However, an instance of a managed class is never shared between two persistence contexts. Hence there are two different notions of identity:
foo.getId().equals( bar.getId() )
foo==bar
Then for objects attached to a particular persistence context (i.e. in the scope of an EntityManager) the two notions are equivalent, and JVM identity for database identity is guaranteed by the Hibernate Entity Manager. However, while the application might concurrently access the "same" (persistent identity) business object in two different persistence contexts, the two instances will actually be "different" (JVM identity). Conflicts are resolved using (automatic versioning) at flush/commit time, using an optimistic approach.
This approach leaves Hibernate and the database to worry about concurrency; it also provides the best scalability, since guaranteeing identity in single-threaded units of work only doesn't need expensive locking or other means of synchronization. The application never needs to synchronize on any business object, as long as it sticks to a single thread per EntityManager. Within a persistence context, the application may safely use == to compare entities.
However, an application that uses == outside of a persistence context might see unexpected results. This might occur even in some unexpected places, for example, if you put two detached instances into the same Set. Both might have the same database identity (i.e. they represent the same row), but JVM identity is by definition not guaranteed for instances in detached state. The developer has to override the equals() and hashCode() methods in persistent classes and implement his own notion of object equality. There is one caveat: Never use the database identifier to implement equality, use a business key, a combination of unique, usually immutable, attributes. The database identifier will change if a transient entity is made persistent (see the contract of the persist() operation). If the transient instance (usually together with detached instances) is held in a Set, changing the hashcode breaks the contract of the Set. Attributes for good business keys don't have to be as stable as database primary keys, you only have to guarantee stability as long as the objects are in the same Set. See the Hibernate website for a more thorough discussion of this issue. Also note that this is not a Hibernate issue, but simply how Java object identity and equality has to be implemented.
Never use the anti-patterns entitymanager-per-user-session or entitymanager-per-application (of course, there are rare exceptions to this rule, e.g. entitymanager-per-application might be acceptable in a desktop application, with manual flushing of the persistence context). Note that some of the following issues might also appear with the recommended patterns, make sure you understand the implications before making a design decision:
An entity manager is not thread-safe. Things which are supposed to work concurrently, like HTTP requests, session beans, or Swing workers, will cause race conditions if an EntityManager instance would be shared. If you keep your Hibernate EntityManager in your HttpSession (discussed later), you should consider synchronizing access to your Http session. Otherwise, a user that clicks reload fast enough may use the same EntityManager in two concurrently running threads. You will very likely have provisions for this case already in place, for other non-threadsafe but session-scoped objects.
An exception thrown by the Entity Manager means you have to rollback your database transaction and close the EntityManager immediately (discussed later in more detail). If your EntityManager is bound to the application, you have to stop the application. Rolling back the database transaction doesn't put your business objects back into the state they were at the start of the transaction. This means the database state and the business objects do get out of sync. Usually this is not a problem, because exceptions are not recoverable and you have to start over your unit of work after rollback anyway.
The persistence context caches every object that is in managed state (watched and checked for dirty state by Hibernate). This means it grows endlessly until you get an OutOfMemoryException, if you keep it open for a long time or simply load too much data. One solution for this is some kind batch processing with regular flushing of the persistence context, but you should consider using a database stored procedure if you need mass data operations. Some solutions for this problem are shown in Chapter 6, Batch processing. Keeping a persistence context open for the duration of a user session also means a high probability of stale data, which you have to know about and control appropriately.
Datatabase (or system) transaction boundaries are always necessary. No communication with the database can occur outside of a database transaction (this seems to confuse many developers who are used to the auto-commit mode). Always use clear transaction boundaries, even for read-only operations. Depending on your isolation level and database capabilities this might not be required but there is no downside if you always demarcate transactions explicitly. You'll have to do operations outside a transaction, though, when you'll need to retain modifications in an EXTENDED persistence context.
An EJB3 application can run in non-managed (i.e. standalone, simple Web- or Swing applications) and managed J2EE environments. In a non-managed environment, an EntityManagerFactory is usually responsible for its own database connection pool. The application developer has to manually set transaction boundaries, in other words, begin, commit, or rollback database transactions itself. A managed environment usually provides container-managed transactions, with the transaction assembly defined declaratively through annotations of EJB session beans, for example. Programmatic transaction demarcation is then no longer necessary, even flushing the EntityManager is done automatically.
Usually, ending a unit of work involves four distinct phases:
commit the (resource-local or JTA) transaction (this automatically flushes the entity manager and persistence context)
close the entity manager (if using an application-managed entity manager)
handle exceptions
We'll now have a closer look at transaction demarcation and exception handling in both managed- and non-managed environments.
If an EJB3 persistence layer runs in a non-managed environment, database connections are usually handled by Hibernate's pooling mechanism behind the scenes. The common entity manager and transaction handling idiom looks like this:
// Non-managed environment idiom EntityManager em = emf.createEntityManager(); EntityTransaction tx = null; try { tx = em.getTransaction(); tx.begin(); // do some work ... tx.commit(); } catch (RuntimeException e) { if ( tx != null && tx.isActive() ) tx.rollback(); throw e; // or display error message } finally { em.close(); }
You don't have to flush() the EntityManager explicitly - the call to commit() automatically triggers the synchronization.
A call to close() marks the end of an EntityManager. The main implication of close() is the release of resources - make sure you always close and never outside of guaranteed finally block.
You will very likely never see this idiom in business code in a normal application; fatal (system) exceptions should always be caught at the "top". In other words, the code that executes entity manager calls (in the persistence layer) and the code that handles RuntimeException (and usually can only clean up and exit) are in different layers. This can be a challenge to design yourself and you should use J2EE/EJB container services whenever they are available. Exception handling is discussed later in this chapter.
In a JTA environment, you don't need any extra API to interact with the transaction in your environment. Simply use transaction declaration or the JTA APIs.
If you are using a RESOURCE_LOCAL entity manager, you need to demarcate your transaction boundaries through the EntityTransaction API. You can get an EntityTransaction through entityManager.getTransaction(). This EntityTransaction API provides the regular begin(), commit(), rollback() and isActive() methods. It also provide a way to mark a transaction as rollback only, ie force the transaction to rollback. This is very similar to the JTA operation setRollbackOnly(). When a commit() operation fail and/or if the transaction is marked as setRollbackOnly(), the commit() method will try to rollback the transaction and raise a javax.transaction.RollbackException.
In a JTA entity manager, entityManager.getTransaction() calls are not permitted.
If your persistence layer runs in an application server (e.g. behind EJB3 session beans), every datasource connection obtained internally by the entity manager will automatically be part of the global JTA transaction. Hibernate offers two strategies for this integration.
If you use bean-managed transactions (BMT), the code will look like this:
// BMT idiom @Resource public UserTransaction utx; @Resource public EntityManagerFactory factory; public void doBusiness() { EntityManager em = factory.createEntityManager(); try { // do some work ... utx.commit(); } catch (RuntimeException e) { if (utx != null) utx.rollback(); throw e; // or display error message } finally { em.close(); }
With Container Managed Transactions (CMT) in an EJB3 container, transaction demarcation is done in session bean annotations or deployment descriptors, not programatically. The EntityManager will automatically be flushed on transaction completion (and if you have injected or lookup the EntityManager, it will be also closed automatically). If an exception occurs during the EntityManager use, transaction rollback occurs automatically if you don't catch the exception. Since EntityManager exceptions are RuntimeExceptions they will rollback the transaction as per the EJB specification (system exception vs. application exception).
It is important to let Hibernate EntityManager define the hibernate.transaction.factory_class (ie not overriding this value). Remember to also set org.hibernate.transaction.manager_lookup_class.
If you work in a CMT environment, you might also want to use the same entity manager in different parts of your code. Typically, in a non-managed environment you would use a ThreadLocal variable to hold the entity manager, but a single EJB request might execute in different threads (e.g. session bean calling another session bean). The EJB3 container takes care of the persistence context propagation for you. Either using injection or lookup, the EJB3 container will return an entity manager with the same persistence context bound to the JTA context if any, or create a new one and bind it (see Section 1.2.4, “Persistence context propagation” .)
Our entity manager/transaction management idiom for CMT and EJB3 container-use is reduced to this:
//CMT idiom through injection @PersistenceContext(name="sample") EntityManager em;
In other words, all you have to do in a managed environment is to inject the EntityManager, do your data access work, and leave the rest to the container. Transaction boundaries are set declaratively in the annotations or deployment descriptors of your session beans. The lifecycle of the entity manager and persistence context is completely managed by the container.
TODO: The following paragraph is very confusing, especially the beginning...
When using particular Hibernate native APIs, one caveat has to be remembered: after_statement connection release mode. Due to a silly limitation of the JTA spec, it is not possible for Hibernate to automatically clean up any unclosed ScrollableResults or Iterator instances returned by scroll() or iterate(). You must release the underlying database cursor by calling ScrollableResults.close() or Hibernate.close(Iterator) explicity from a finally block. (Of course, most applications can easily avoid using scroll() or iterate() at all from the CMT code.)
If the EntityManager throws an exception (including any SQLException), you should immediately rollback the database transaction, call EntityManager.close() (if createEntityManager() has been called) and discard the EntityManager instance. Certain methods of EntityManager will not leave the persistence context in a consistent state. No exception thrown by an entity manager can be treated as recoverable. Ensure that the EntityManager will be closed by calling close() in a finally block. Note that a container managed entity manager will do that for you. You just have to let the RuntimeException propagate up to the container.
The Hibernate entity manager generally raises exceptions which encapsulate the Hibernate core exception. Common exceptions raised by the EntityManager API are
IllegalArgumentException: something wrong happen
EntityNotFoundException: an entity was expected but none match the requirement
TransactionRequiredException: this operation has to be in a transaction
IllegalStateException: the entity manager is used in a wrong way
The HibernateException, which wraps most of the errors that can occur in a Hibernate persistence layer, is an unchecked exception. Note that Hibernate might also throw other unchecked exceptions which are not a HibernateException. These are, again, not recoverable and appropriate action should be taken.
Hibernate wraps SQLExceptions thrown while interacting with the database in a JDBCException. In fact, Hibernate will attempt to convert the eexception into a more meningful subclass of JDBCException. The underlying SQLException is always available via JDBCException.getCause(). Hibernate converts the SQLException into an appropriate JDBCException subclass using the SQLExceptionConverter attached to the SessionFactory. By default, the SQLExceptionConverter is defined by the configured dialect; however, it is also possible to plug in a custom implementation (see the javadocs for the SQLExceptionConverterFactory class for details). The standard JDBCException subtypes are:
JDBCConnectionException - indicates an error with the underlying JDBC communication.
SQLGrammarException - indicates a grammar or syntax problem with the issued SQL.
ConstraintViolationException - indicates some form of integrity constraint violation.
LockAcquisitionException - indicates an error acquiring a lock level necessary to perform the requested operation.
GenericJDBCException - a generic exception which did not fall into any of the other categories.
All application managed entity manager and container managed persistence contexts defined as such are EXTENDED. This means that the persistence context type goes beyond the transaction life cycle. We should then understand what happens to operations made outside the scope of a transaction.
In an EXTENDED persistence context, all read only operations of the entity manager can be executed outside a transaction (find(), getReference(), refresh(), and read queries). Some modifications operations can be executed outside a transaction, but they are queued until the persistence context join a transaction: this is the case of persist(), merge(), remove(). Some operations cannot be called outside a transaction: flush(), lock(), and update/delete queries.
When using an EXTENDED persistence context with a container managed entity manager, the lifecycle of the persistence context is binded to the lifecycle of the Stateful Session Bean. Plus if the entity manager is created outside a transaction, modifications operations (persist, merge, remove) are queued in the persistence context and not executed to the database.
When a method of the stateful session bean involved or starting a transaction is later called, the entity manager join the transaction. All queued operation will then be executed to synchronize the persistence context.
This is perfect to implement the entitymanager-per-conversation pattern. A stateful session bean represents the conversation implementation. All intermediate conversation work will be processed in methods not involving transaction. The end of the conversation will be processed inside a JTA transaction. Hence all queued operations will be executed to the database and commited. If you are interested in the notion of conversation inside your application, have a look at JBoss Seam. Jboss Seam emphasizes the concept of conversation and entity manager lifecycle and bind EJB3 and JSF together.
Application-managed entity manager are always EXTENDED. When you create an entity manager inside a transaction, the entity manager automatically join the current transaction. If the entity manager is created outside a transaction, the entity manager will queue the modification operations. When
entityManager.joinTransaction() is called when a JTA transaction is active for a JTA entity manager
entityManager.getTransaction().begin() is called for a RESOURCE_LOCAL entity manager
the entity manager join the transaction and all the queued operations will then be executed to synchronize the persistence context.
It is not legal to call entityManager.joinTransaction() if no JTA transaction is involved.
The only approach that is consistent with high concurrency and high scalability is optimistic concurrency control with versioning. Version checking uses version numbers, or timestamps, to detect conflicting updates (and to prevent lost updates). Hibernate provides for three possible approaches to writing application code that uses optimistic concurrency. The use cases we show are in the context of long application transactions but version checking also has the benefit of preventing lost updates in single database transactions.
In an implementation without much help from the persistence mechanism, each interaction with the database occurs in a new EntityManager and the developer is responsible for reloading all persistent instances from the database before manipulating them. This approach forces the application to carry out its own version checking to ensure application transaction isolation. This approach is the least efficient in terms of database access. It is the approach most similar to EJB2 entities:
// foo is an instance loaded by a previous entity manager em = factory.createEntityManager(); EntityTransaction t = em.getTransaction(); t.begin(); int oldVersion = foo.getVersion(); Foo dbFoo = em.find( foo.getClass(), foo.getKey() ); // load the current state if ( dbFoo.getVersion()!=foo.getVersion ) throw new StaleObjectStateException(); dbFoo.setProperty("bar"); t.commit(); em.close();
The version property is mapped using @Version, and the entity manager will automatically increment it during flush if the entity is dirty.
Of course, if you are operating in a low-data-concurrency environment and don't require version checking, you may use this approach and just skip the version check. In that case, last commit wins will be the default strategy for your long application transactions. Keep in mind that this might confuse the users of the application, as they might experience lost updates without error messages or a chance to merge conflicting changes.
Clearly, manual version checking is only feasible in very trivial circumstances and not practical for most applications. Often not only single instances, but complete graphs of modified ojects have to be checked. Hibernate offers automatic version checking with either detached instances or an extended entity manager and persistence context as the design paradigm.
A single persistence context is used for the whole application transaction. The entity manager checks instance versions at flush time, throwing an exception if concurrent modification is detected. It's up to the developer to catch and handle this exception (common options are the opportunity for the user to merge his changes or to restart the business process with non-stale data).
In an EXTENDED persistence context, all operations made outside an active transaction are queued. The EXTENDED persistence context is flushed when executed in an active transaction (at worse at commit time).
The Entity Manager is disconnected from any underlying JDBC connection when waiting for user interaction. In an application-managed extended entity manager, this occurs automatically at transaction completion. In a stateful session bean holding a container-managed extended entity manager (i.e. a SFSB annotated with @PersistenceContext(EXTENDED)), this occurs transparently as well. This approach is the most efficient in terms of database access. The application need not concern itself with version checking or with merging detached instances, nor does it have to reload instances in every database transaction. For those who might be concerned by the number of connections opened and closed, remember that the connection provider should be a connection pool, so there is no performance impact. The following examples show the idiom in a non-managed environment:
// foo is an instance loaded earlier by the extended entity manager em.getTransaction.begin(); // new connection to data store is obtained and tx started foo.setProperty("bar"); em.getTransaction().commit(); // End tx, flush and check version, disconnect
The foo object still knows which persistence context it was loaded in. With getTransaction.begin(); the entity manager obtains a new connection and resumes the persistence context. The method getTransaction().commit() will not only flush and check versions, but also disconnects the entity manager from the JDBC connection and return the connection to the pool.
This pattern is problematic if the persistence context is too big to be stored during user think time, and if you don't know where to store it. E.g. the HttpSession should be kept as small as possible. As the persistence context is also the (mandatory) first-level cache and contains all loaded objects, we can probably use this strategy only for a few request/response cycles. This is indeed recommended, as the persistence context will soon also have stale data.
It is up to you where you store the extended entity manager during requests, inside an EJB3 container you simply use a stateful session bean as described above. Don't transfer it to the web layer (or even serialize it to a separate tier) to store it in the HttpSession. In a non-managed, two-tiered environment the HttpSession might indeed be the right place to store it.
With this paradigm, each interaction with the data store occurs in a new persistence context. However, the same persistent instances are reused for each interaction with the database. The application manipulates the state of detached instances originally loaded in another persistence context and then merges the changes using EntityManager.merge():
// foo is an instance loaded by a non-extended entity manager foo.setProperty("bar"); entityManager = factory.createEntityManager(); entityManager.getTransaction().begin(); managedFoo = session.merge(foo); // discard foo and from now on use managedFoo entityManager.getTransaction().commit(); entityManager.close();
Again, the entity manager will check instance versions during flush, throwing an exception if conflicting updates occured.