Technical reference for query cache release strategies

MMBase uses many caches to improve its performance. Each time a piece of information can be retrieved from a cache, a more costly operation, such as a database read, becomes unnecessary. The exact set of caches varies a bit from release to release, but there are principally two kinds: query result caches and all the others. The query result caches cache (surprise!) the result sets of various types of queries, and they are the subject of this document.

So the better these caches work, the less often expensive database reads have to be done, and the more rapidly MMBase can respond to requests. How 'well' a cache works can be seen as the degree to which it can keep its information while the dataset it buffers changes. When this data changes, the cache has to decide what the change means for each of its entries. So the more intelligently a cache can assess these changes, and the less often data is flushed from the cache unnecessarily, the better it works. For a more detailed explanation of how the MMBase query caches work see Appendix A.

MMBase has a data model whose strength lies in flexibility rather than performance. MMBase also tends to create a lot of queries that are system generated (read: generic) and not performance optimized. When your MMBase website begins to draw serious numbers of visitors, and nodes that are read a lot are also updated a lot, and you want to show these changes right away, database load can become a serious issue and cache performance becomes vital. For this reason, applications like forums or even polls used to be hard to run under heavy load with MMBase. The query caches were not able to make good judgments about which changes in the data should flush which cache entries. The consequence was far too many flushes and far too many queries on the database, leading to poor performance. The 'query cache release strategy project' was created to address this problem.

In this project we actually had two goals. First we wanted to improve the query caches by making them 'smarter', so they could evaluate node and relation changes better. We were going to do that by introducing a set of rules that could analyze the events and determine if a query result set needs to be flushed on account of them.

But we also wanted to provide the means to easily and flexibly add rules to caches, so application developers can optimize their specific data models with custom rules. For example, think of a forum. Usually there are threads and posts. When a post is added to a specific thread, all cached queries on the posts of that specific thread should be flushed from the cache, and those alone. It would be hard to create a generic rule for this kind of optimization. This parent-child relation does not exist between all node types that are related, and it is hard to guess. So it would be a good idea for the developer of the forum to create a custom rule that checks the (parent) thread of all posts that generate a node event, and matches that thread against the constraints of the queries in the cache, to see if the changed forum post applies to them. If not, don't flush.
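The forum rule described above can be sketched in a few lines of Java. This is only an illustration of the decision logic: PostEvent, CachedPostQuery and ForumPostRule are hypothetical stand-ins invented for this sketch, not MMBase classes, and a real strategy would inspect MMBase's own event and query objects instead. It also assumes each query selects a single node type with at most one thread constraint.

```java
/** Hypothetical stand-in for a node event raised by a new or changed post. */
final class PostEvent {
    final String nodeType; // type of the node that changed, e.g. "posts"
    final int threadId;    // parent thread of the changed post

    PostEvent(String nodeType, int threadId) {
        this.nodeType = nodeType;
        this.threadId = threadId;
    }
}

/** Hypothetical stand-in for a cached query and its constraint on the parent thread. */
final class CachedPostQuery {
    final String nodeType;          // node type the query selects
    final Integer threadConstraint; // thread the query is limited to, or null if unconstrained

    CachedPostQuery(String nodeType, Integer threadConstraint) {
        this.nodeType = nodeType;
        this.threadConstraint = threadConstraint;
    }
}

/** The forum rule: flush only the entries whose thread matches the changed post. */
final class ForumPostRule {
    static boolean shouldFlush(PostEvent event, CachedPostQuery query) {
        if (!event.nodeType.equals(query.nodeType)) {
            return false; // event and query concern different node types
        }
        if (query.threadConstraint == null) {
            return true; // no thread constraint: any post may appear in the result
        }
        // Constrained query: only a post in the same thread can change the result.
        return query.threadConstraint.intValue() == event.threadId;
    }
}
```

With these definitions, a new post in thread 7 flushes a cached query constrained to thread 7, but leaves a cached query constrained to thread 8 untouched.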

So we wanted a system that would allow others to easily create custom cache invalidation rules (from now on called strategies). We also wanted to be able to dynamically load and unload strategies, and to see some statistics for each strategy on the system. The MMBase admin > tools > cache JSP page is the designated place to access the statistics and functionality of the release strategies.

To achieve this, a small framework was created and inserted at the base of all query result caches: org.mmbase.cache.QueryResultCache. All strategies are subclasses of the abstract class ReleaseStrategy. This class provides some performance tracking services, so statistics can be shown for every strategy. It also contains a growing collection of utility methods that help you investigate query objects. For more detail, check out the API docs.
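The idea of an abstract base class that tracks performance for its subclasses can be sketched as follows. The class and method names here (TrackedStrategy, doEvaluate, SameTypeStrategy and the getters) are assumptions made for this sketch and should not be read as the signatures of ReleaseStrategy; the API docs remain the authority on those. Events and queries are reduced to plain node type names to keep the example self-contained.

```java
/** Hypothetical base class: subclasses supply the rule, the base class keeps the statistics. */
abstract class TrackedStrategy {
    private long evaluated;  // number of evaluations so far
    private long preserved;  // evaluations that decided to keep the cache entry
    private long totalNanos; // total time spent inside doEvaluate

    /** Template method: runs the rule and records timing and outcome. */
    final synchronized boolean evaluate(String eventType, String queryType) {
        long start = System.nanoTime();
        boolean flush = doEvaluate(eventType, queryType);
        totalNanos += System.nanoTime() - start;
        evaluated++;
        if (!flush) {
            preserved++;
        }
        return flush;
    }

    /** The actual rule: return true when the cached result must be flushed. */
    protected abstract boolean doEvaluate(String eventType, String queryType);

    synchronized long getEvaluated() { return evaluated; }

    synchronized long getPreserved() { return preserved; }

    synchronized long getAverageNanos() {
        return evaluated == 0 ? 0 : totalNanos / evaluated;
    }
}

/** Example rule: an event only affects queries on the same node type. */
final class SameTypeStrategy extends TrackedStrategy {
    @Override
    protected boolean doEvaluate(String eventType, String queryType) {
        return eventType.equals(queryType);
    }
}
```

Because the timing and counting live in the final evaluate method, a strategy author only writes the rule itself, and an admin page can report the same statistics for every strategy.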

One of the first implementations was ChainedReleaseStrategy. This class is basically a wrapper for a collection of strategies, with functionality to dynamically add and remove strategies. It is the default strategy for every query cache, and is always loaded with the default strategies. Which strategies are the defaults will change over time, and the default strategies themselves will also get better as more rules are added. Currently there are two global strategies that are always loaded.
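Independently of which strategies are loaded, the wrapper idea itself can be sketched as a chain that delegates to its members. FlushRule and RuleChain are hypothetical names for this sketch, not the MMBase classes. The sketch also assumes a particular combination rule that this document does not spell out: an entry is preserved as soon as one member decides it is unaffected, and an empty chain flushes, which is the safe default.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical rule interface: true means the cached result must be flushed. */
interface FlushRule {
    boolean shouldFlush(String eventType, String queryType);
}

/** A rule that wraps a collection of rules which can be changed at runtime. */
final class RuleChain implements FlushRule {
    // Copy-on-write list: rules can be added or removed while evaluations are running.
    private final List<FlushRule> rules = new CopyOnWriteArrayList<>();

    void add(FlushRule rule) {
        rules.add(rule);
    }

    boolean remove(FlushRule rule) {
        return rules.remove(rule);
    }

    @Override
    public boolean shouldFlush(String eventType, String queryType) {
        for (FlushRule rule : rules) {
            if (!rule.shouldFlush(eventType, queryType)) {
                return false; // one rule proved the entry unaffected: keep it
            }
        }
        return true; // no rule could rule out an effect (or the chain is empty): flush
    }
}
```

Since the chain is itself a rule, a cache only needs to hold a single rule object, while the set of rules behind it can grow or shrink without the cache noticing.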

To add your own strategy class you have to do two things: create the class and deploy it.

