Chapter 2. The Rule Engine

2.1. What is a Rule Engine?

2.1.1. Introduction and Background

Artificial Intelligence (A.I.) is a very broad research area that focuses on "Making computers think like people" and includes disciplines such as Neural Networks, Genetic Algorithms, Decision Trees, Frame Systems and Expert Systems. Knowledge representation is the area of A.I. concerned with how knowledge is represented and manipulated. Expert Systems use Knowledge representation to facilitate the codification of knowledge into a knowledge base which can be used for reasoning - i.e. we can process data with this knowledge base to infer conclusions. Expert Systems are also known as Knowledge-based Systems and Knowledge-based Expert Systems and are considered 'applied artificial intelligence'. The process of developing with an Expert System is Knowledge Engineering. EMYCIN was one of the first "shells" for an Expert System, which was created from the MYCIN medical diagnosis Expert System. Where-as early Expert Systems had their logic hard coded, "shells" separated the logic from the system, providing an easy to use environment for user input. Drools is a Rule Engine that uses the Rule Based approached to implement an Expert System and is more correctly classified as a Production Rule System.

The term "Production Rule" originates from formal grammar - where it is described as "an abstract structure that describes a formal language precisely, i.e., a set of rules that mathematically delineates a (usually infinite) set of finite-length strings over a (usually finite) alphabet" (wikipedia).

Business Rule Management Systems build additional value on top of a general purpose Rule Engines by providing, business user focused, systems for rule creation, management, deployment, collaboration, analysis and end user tools. Further adding to this value is the fast evolving and popular methodology "Business Rules Approach", which is a helping to formalize the role of Rule Engines in the enterprise.

The term Rule Engine is quite ambiguous in that it can be any system that uses rules, in any form, that can be applied to data to produce outcomes; which includes simple systems like form validation and dynamic expression engines. The book "How to Build a Business Rules Engine (2004)" by Malcolm Chisholm exemplifies this ambiguity. The book is actually about how to build and alter a database schema to hold validation rules. The book then shows how to generate VB code from those validation rules to validate data entry - while a very valid and useful topic for some, it caused quite a surprise to this author, unaware at the time in the subtleties of Rules Engines differences, who was hoping to find some hidden secrets to help improve the Drools engine. JBoss jBPM uses expressions and delegates in its Decision nodes; which control the transitions in a Workflow. At each node it evaluates has a rule set that dictates the transition to undertake - this is also a Rule Engine. While a Production Rule System is a kind of Rule Engine and also an Expert System, the validation and expression evaluation Rule Engines mention previously are not Expert Systems.

A Production Rule System is turing complete with a focus on knowledge representation to express propositional and first order logic in a concise, non ambiguous and declarative manner. The brain of a Production Rules System is an Inference Engine that is able to scale to a large number of rules and facts. The Inference Engine matches facts and data, against Production Rules, also called Productions or just Rules, to infer conclusions which result in actions. A Production Rule is a two-part structure using First Order Logic for knowledge representation.

when
    <conditions>
then
    <actions>

The process of matching the new or existing facts against Production Rules is called Pattern Matching, which is performed by the Inference Engine. There are a number of algorithms used for Pattern Matching by Inference Engines including:

  • Linear

  • Rete

  • Treat

  • Leaps

Drools implements and extends the Rete algorith, Leaps use to be supported but was removed due to poor maintenance. The Drools Rete implementation is called ReteOO, signifying that Drools has an enhanced and optimized implementation of the Rete algorithm for Object Oriented systems. Other Rete based engines also have marketing terms for their proprietary enhancements to Rete, like RetePlus and Rete III. It is important to understand that names like Rete III are purely marketing where, unlike the original published Rete Algorithm, no details of the implementation are published. This makes questions such as "Does Drools implement Rete III?" nonsensical. The most common enhancements are covered in "Production Matching for Large Learning Systems (Rete/UL)" (1995) by Robert B. Doorenbos.

The Rules are stored in the Production Memory and the facts that the Inference Engine matches against the Working Memory. Facts are asserted into the Working Memory where they may then be modified or retracted. A system with a large number of rules and facts may result in many rules being true for the same fact assertion, these rules are said to be in conflict. The Agenda manages the execution order of these conflicuting rules using a Conflict Resolution strategy.

A Basic Rete network

Figure 2.1. A Basic Rete network


A Production Rule System's Inference Engine is stateful and able to enforce truthfulness - called Truth Maintence. A logical relationship can be declared by actions which means the action's state depends on the inference remaining true; when it is no longer true the logical dependent action is undone. The "Honest Politician" is an example of Truth Maintenance, which always ensures that hope can only exist for a democracy while we have honest politicians.

when
    an honest Politician exists
then
    logically assert Hope

when
   Hope exists
then
   print "Hurrah!!! Democracy Lives" 

when
   Hope does not exist
then
   print "Democracy is Doomed" 

There are two methods of execution for a Production Rule Systems - Forward Chaining and Backward Chaining; systems that implement both are called Hybrid Production Rule Systems. Understanding these two modes of operation are key to understanding why a Production Rule System is different and how to get the best from them. Forward chaing is 'data-driven' and thus reactionary - facts are asserted into the working memory which results in one or more rules being concurrently true and scheduled for execution by the Agenda - we start with a fact, it propagates and we end in a conclusion. Drools is a forward chaining engine.

Forward Chaining

Figure 2.2. Forward Chaining


Backward chaining is 'goal-driven', meaning that we start with a conclusion which the engine tries to satisfy. If it can't it then searches for conclusions that it can, known as 'sub goals', that will help satisfy some unknown part of the current goal - it continues this process until either the initial conclusion is proven or there are no more sub goals. Prolog is an example of a Backward Chaining engine; Drools will be adding support for Backward Chaining in its next major release.

Backward Chaining

Figure 2.3. Backward Chaining