The Open For Business Project

The Open For Business Project: Foundation Technology

Written By: David E. Jones, [email protected]
Last Updated: August 20, 2001


Introduction

This document describes the two lower layers of the Architecture Model found on the Architecture Overview page.  The foundation technology consists of the Common Tool Components and the Basic Technology Components found in that diagram.  The Common Application Components and Vertical Applications are built on top of this layer and use these components to simplify implementation and provide a consistent way of doing things.

The Vertical Applications use the data and logic defined in the Common Application Components layer, building on and around them to apply them to different business processes.  The underlying data is shared so that no synchronization or integration is necessary between vertical applications.

These foundation components are often available as separate products, or may be included in a larger package. Sometimes they are integrated tightly with a vertical application or other foundation components and hidden from the end user and/or implementer; this is something we want to avoid.  We want to develop these components so that they can be used in any application and remain somewhat independent of one another, although certain dependencies are necessary and a good idea.  For instance, the Workflow engine will use the Rules engine as well as other logic mechanisms, and the Content Management component will use the Workflow engine.  The Data Analysis tool will also use the Rules engine, but in a different way: it will analyze data for trends and create rules based on those trends for marketing, pricing, or other rules-driven, user-facing systems.

So, the larger system should be made of these components, as well as components that do more specialized tasks. Where possible, functionality in the system should be implemented as a set of rules, reports, workflows, and interfaces to the content management and data analysis components.


Common Tool Components


Services Engine

The Services Engine is an important tool that makes it easy to create and use sharable, reusable, and distributable application components. All services in a given service context can be accessed through a single API. A service can be implemented as a workflow, a ruleset, a Java method, or a BeanShell script. Services can also be remote services that are implemented in any way but made available through SOAP. The application that calls the service does not need to know where the service is or how it is implemented - it simply relies on the service doing its job.

Another nice part about the Services Engine is that in addition to providing services for internal use in applications, it also makes those services available to other systems through SOAP. No additional configuration is needed to use this feature beyond that provided in the service definition XML file.
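
As a rough illustration of the single-API idea, here is a minimal sketch in Java; the ServiceDispatcher and Service names are hypothetical placeholders, not the actual Services Engine classes. The application asks a dispatcher to run a named service and gets back a result map, regardless of how the service is implemented or where it lives.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical service abstraction: every implementation style (Java method,
    // BeanShell script, workflow, remote SOAP call) hides behind this interface.
    interface Service {
        Map<String, Object> run(Map<String, Object> context) throws Exception;
    }

    // Hypothetical dispatcher: the single API the application talks to.
    class ServiceDispatcher {
        private final Map<String, Service> services = new HashMap<>();

        void register(String name, Service impl) { services.put(name, impl); }

        Map<String, Object> runSync(String name, Map<String, Object> context) throws Exception {
            Service s = services.get(name);
            if (s == null) throw new IllegalArgumentException("Unknown service: " + name);
            return s.run(context);
        }
    }

    class ServiceExample {
        public static void main(String[] args) throws Exception {
            ServiceDispatcher dispatcher = new ServiceDispatcher();
            // Local Java implementation; a SOAP proxy could be registered the same way.
            dispatcher.register("calcOrderTotal", context -> {
                double subtotal = ((Number) context.get("subtotal")).doubleValue();
                double tax = subtotal * 0.065;                 // example rate only
                Map<String, Object> result = new HashMap<>();
                result.put("total", subtotal + tax);
                return result;
            });

            Map<String, Object> in = new HashMap<>();
            in.put("subtotal", 100.0);
            System.out.println(dispatcher.runSync("calcOrderTotal", in)); // {total=106.5}
        }
    }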


Workflow

A workflow engine is like a rules engine in that it moves functionality from the lower level code to high level structures and relationships that non-technical people can work with. Workflows are the high level procedural structure behind complex and potentially varying operations. Workflows are useful for order processing pipelines, marketing campaign management procedures, sales procedures, manufacturing processes and other procedural tasks.  A workflow engine maintains its state persistently and can handle procedures that require human intervention and communication with other systems, even when there may be long waiting times between when an activity is started and completed.

A Workflow Process basically has a number of Activities, with rules for transitions between activities. The workflow engine should be able to use the rules engine, scripting languages or Java calls to perform actions or make flow decisions.
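
A minimal sketch of that structure, using hypothetical Transition and context objects rather than any real workflow API: each activity completes and the engine evaluates the transition conditions to pick the next activity. A real engine would also persist the process state between activities.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Predicate;

    // Hypothetical workflow structures: a process is a set of named activities
    // with condition-guarded transitions between them.
    class Transition {
        final String toActivity;
        final Predicate<Map<String, Object>> condition;
        Transition(String toActivity, Predicate<Map<String, Object>> condition) {
            this.toActivity = toActivity;
            this.condition = condition;
        }
    }

    class WorkflowSketch {
        static final Map<String, List<Transition>> TRANSITIONS = new HashMap<>();

        public static void main(String[] args) {
            TRANSITIONS.put("receiveOrder", List.of(
                new Transition("manualReview", ctx -> ((Double) ctx.get("orderTotal")) > 1000.0),
                new Transition("approve", ctx -> true)));       // default path
            TRANSITIONS.put("manualReview", List.of(new Transition("approve", ctx -> true)));
            TRANSITIONS.put("approve", List.of());              // terminal activity

            Map<String, Object> context = new HashMap<>();
            context.put("orderTotal", 1500.0);

            String current = "receiveOrder";
            List<String> trail = new ArrayList<>();
            while (current != null) {
                trail.add(current);                             // a real engine would persist this state
                current = TRANSITIONS.get(current).stream()
                        .filter(t -> t.condition.test(context))
                        .map(t -> t.toActivity)
                        .findFirst().orElse(null);
            }
            System.out.println(trail);  // [receiveOrder, manualReview, approve]
        }
    }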

Workflow Editor/Modeler

Just as with the rules editors, workflow editors or modelers also fall into two categories: very general, and specific task oriented. A general editor should be included in addition to a number of specific task or problem oriented editors, such as one for modeling an order pipeline, one for setting up content creation and management procedures, one for setting up the flow of events for a sales process or a marketing campaign, and so forth.

Workflow Execution Engine

The workflow execution engine has the same issues as the rules execution engine as far as communication, or integration, with the application that uses it is concerned.


Rules

This component is a high level way of implementing functionality and interacting with data in the system. Rules can replace implementations of features that would normally be written in a programming or scripting language; they add value because they are easier to learn and work with, and they provide an automated way of simplifying complex tasks.

Rules are used in groups or sets. Groups of rules are applied in specific circumstances, but the order of execution does not have to be maintained by the user of the system; the rules engine determines it automatically using inferencing techniques. Rules engines are useful for price calculation, business requirements checking, general personalization and content selection, and other conditional tasks.
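
To illustrate the inferencing idea, here is a naive forward-chaining sketch with made-up fact and rule representations (not a real rules engine API): rules whose conditions match known facts fire, their conclusions become new facts, and the loop repeats until nothing new can be derived, so the user never has to order the rules.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Hypothetical rule form: if all condition facts are present, assert the conclusion fact.
    class Rule {
        final Set<String> conditions;
        final String conclusion;
        Rule(Set<String> conditions, String conclusion) {
            this.conditions = conditions;
            this.conclusion = conclusion;
        }
    }

    class ForwardChainingSketch {
        public static void main(String[] args) {
            List<Rule> rules = List.of(
                new Rule(Set.of("customer.isRepeat"), "customer.isPreferred"),
                new Rule(Set.of("customer.isPreferred", "order.overFifty"), "order.freeShipping"));

            Set<String> facts = new HashSet<>(Set.of("customer.isRepeat", "order.overFifty"));

            // Keep applying rules until no new facts are derived; the user never
            // specifies the order in which the rules run.
            boolean changed = true;
            while (changed) {
                changed = false;
                for (Rule r : rules) {
                    if (facts.containsAll(r.conditions) && facts.add(r.conclusion)) {
                        changed = true;
                    }
                }
            }
            System.out.println(facts); // includes order.freeShipping via two inference steps
        }
    }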

Rules Editor/Creator

There should be two types of rule editing and creation tools. The first is for manual creation of rules and editing of existing rules; it should be a tool that can not only create all types of rules, but also generically edit them.

The second type of rule creation and editing software is task specific; it can be specialized for a particular purpose and automated to any extent. Rules may also be created automatically, since data mining and other application specific tools sometimes create rules for use in their domains.

Rules Execution Engine

Once a set of rules has been created, a rules execution engine should be available for use at any time during a program's execution. The rules execution engine must be integrated with the rest of the program to some extent, or have some other way of reading from and writing to program data and causing the execution of certain operations in the program. A log or some other mechanism should be maintained to determine which rules influenced a given outcome. The rules execution engine should be able to pool resources, as with many other resources in an application server, to provide better response times and avoid continually re-creating rules engine instances.


Constraint Based Optimization

Constraints and enumeration are like rules in that they are another way of modeling a problem.  Certain problems are suited better to a rules engine, and other problems are suited much better to a constraint engine.

Constraint Based Optimization tools simplify the general task of taking a large set of possibilities and reducing that set to only those that conform to the constraints specified.  After removing all possibilities that break the constraints, the set can be further reduced by applying a metric to choose the best ones.  When the number of possibilities is large, it is not feasible to create the entire set of possibilities and then reduce it; instead, a search/decision tree is built and traversed, searching for a possibility that conforms to all constraints.  This can produce a legal result quickly, and the search can then be left to continue looking for a better result as defined by the metrics.

Possibility sets are defined by declaring variables and then enumerating the possible values for each variable.  When the variables are combined, the possibility set is created, where each combination of values for the variables is one possibility.  Constraints are used to declare that combinations of values, or specific values, are required or not allowed.

This technique can be used to model many problems and find valid and optimal solutions.  Example applications include automatic manufacturing pipeline planning, truck scheduling and route planning, airport gate assignment, configuration of products or services, and many others.
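
The following is a minimal backtracking sketch of the search described above, using a made-up truck scheduling example: variables are enumerated value sets, constraints reject partial assignments, and the search stops at the first legal combination. A real engine would keep searching for better solutions according to the metrics.

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Predicate;

    class ConstraintSearchSketch {
        // Depth-first search over the possibility set, pruning branches that break a constraint.
        static Map<String, String> search(List<String> vars, Map<String, List<String>> domains,
                                          List<Predicate<Map<String, String>>> constraints,
                                          Map<String, String> assigned, int depth) {
            if (depth == vars.size()) return assigned;          // every variable has a value
            String var = vars.get(depth);
            for (String value : domains.get(var)) {
                assigned.put(var, value);
                boolean ok = constraints.stream().allMatch(c -> c.test(assigned));
                if (ok) {
                    Map<String, String> result = search(vars, domains, constraints, assigned, depth + 1);
                    if (result != null) return result;
                }
                assigned.remove(var);                           // backtrack
            }
            return null;
        }

        public static void main(String[] args) {
            // Example: assign two trucks to two routes, but truck T1 cannot take the long route.
            Map<String, List<String>> domains = new LinkedHashMap<>();
            domains.put("routeShort", List.of("T1", "T2"));
            domains.put("routeLong", List.of("T1", "T2"));
            List<Predicate<Map<String, String>>> constraints = List.of(
                a -> !"T1".equals(a.get("routeLong")),                        // T1 not allowed on long route
                a -> !(a.containsKey("routeShort") && a.containsKey("routeLong")
                        && a.get("routeShort").equals(a.get("routeLong"))));  // each truck used once

            Map<String, String> solution = search(new ArrayList<>(domains.keySet()), domains,
                    constraints, new LinkedHashMap<>(), 0);
            System.out.println(solution); // {routeShort=T1, routeLong=T2}
        }
    }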

Constraints can be automatically extracted from data analysis or implied by data structures.  They can be written by hand and changed and applied in real time.  Constraints and enumeration of states can also be used to create workflows automatically, and to provide automated real-time decision making and workflow reconfiguration.


Entity Engine

The Open For Business Entity Engine has evolved over the course of the project. It was originally created as an entity-specific code generator that used an evolving set of templates to provide functionality based on entity definitions. The newest evolution of the Entity Engine is a fully dynamic, attribute based engine where one set of code handles entity management and persistence for all of the entities defined in the system.

To understand this better, imagine all of the code that might be generated by the code generator for each specific entity, and imagine that code rewritten to work for any entity definition that might occur. This is what the current entity engine does. The functionality of this engine can be supplemented by the code generator, but it can generally be used alone. The advantage of using fully dynamic code like this is that maintenance is much simpler and management of entities in general can be enhanced. Adding a field to an entity is now a five minute task (or a 30 second task if you don't do any testing) instead of something that used to take at least fifteen to twenty minutes, assuming no code merging needed to take place. The efficiency case is even stronger when compared to manually created or maintained Entity Beans.

Another advantage of the Entity Engine as it currently stands is that it can be used for persistence of entities either through an Entity EJB or directly to the database through JDBC using the same API. The only difference between the two is that you get a different Helper class depending on the type of persistence you want. This is accomplished using shared Helper and Value Object definitions.
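
To make the "one set of code for all entities" idea concrete, here is an illustrative sketch; the class names are stand-ins, not the actual org.ofbiz.core.entity API (see the Entity Engine documentation for that). Any entity is represented as a name plus a map of fields, and a single helper persists every entity the same way.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative generic value object: any entity is just a name plus a map of fields,
    // so one body of code can persist every entity defined in the system.
    class GenericValueSketch {
        final String entityName;
        final Map<String, Object> fields = new HashMap<>();
        GenericValueSketch(String entityName) { this.entityName = entityName; }
        Object get(String field) { return fields.get(field); }
        void set(String field, Object value) { fields.put(field, value); }
    }

    // Illustrative helper: the same API would sit in front of Entity EJB or JDBC
    // persistence; here it is backed by an in-memory map just to run the example.
    class InMemoryHelperSketch {
        private final Map<String, GenericValueSketch> store = new HashMap<>();

        void create(GenericValueSketch value, String primaryKey) {
            store.put(value.entityName + ":" + primaryKey, value);
        }
        GenericValueSketch findByPrimaryKey(String entityName, String primaryKey) {
            return store.get(entityName + ":" + primaryKey);
        }

        public static void main(String[] args) {
            InMemoryHelperSketch helper = new InMemoryHelperSketch();
            GenericValueSketch product = new GenericValueSketch("Product");
            product.set("productId", "WG-1111");
            product.set("productName", "Round Gizmo");
            helper.create(product, "WG-1111");
            System.out.println(helper.findByPrimaryKey("Product", "WG-1111").get("productName"));
        }
    }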

The Entity Engine lives in the Core module of ofbiz along with the Servlet Controller and other useful tools. The package name is org.ofbiz.core.entity, with a sub-package called model that contains the code to model entities and read the XML entity definition files.

For more information on the Entity Engine, see the Entity Engine documentation.


Data Analysis

Data Warehousing
Statistical Analysis

Apply statistical methods to reduce a large amount of data to a simpler pattern, usually expressed as a formula or graph or set of constraints, in order to make it more understandable by humans. Advanced forms of statistical analysis such as regression, factor, correlation or cluster analysis are sometimes used as the basis for data mining techniques.
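
For example, a simple least-squares fit reduces a set of observations to the two-parameter formula y = a + b*x; this sketch only illustrates the idea of reducing data to a formula, and the numbers are made up.

    // Reduce (x, y) observations to the formula y = a + b*x by least squares.
    class LeastSquaresSketch {
        public static void main(String[] args) {
            double[] x = {1, 2, 3, 4, 5};            // e.g. week number
            double[] y = {12, 19, 29, 37, 45};       // e.g. units sold

            int n = x.length;
            double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
            for (int i = 0; i < n; i++) {
                sumX += x[i];
                sumY += y[i];
                sumXY += x[i] * y[i];
                sumXX += x[i] * x[i];
            }
            double b = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);  // slope
            double a = (sumY - b * sumX) / n;                                  // intercept

            System.out.println("y = " + a + " + " + b + " * x");  // y = 3.2 + 8.4 * x for this data
        }
    }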

Data Mining

Described as "the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Fayyad, Piatetsky-Shapiro, Smyth).

"It is the army of analysts we could not afford in the OLAP world, sifting through all the billions of possible patterns, looking for significant relationships which give us knowledge to act" (Flanagan and Safdie).

"The significant problems we face cannot be solved by the same level of thinking that created them" (Albert Einstein).

Deduction

Create additional attributes based on existing attributes and rules, constraints, neural networks, or other forms of generalization or of describing patterns. Collaborative filtering is a form of this type of data mining that uses existing data and static rules to deduce new attributes or categories.

Induction, abstraction

Create generalizations from attributes and patterns that describe those attributes and patterns. The generalizations can be represented as rules, constraints, neural networks, or other pattern descriptors. The patterns may also be used to create structure, as in clustering or sequencing.

Clustering, segmentation, affinity

Automatically group or segment elements by attributes or patterns. Association is a variation on this where connections or associations are made between entity pairs.
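
A toy one-dimensional k-means sketch illustrates the idea: customers are grouped by order total, each element is assigned to its nearest cluster center, and the centers are moved to the mean of their members until the groups settle. Real clustering would use many attributes and a proper analysis library.

    import java.util.Arrays;

    // Simple one-dimensional k-means: group customers by average order total.
    class KMeansSketch {
        public static void main(String[] args) {
            double[] values = {12, 15, 14, 90, 95, 88, 300, 310};
            double[] centers = {10, 100, 250};                  // initial guesses, k = 3
            int[] assignment = new int[values.length];

            for (int iter = 0; iter < 10; iter++) {
                // Assign each value to the nearest center.
                for (int i = 0; i < values.length; i++) {
                    int best = 0;
                    for (int c = 1; c < centers.length; c++) {
                        if (Math.abs(values[i] - centers[c]) < Math.abs(values[i] - centers[best])) best = c;
                    }
                    assignment[i] = best;
                }
                // Move each center to the mean of its assigned values.
                for (int c = 0; c < centers.length; c++) {
                    double sum = 0;
                    int count = 0;
                    for (int i = 0; i < values.length; i++) {
                        if (assignment[i] == c) { sum += values[i]; count++; }
                    }
                    if (count > 0) centers[c] = sum / count;
                }
            }
            System.out.println(Arrays.toString(assignment));  // [0, 0, 0, 1, 1, 1, 2, 2]
            System.out.println(Arrays.toString(centers));     // the three segment averages
        }
    }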

Sequencing

Create a linear ordering of elements according to abstract patterns or rules.

Rule & Constraint Generation

Create rules or constraints that describe attributes and patterns in the data. These rules and constraints can be used later for deduction or for driving operational flows, as is often done with personalization.

Neural Network Generation

Create neural networks that describe a data point or a set of data points. This process is sometimes known as training the neural network. These neural networks can later be used to compare other data to the data they describe. This is done by using the other data as inputs to the neural network, and getting a result out which numerically describes how related they are. Neural networks can represent conceptual structures without the need for creating explicit definitions or structures.
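
As a rough illustration, the sketch below applies a trained network's weights to an input vector and produces a single 0..1 score of how related the input is to the training data; the weights here are made up, and the training step itself is not shown.

    // Toy feed-forward pass: a trained network's weights are applied to an input
    // vector to get a 0..1 score of how related the input is to the training data.
    class NeuralScoringSketch {
        static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

        static double score(double[] input, double[] weights, double bias) {
            double z = bias;
            for (int i = 0; i < input.length; i++) z += input[i] * weights[i];
            return sigmoid(z);
        }

        public static void main(String[] args) {
            double[] weights = {1.5, -2.0, 0.7};     // produced by a training step, not shown
            double bias = -0.3;
            double[] documentFeatures = {0.8, 0.1, 0.5};
            System.out.println(score(documentFeatures, weights, bias));  // closer to 1 = more related
        }
    }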

Query & Reporting

The reporting tool provides for the visualization and presentation of data from raw sources or from the output of a data analysis procedure. The reporting tool may do some data analysis, especially statistical analysis, in order to create the visual elements, but it is not necessary for it to address anything more complex. It primarily simplifies the query and presentation process. 

As part of the larger system, the reporting tool should be loosely coupled to each high level functional component. Each high level component should have a set of reports that exist in the standard reporting tool format and that are customizable through standard and specialized tools by the end user, and where necessary by a technician internal or external to the company. The reporting tool should support a flexible data import and transformation interface that supports various databases and flat files used in the system, and has utilities to support and transform custom database schema and file formats. 

Ideally the reporting tool would be a J2EE application with an EJB interface to data sources, in addition to other custom interfaces that could be coded in Java, which would allow access to COM, CORBA and other component interfaces.

Online Analytical Processing - Multi-dimensional

View subsets, or sub-cubes, of multi-dimensional data to manually find patterns or trends. This is essentially an advanced query and reporting tool that creates an information layer between the data and the user and allows the user a more customized view of the data. Note that this is different from statistical analysis because often more information is presented to the user, not less, but only a specified subset at a time. This may include results from more standard statistical analysis.
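
A toy example of slicing a sub-cube: sales are measured along product, region, and month dimensions, and fixing the region dimension presents only that slice to the user. The data and class names are purely illustrative.

    import java.util.ArrayList;
    import java.util.List;

    // Toy "cube": sales measured along three dimensions (product, region, month).
    // Slicing fixes one dimension and presents only that sub-cube to the user.
    class OlapSliceSketch {
        static class Cell {
            final String product, region, month;
            final double sales;
            Cell(String product, String region, String month, double sales) {
                this.product = product; this.region = region; this.month = month; this.sales = sales;
            }
        }

        public static void main(String[] args) {
            List<Cell> cube = new ArrayList<>();
            cube.add(new Cell("gizmo", "EAST", "Jan", 100));
            cube.add(new Cell("gizmo", "WEST", "Jan", 80));
            cube.add(new Cell("widget", "EAST", "Feb", 120));
            cube.add(new Cell("widget", "WEST", "Feb", 60));

            // Slice on region = EAST: the user sees only the matching cells.
            for (Cell c : cube) {
                if (c.region.equals("EAST")) {
                    System.out.println(c.product + " " + c.month + " " + c.sales);
                }
            }
        }
    }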

Closed Loop Analysis

The results of data analysis are either automatically sent back to the operating model, or translated/interpreted by a human and changes are made to the operating model. The new data from operation based on that model is then analyzed in light of these changes.  In other words, rules, constraints, attributes, and other information are created from analysis and then used in the business operations.  Closed Loop Analysis is used to measure the effectiveness of the changes to the operating model.


Content Management

Content Repository and Version Control System

The main purpose of a content management system is to provide a place to store content and manage it as it changes. This includes maintenance of revision histories, current versions, and so forth. The system should be able to deal with many types of content, including: various forms of plain text; standard word processing, spreadsheet, and presentation files; various image formats; and relational and object oriented database information. The highest priority among these different file types is plain text. Many of the other types can be stored in a binary format very simply, but revision control of them requires a lot more work.  Tools should be included to merge different revisions of the data.

Meta-data and Structure Management and Storage

The content management system should be able to store and manage all meta-data corresponding to the data or content it manages. It should also have facilities for managing the structure of the content as a hierarchy, an abstract relational graph, or a sequence. Manual tools should be provided for editing the meta-data and structure information. The structure information and meta-data should also be available to external programs for analysis and modification.

Content Workflow Tool

The content workflow tool is a generalized tool to manage tasks for individuals based on workflows defined for each piece or set of content. This can be used for a change request and bug management system, for complex content creation tasks, and for quality control procedures once the content has been created. The workflow tool here would be the workflow tool described above with some customizations specific to content management.

Content Deployment

Deployment tools should be included to facilitate the deployment of data and meta-data controlled by the content management system to locations outside the content management system. The deployment tool should have facilities to call scripts or other programs as part of its workflow so that manual steps can be completely removed from the process.


Knowledge Management

Automated Content Categorization & Organization

This is a facility to categorize and reference or store various forms of content, structured or unstructured. This may include user or content profiles, live chat text, email, web sites, documents (text, Word, PDF, etc), news feeds and other textual content as unstructured, or semi-structured data, and database or organized file data as structured data. 

Tools for the input of this content should include manual targeting to specific pieces of content, and general spider tools such as those that follow web or other content links. 

The tools automatically induce attributes, and use existing attributes, to classify the content, describe concepts in it, or create new attributes for groups of content. The output of the process may include content attributes or other meta information about the content such as rules or constraints, or association links with other related content. Manual overrides or tweaking of these links or meta information should be possible.

Expert Identification / Person Classification

These tools operate on user profiles and other user information such as authored or viewed content to isolate knowledge clusters of individuals. With this information an appropriate resource can be found to address a specific issue or set of issues. 

The user information that results from these processes can also be used to group users and create communities and provide targeted content for individuals or groups of users on a push or pull basis.

Knowledge Presentation
Knowledge Visualization

Knowledge visualization tools may include features such as viewing knowledge clustering frequency charts and knowledge association graphs or maps. These knowledge graphs can be loosely formatted, or organized into tree-like hierarchies.

Knowledge Viewing

This is a general tool for knowledge viewing. It may organize the knowledge hierarchically using iconic representations with links to the actual knowledge, provide various search options with links and short content descriptions as results, or use whatever other tools may be available for general content browsing. This may include personalization features that tailor content based on user profile information. The output may be available in an application or as web based information.

Automated Applications
Real Time Cross Reference

These applications monitor content that is being viewed or created and offer links or references to related content in real time.

New Content Dissemination

These applications monitor and classify new content and send it to users who have subscribed to that type of content. The new content may include news feeds, updated web pages, email messages, internal documents, etc. 


Basic Technology Components


Foundation Technology


Security

Security features should include policy options for multiple types of entities including users, groups, and roles. Multiple levels of groups and roles should also be supported. Users should be able to exist in multiple groups and role categories.

Security policies should be able to flexibly apply to various levels of data and functional granularity. It should be possible to associate any type of user entity with each level of data granularity.
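
As an illustration of that layering, here is a hypothetical permission check in which a user's own grants, group grants, and role grants are all consulted; none of the names correspond to an actual security API.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical security model: users belong to any number of groups and roles,
    // and permissions can be granted at the user, group, or role level.
    class SecuritySketch {
        final Map<String, Set<String>> userGroups = new HashMap<>();
        final Map<String, Set<String>> userRoles = new HashMap<>();
        final Map<String, Set<String>> grants = new HashMap<>();   // principal -> permissions

        boolean hasPermission(String userId, String permission) {
            Set<String> principals = new HashSet<>();
            principals.add(userId);
            principals.addAll(userGroups.getOrDefault(userId, Set.of()));
            principals.addAll(userRoles.getOrDefault(userId, Set.of()));
            return principals.stream()
                    .anyMatch(p -> grants.getOrDefault(p, Set.of()).contains(permission));
        }

        public static void main(String[] args) {
            SecuritySketch security = new SecuritySketch();
            security.userGroups.put("jsmith", Set.of("orderEntryClerks"));
            security.userRoles.put("jsmith", Set.of("CUSTOMER_SERVICE"));
            security.grants.put("orderEntryClerks", Set.of("ORDER_CREATE"));

            System.out.println(security.hasPermission("jsmith", "ORDER_CREATE"));  // true, via group
            System.out.println(security.hasPermission("jsmith", "ORDER_DELETE"));  // false
        }
    }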

Human Communication

Integration & Connection

Integration and connection tools serve two purposes: to get access to data from other applications, and to allow other applications to access data from this application.  General tools here include EJB calls, JMS, SOAP (and other similar protocols), general XML passing, connector packages, and in certain cases custom code.


Development Tools


Quality Assurance