Mondrian Roadmap

Introduction
Upcoming releases
Feature list
Release history

1. Introduction

This is a list of features we propose to deliver in future releases of Mondrian. Each feature is linked to a high-level description. Complex features will have more detailed specifications in a separate document.

1.1 Purpose of this document

This document has several goals. First, it lets the Mondrian community know what features we are thinking about implementing. There may be better ways of delivering the same functionality that we haven't thought of.

Second, since there is always more work than time, it allows us to prioritize. If we hear that a particular feature is important to a lot of people, we will try to get to it sooner.

Third, it allows us to attract resources. If there are features in this roadmap which are important to your organization, consider sponsoring Mondrian's development.

1.2 Mondrian's goal

Mondrian's goal is to bring multidimensional analysis to the masses.

To do this it needs to be:

free
portable
easy to install
easy to integrate, and above all
easy to understand

We feel that Mondrian, an open-source OLAP server written in pure Java, meets these goals. We can't anticipate all of our customers' requirements, but open source combined with Java keeps Mondrian flexible. It's easy to add functionality or to integrate third-party tools, and Mondrian can be integrated into a variety of environments.

Mondrian is part of the Pentaho Open Source BI Suite. Pentaho aims to deliver the best possible user experience by integrating Mondrian with other open-source components such as Kettle, Pentaho Reporting, and Weka. While building this integration, Pentaho is committed to keeping Mondrian independent from other components, and available under a commercial-friendly open-source license.

1.3 Scope

Mondrian can't do everything. If it did everything, it would be a huge download, difficult to install, and even more difficult to integrate with other software; and we'd never finish writing it. But the good news is, this is open source. If a feature is missing, it's often easy to add the feature to Mondrian or to integrate with another open-source product that provides the feature.

JPivot is Mondrian's sister project. It provides an excellent user-interface, and shows off what Mondrian can do. But we have been careful to keep the two projects separate. (You can use another user-interface to Mondrian, and you can also use JPivot with other data-sources.) If you've run Mondrian's demo and you have suggestions on how to improve the web interface, please make your suggestion to the JPivot project directly.

1.4 Sponsored development and co-development

Pentaho encourages companies to sponsor development of features which are important to them. Sponsorship allows Mondrian developers to spend more time adding features to Mondrian, rather than having to find other ways to pay the rent. The results are always contributed back to the project as open source.

Another way companies can help Mondrian is to assign employees to co-develop features. We can help specify and design these features, provided that the resulting code is contributed to the project.

If your organization would like to sponsor development of features, please contact Julian Hyde.

2. Upcoming releases

2.1 olap4j release 1.0

Targeted release timeframe: Q3 2008.

olap4j is a proposed standard API for access to any OLAP data source from Java. See www.olap4j.org.

As of mondrian-3.0, olap4j is the primary API to mondrian; mondrian's driver is based on olap4j-0.9.4 (beta). olap4j release 1.0 will be the first production release of the olap4j specification. It will include a full Test Compatibility Kit (TCK) and incorporate bug fixes and feedback from the drivers and applications built using olap4j beta.

2.2 Mondrian Release 3.1

Targeted release timeframe: Q3 2008

Feature	Effort	Importance
Remove support for old API	low	medium
3.8 Bridge to CWM. Integration with Pentaho Metadata. Could be incubator project. Note that someone has already implemented a bridge in one way.	high	high
3.10 Further work on Aggregate Tables. To support the aggregation designer, mondrian release 3.1 will probably include two of the utilities described in that section: (2) DDL generation and (3) a utility (maybe graphical, maybe text-based) to recommend a set of aggregate tables.	high	high
TBD

2.3 Aggregation Designer Release x.x

Targeted release timeframe: Q2 2008

Effort: high, Importance: high, Priority: high

Release Highlights:

Repeatable, reliable, semi-automated methodology for improving ROLAP/HOLAP performance
APIs and user interfaces that are suitable for use by developers, consultants and Pentaho customers

2.4 Schema Workbench Release x.x (cube designer)

Targeted release timeframe: not specified

Effort: high, Importance: high, Priority: high

Release Highlights:

User Interface suitable for consultants, developers and customers to design and maintain Mondrian Schemas
User Interface to support all Mondrian Schema tags and be maintained in lock-step with Mondrian server going forward

3. Feature list

3.1 Partitioned cubes

Effort: medium; importance: medium; priority: medium.

Whereas a regular cube has a single fact table, a partitioned cube has several fact tables, which are unioned together. The fact tables must have the same column names.

Each fact table can have a range (similar to 'cache ranges', above) which describes what data ranges are found in each. When looking for a particular cell, Mondrian scans the tables' criteria to determine which table to look in. For example, T1 holds data for Texas, 2005 onwards; T2 holds data for 2004 onwards; T3 holds all other data. The cell (Oklahoma, January 2005) would be found in T2.

Partitioned tables are useful for real-time analysis. For example, one partition might contain today's data, while another might hold historical data. The 'hot' partition with today's data would typically have fewer or no aggregation tables and have caching disabled; its fact table might have different physical options in the RDBMS, say fewer indexes to maximize insert performance.

Example schema:

<Cube name="Sales">
  <Partitions>
    <Partition name="partition1" cache="false">
      <Table name="sales_fact_this_month"/>
      <Ranges>
        <Range dimension="[Time]">
          <RangeMember bound="lower" member="[Time].[2005].[9]"/>
        </Range>
        <Range dimension="[Store]">
          <RangeMember member="[Store].[USA].[CA]"/>
          <RangeMember member="[Store].[USA].[WA].[Seattle]"/>
        </Range>
      </Ranges>
    </Partition>
    <Partition name="partition2" cache="true">
      <Table name="sales_fact"/>
      <Ranges/>
    </Partition>
  </Partitions>
</Cube>
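The table-selection step described above could be sketched in Java roughly as follows. This is only an illustration, not a proposed design: the class PartitionLookup and its methods are hypothetical, each partition is reduced to an optional state plus a lower bound on the year, and the sketch assumes that partitions are tested in declaration order with the first match winning (the roadmap does not say how overlapping ranges would be resolved).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: picks the fact table that holds a given cell. */
class PartitionLookup {

    /** One partition: a fact table plus the range of cells it holds. */
    static final class Partition {
        final String table;
        final String state;  // null means "any state"
        final int fromYear;  // Integer.MIN_VALUE means "no lower bound"

        Partition(String table, String state, int fromYear) {
            this.table = table;
            this.state = state;
            this.fromYear = fromYear;
        }

        boolean covers(String cellState, int cellYear) {
            boolean stateOk = state == null || state.equals(cellState);
            return stateOk && cellYear >= fromYear;
        }
    }

    private final List<Partition> partitions = new ArrayList<>();

    /** Assumption: partitions are tested in the order they are added. */
    PartitionLookup add(String table, String state, int fromYear) {
        partitions.add(new Partition(table, state, fromYear));
        return this;
    }

    /** Returns the first table whose range covers the cell. */
    String tableFor(String state, int year) {
        for (Partition p : partitions) {
            if (p.covers(state, year)) {
                return p.table;
            }
        }
        throw new IllegalStateException("no partition covers the cell");
    }

    /** The T1/T2/T3 layout used as the example in the text. */
    static PartitionLookup example() {
        return new PartitionLookup()
            .add("T1", "Texas", 2005)
            .add("T2", null, 2004)
            .add("T3", null, Integer.MIN_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(example().tableFor("Oklahoma", 2005));
    }
}
```

With this layout the cell (Oklahoma, January 2005) falls through T1, because it is not in Texas, and lands in T2, matching the example in the text.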

3.2 Cold start

Effort: medium; importance: medium; priority: low.

When Mondrian initializes and starts to process the first queries, it makes SQL calls to get member lists and determine cardinality, and then to load segments into the cache. When Mondrian is closed and restarted, it has to do that work again. This can take a significant amount of time, depending on the cube size. For example, in one test an 8GB cube (55M row fact table) took 15 minutes (mostly doing a GROUP BY) before it returned results from its first query, and absent any caching on the database server it would take another 15 minutes if you closed and reopened the application. This cube held just one month of data; imagine the time if there were 5 years' worth.

What ideas and designs can you come up with to speed that up, in other words to do anything time consuming only once and reuse it between instances?

Gang Chen: If it's possible, can we calculate the real levels of a parent-child hierarchy? This would bring Mondrian's metadata closer to MSAS's.

Julian Hyde: Can you give me more details on how that would work? Start a discussion forum or feature request on SourceForge.

Other options for cold start:

Command for mondrian to serialize cache state (definitions and data) to disk. When mondrian starts, read the cache state from disk.
Command for mondrian to serialize cache definitions to disk. When mondrian starts, read cache definitions from disk, and cache contents from DBMS.
User writes a script of MDX commands to prime the cache. On startup, mondrian executes this script in a background thread.
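As a rough illustration of the first option (serialize cache state to disk, read it back on startup), here is a minimal Java sketch. It is not Mondrian code: the class CacheSnapshot is hypothetical, and the cache is simplified to a map from a cell key string to an aggregated value, which is an assumption made only to keep the example short.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: saves a simplified cache to disk and restores it. */
class CacheSnapshot {

    /** Writes the cache (cell key -> aggregated value) to the given file. */
    static void save(Map<String, Double> cache, Path file) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new HashMap<>(cache));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the cache back; returns an empty cache if no snapshot exists. */
    @SuppressWarnings("unchecked")
    static Map<String, Double> load(Path file) {
        if (!Files.exists(file)) {
            return new HashMap<>();
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(file))) {
            return (Map<String, Double>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Saves the cache to a temporary file and loads it again. */
    static Map<String, Double> roundTrip(Map<String, Double> cache) {
        Path file;
        try {
            file = Files.createTempFile("cache-snapshot", ".ser");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            save(cache, file);
            return load(file);
        } finally {
            try {
                Files.deleteIfExists(file);
            } catch (IOException ignored) {
                // best-effort cleanup of the temporary file
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Double> cache = new HashMap<>();
        cache.put("[Time].[2005].[9]|[Store].[USA].[CA]", 1234.5);
        System.out.println(roundTrip(cache));
    }
}
```

A real implementation would also need some way to detect that the schema or the underlying data changed while mondrian was down, so that a stale snapshot is not reused; the sketch ignores that problem.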

3.3 Rollup in cache

Effort: medium; importance: medium; priority: low.

If the cache contains aggregates for all children of a member, then Mondrian would be able to compute the aggregate for the parent member by rolling up.

See the email thread "grouper in Mondrian".
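The idea can be sketched in a few lines of Java. The class CacheRollup and its method are hypothetical, the cache is simplified to a map from member name to cached value, and the sketch assumes an additive measure such as sum (a distinct-count measure could not be rolled up this way).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

/** Hypothetical sketch: derives a parent's value from cached child values. */
class CacheRollup {

    /**
     * Returns the parent's sum if every child has a cached value,
     * otherwise empty (the caller would then fall back to SQL).
     */
    static OptionalDouble rollUpSum(List<String> children,
                                    Map<String, Double> cache) {
        if (children.isEmpty()) {
            return OptionalDouble.empty(); // leaf member: nothing to roll up
        }
        double total = 0.0;
        for (String child : children) {
            Double cached = cache.get(child);
            if (cached == null) {
                return OptionalDouble.empty(); // at least one child is missing
            }
            total += cached;
        }
        return OptionalDouble.of(total);
    }

    public static void main(String[] args) {
        Map<String, Double> cache = new HashMap<>();
        cache.put("[Store].[USA].[CA]", 10.0);
        cache.put("[Store].[USA].[OR]", 20.0);
        cache.put("[Store].[USA].[WA]", 30.0);
        List<String> children = Arrays.asList(
            "[Store].[USA].[CA]", "[Store].[USA].[OR]", "[Store].[USA].[WA]");
        System.out.println(rollUpSum(children, cache));
    }
}
```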

3.4 Compound slicer

Effort: medium; importance: low; priority: low.

3.5 Schema and query validation

Process to validate a schema.

Process to validate a set of queries. Maybe an option to ignore errors due to specific members not existing because the data hasn't been loaded yet.

Expose validation via Eclipse plugin.

3.6 Name-resolution

Mondrian's name resolution is not always compatible with other MDX implementations such as MSAS and SAS.

Support abbreviated member names. For example, [Products].[Boston Lager] seems to be valid in MSAS if product names are unique, whereas Mondrian currently requires [Products].[Beverages].[Beer].[Samual Adams].[Boston Lager].
Change scheme for generating unique names, omitting the 'all' member name; current [Customers].[(All customers)].[USA] would become [Customers].[USA]. Mondrian would still understand names of the previous form.
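Both proposals can be illustrated with a small Java sketch. The class MemberNames and its methods are hypothetical and do not reflect Mondrian's actual name-resolution code; members are simplified to lists of name segments, with the hierarchy name as the first segment.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the two name-resolution proposals. */
class MemberNames {

    /**
     * Resolves an abbreviated name such as [Products].[Boston Lager]:
     * returns the member if exactly one member of the hierarchy has
     * that name, and empty if there is none or the name is ambiguous.
     */
    static Optional<List<String>> resolveAbbreviated(
            List<List<String>> members, String hierarchy, String name) {
        List<String> match = null;
        for (List<String> path : members) {
            if (path.size() < 2) {
                continue; // a bare hierarchy name is not a member
            }
            boolean sameHierarchy = path.get(0).equals(hierarchy);
            boolean sameName = path.get(path.size() - 1).equals(name);
            if (sameHierarchy && sameName) {
                if (match != null) {
                    return Optional.empty(); // ambiguous
                }
                match = path;
            }
        }
        return Optional.ofNullable(match);
    }

    /**
     * Builds a unique name, omitting the 'all' member unless the
     * member being named is the 'all' member itself.
     */
    static String uniqueName(List<String> path, String allMemberName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < path.size(); i++) {
            String segment = path.get(i);
            if (i == 1 && path.size() > 2 && segment.equals(allMemberName)) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append('.');
            }
            sb.append('[').append(segment).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> usa = Arrays.asList("Customers", "(All customers)", "USA");
        System.out.println(uniqueName(usa, "(All customers)"));
    }
}
```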

3.7 Standard functions

Implement standard MDX functions:

DrilldownMemberBottom(<Set1>, <Set2>, <Count>[, [<Numeric Expression>][, RECURSIVE]])
DrilldownMemberTop(<Set1>, <Set2>, <Count>[, [<Numeric Expression>][, RECURSIVE]])
DrillupLevel(<Set>[, <Level>])
DrillupMember(<Set1>, <Set2>)
Except(<Set1>, <Set2>[, ALL]). (Except is implemented in Mondrian 1.2, except for the ALL keyword.)
SetToArray(<Set>[, <Set>]...[, <Numeric Expression>])

3.8 Bridge to CWM

CWM (Common Warehouse Model) is a standard model for defining data warehouse and multidimensional schemas. It allows interoperability with tools such as UML diagrams, relational report design tools, and ETL tools.

This feature will add:

A gateway to present a Mondrian schema via the CWM API.
A bridge to read a CWM schema and create a Mondrian schema from it.

3.9 User-defined aggregate functions

The standard aggregate functions are sum, count, distinct-count, min, max and avg. This feature will provide an SPI by which application developers can write their own aggregate functions.

The SPI will include:

the name of the aggregate function;
parameter types;
return types;
a means to generate a SQL expression to compute the aggregate from unaggregated fact table data. (For the "count" function applied to the "unit_sales" column, this would generate "count(unit_sales)".)
a means to generate a SQL expression to compute the aggregate by rolling up partially aggregated data. (For the "count" function applied to the "unit_sales" column, this would generate "sum(unit_sales)". Some aggregates, such as "distinct-count", do not support rollup.)
a means to roll up values in memory. Some aggregates, such as "distinct-count", do not support this.

The SPI will support functions which map to a SQL expression rather than a SQL aggregate function. The "avg" function is an example of this: it works by expanding itself to sum / count.

The SPI will support functions which can be computed from unaggregated fact table data, but cannot be rolled up. The "distinct-count" function is an example of this.

You will be able to include user-defined aggregate functions in aggregate tables.
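To make the shape of such an SPI more concrete, here is one possible Java sketch. Nothing in it is a committed design: the interface AggregateFunction, its method names, and the use of plain strings for types are all assumptions made for illustration. It implements "count" and "distinct-count" as described above; the "customer_id" column in main is likewise just an example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.OptionalDouble;

/** Hypothetical sketch of a user-defined aggregate function SPI. */
class AggregateSpiSketch {

    interface AggregateFunction {
        String name();
        List<String> parameterTypes();
        String returnType();
        /** SQL to compute the aggregate from unaggregated fact data. */
        String factSql(String column);
        /** SQL to roll up partial aggregates; empty if unsupported. */
        Optional<String> rollupSql(String column);
        /** Rolls up partial values in memory; empty if unsupported. */
        OptionalDouble rollUpInMemory(List<Double> partials);
    }

    static final AggregateFunction COUNT = new AggregateFunction() {
        public String name() { return "count"; }
        public List<String> parameterTypes() {
            return Collections.singletonList("any");
        }
        public String returnType() { return "integer"; }
        public String factSql(String column) {
            return "count(" + column + ")";
        }
        public Optional<String> rollupSql(String column) {
            // Partial counts are combined by summing them.
            return Optional.of("sum(" + column + ")");
        }
        public OptionalDouble rollUpInMemory(List<Double> partials) {
            double total = 0.0;
            for (double partial : partials) {
                total += partial;
            }
            return OptionalDouble.of(total);
        }
    };

    static final AggregateFunction DISTINCT_COUNT = new AggregateFunction() {
        public String name() { return "distinct-count"; }
        public List<String> parameterTypes() {
            return Collections.singletonList("any");
        }
        public String returnType() { return "integer"; }
        public String factSql(String column) {
            return "count(distinct " + column + ")";
        }
        public Optional<String> rollupSql(String column) {
            return Optional.empty(); // cannot be rolled up in SQL
        }
        public OptionalDouble rollUpInMemory(List<Double> partials) {
            return OptionalDouble.empty(); // cannot be rolled up in memory
        }
    };

    public static void main(String[] args) {
        System.out.println(COUNT.factSql("unit_sales"));
        System.out.println(COUNT.rollupSql("unit_sales"));
        System.out.println(COUNT.rollUpInMemory(Arrays.asList(2.0, 3.0)));
        System.out.println(DISTINCT_COUNT.rollupSql("customer_id"));
    }
}
```

Returning an empty Optional for an unsupported rollup is one way to let the engine discover, per function, whether it may use an aggregate table or the cache, or must go back to the fact table.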

3.10 Further work on aggregate tables

1. Data population

Utility to populate (or generate INSERT statements to populate) the agg tables. (For extra credit: populate the tables in topological order, so that higher level aggregations can be built from lower level aggregations.)

2. DDL generation

Utility to generate a script containing CREATE TABLE and CREATE INDEX statements for all possible aggregate tables (including indexes), XML for these tables, and comments indicating the estimated number of rows in these tables. Clearly this will be a huge script, and it would be ridiculous to create all of these tables. The person designing the schema could copy/paste from this file to create their own schema.

3. Utility (maybe graphical, maybe text-based) to recommend a set of aggregate tables

This is essentially an optimization algorithm, and it is described in the academic literature. Constraints on the optimization process are the amount of storage required and the estimated time to populate the agg tables. The algorithm could also take into account usage information.

4. Allow aggregate tables to be taken offline/online while Mondrian is still running

I'm thinking of these being utilities, not part of the core runtime engine. There's plenty of room to wrap these utilities in nice graphical interfaces and make them smarter.
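The "extra credit" in item 1 (populate the tables in topological order, so that higher level aggregations can be built from lower level ones) might look something like the following Java sketch. The class AggPopulationOrder is hypothetical. The sketch assumes that each aggregate table is identified by the set of columns it groups by, that a table can be built from any table whose columns are a strict superset of its own, and that all measures can be rolled up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: orders aggregate tables for population. */
class AggPopulationOrder {

    /**
     * Returns steps of the form "target <- source". Finer tables (more
     * columns) come first; each table is built from the coarsest
     * earlier table that contains all of its columns, or from the
     * fact table if there is none.
     */
    static List<String> plan(Map<String, Set<String>> tables) {
        List<String> names = new ArrayList<>(tables.keySet());
        // Most columns first is a valid topological order, because a
        // strict superset always has more columns. Ties break by name.
        names.sort((a, b) -> {
            int bySize = Integer.compare(
                tables.get(b).size(), tables.get(a).size());
            return bySize != 0 ? bySize : a.compareTo(b);
        });
        List<String> steps = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            Set<String> target = tables.get(names.get(i));
            String source = "fact";
            // Walk backwards so the smallest usable table is found first.
            for (int j = i - 1; j >= 0; j--) {
                Set<String> candidate = tables.get(names.get(j));
                if (candidate.size() > target.size()
                        && candidate.containsAll(target)) {
                    source = names.get(j);
                    break;
                }
            }
            steps.add(names.get(i) + " <- " + source);
        }
        return steps;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> tables = new LinkedHashMap<>();
        tables.put("agg_state", new HashSet<>(Arrays.asList("state")));
        tables.put("agg_month", new HashSet<>(Arrays.asList("month")));
        tables.put("agg_month_state",
            new HashSet<>(Arrays.asList("month", "state")));
        for (String step : plan(tables)) {
            System.out.println(step);
        }
    }
}
```

Column count is only a stand-in for table size here; a real utility would presumably use the estimated row counts mentioned in item 2 to choose the cheapest source.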

4. Release history

4.1 Release 3.0 (2008/3/22)

olap4j API. olap4j (http://www.olap4j.org) is the Open Java API for OLAP. From mondrian-3.0 onwards, olap4j is the main API for connecting to mondrian, browsing metadata and executing queries.

Mondrian's previous API (classes in the mondrian.olap package) still exists but is deprecated; from mondrian-3.1 onwards, classes and methods in this API may not exist, may not work, or may change.
Rollup policy. This controls how a cell's value is calculated if some of its children are hidden by access-control. Before mondrian-3.0 the only policy was 'full': if access to a hierarchy was restricted, the value of a member would still be equal to the sum of all of its children, including the hidden ones. From mondrian-3.0, we also allow 'partial' (the value is the sum of the visible children) or 'hidden' (the cell's value is unknown if any of the children are hidden). The policy is expressed by the rollupPolicy attribute of the <HierarchyGrant> element.
Aggregate roles. You can now define a role in the schema that has the sum of the privileges of two or more roles; and you can connect to mondrian with one or more roles. This facility enables closer integration with Pentaho access-control, where a user can already exist in multiple roles.
Allow distinct-count measures to be aggregated. For example, mondrian can now compute the number of distinct customers who bought beer or diapers in Q2 or Q3. For efficiency, cell values are loaded in batches and a special cache allows aggregate cell values to be reused between queries.
Improved dimension sharing. Allow a shared dimension to be used more than once within the same cube.
Virtual cube enhancements. When a cube that uses the same dimension twice is involved in a virtual cube, disambiguate which usage of the dimension is involved. Allow the virtual cube to use the same cube more than once.
Scalar functions. Many scalar functions have been added in mondrian-3.0, to the specification of the Visual Basic for Applications (VBA) and Excel libraries that are available by default in Microsoft SQL Server Analysis Services (SSAS) and that many MDX users assume are part of the core MDX language.

New functions: Abs, Acosh, Asc, AscB, AscW, Asin, Asinh, Atan2, Atanh, Atn, Cache, CBool, CByte, CDate, CDbl, Chr, ChrB, ChrW, CInt, Cos, Cosh, Date, DateAdd, DateDiff, DatePart, DateSerial, DateValue, Day, DDB, Degrees, DrilldownLevel, DrilldownLevelBottom, DrilldownLevelTop, Exp, Fix, FormatCurrency, FormatDateTime, FormatNumber, FormatPercent, FV, Hex, Hour, InStrRev, Int, IPmt, IRR, IsDate, LCase, Log, Log10, LTrim, Minute, MIRR, Month, MonthName, Now, NPer, NPV, Oct, Percentile, Pi, Pmt, Power, PPmt, PV, Radians, Rate, Replace, Right, Round, RTrim, Second, Sgn, Sin, Sinh, SLN, Space, Sqr, SqrtPi, Str, StrComp, String, StrReverse, SYD, Tan, Tanh, Time, Timer, TimeSerial, TimeValue, Trim, TypeName, Val, Weekday, WeekdayName, Year.

We have added additional forms to existing functions: Descendants(<Member>, , LEAVES); Format can now be applied to DateTime values; Iif can be applied to member, level, hierarchy, dimension and tuple and set values; Levels can be applied to a string expression.
JNDI in connect string. JDBC data sources can be specified by their JNDI name.
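The three rollup policies listed above ('full', 'partial' and 'hidden') can be illustrated with a short Java sketch. It is not Mondrian's implementation: the class RollupPolicySketch is hypothetical, and the children of a member are reduced to an array of values plus an array of visibility flags.

```java
import java.util.OptionalDouble;

/** Hypothetical sketch of the 'full', 'partial' and 'hidden' policies. */
class RollupPolicySketch {

    enum Policy { FULL, PARTIAL, HIDDEN }

    /**
     * Computes a parent's value from its children's values.
     * visible[i] says whether the current role may see child i.
     * An empty result means the value is unknown to the user.
     */
    static OptionalDouble parentValue(Policy policy,
                                      double[] values,
                                      boolean[] visible) {
        double total = 0.0;
        for (int i = 0; i < values.length; i++) {
            if (visible[i]) {
                total += values[i];
            } else if (policy == Policy.HIDDEN) {
                return OptionalDouble.empty(); // any hidden child hides it
            } else if (policy == Policy.FULL) {
                total += values[i]; // hidden children still count
            }
            // PARTIAL: hidden children are simply skipped
        }
        return OptionalDouble.of(total);
    }

    public static void main(String[] args) {
        double[] values = {10.0, 20.0, 30.0};
        boolean[] visible = {true, false, true};
        for (Policy policy : Policy.values()) {
            System.out.println(
                policy + ": " + parentValue(policy, values, visible));
        }
    }
}
```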

API changes in release 3.0

Removed methods that were deprecated in 2.4, plus:

MondrianServer.flushSchemaCache()
MondrianServer.flushDataCache()
DriverManager.getConnection(String, CatalogLocator, boolean)
DriverManager.getConnection(Util.PropertyList, boolean)
DriverManager.getConnection(Util.PropertyList, CatalogLocator, boolean)
DriverManager.getConnection(Util.PropertyList, CatalogLocator, DataSource, boolean)
RolapMember.getSqlKey()
MondrianProperties.CachePoolCostLimit (property "mondrian.rolap.CachePool.costLimit")
MondrianProperties.FlushAfterQuery (property "mondrian.rolap.RolapResult.flushAfterEachQuery")

4.2 Release 2.4 (2007/08/31)

Aggregate distinct-count measures. Mondrian now computes distinct-count measures properly over a range of selections (for example, show me a count of all new Customers from January through July).
Generate SQL with GROUPING SETS SQL construct, for databases which support it. By leveraging Grouping Sets, Mondrian can reduce the number of SQL queries necessary to fulfill an MDX request, and databases can often execute the combined queries more efficiently than the individual queries. Grouping Sets are currently supported in Oracle, DB2, Teradata and Microsoft SQL Server.
New MDX functions Extract(<Set>, <Dimension>[, <Dimension>...]), Generate, Iif(bool, bool, bool), Len, Left, Mid, UCase.
Support for Apache Commons Virtual File System (VFS) URLs.
Support keys in members, e.g. [Products].&[1234].

API changes in release 2.4

DynamicSchemaProcessor. Moved the mondrian.rolap.DynamicSchemaProcessor interface to package mondrian.spi. The processSchema(URL, PropertyList) method now has signature processSchema(String, PropertyList), and the URL is intended to be interpreted as an Apache VFS URL. Class mondrian.spi.impl.FilterDynamicSchemaProcessor is a partial implementation.
Various methods which used String or String[] to look up multi-part identifiers such as '[Store].[USA].[CA]' now take Id.Segment or List<Id.Segment>. The previous methods are deprecated and will be removed in mondrian-3.0 (see below).

Methods to look up multi-part identifiers which are deprecated in mondrian-2.4 and will be removed in mondrian-3.0:

Formula.Formula(String[], Exp)
Formula.Formula(String[], Exp, MemberProperty[])
QueryPart.addFormula(String[], Exp, MemberProperty[])
SchemaReader.lookupCompound(OlapElement, String[], boolean, int)
SchemaReader.getMemberByUniqueName(String[], boolean)
SchemaReader.getMemberByUniqueName(String[], boolean, MatchType)
Util.explode(String)
Util.lookupCompound(SchemaReader, OlapElement, String[], boolean, int)
Util.lookup(Query, String[])

Other deprecated methods to be removed in mondrian-3.0:

Query.getQueryString()
QueryPart.toMdx()
RolapSchema.flushSchema(String, String, String, String)
RolapSchema.flushSchema(String, DataSource)
RolapSchema.clearCache()
RolapSchema.flushRolapStarCaches(boolean)
RolapSchema.flushAllRolapStarCachedAggregations()
CachePool.flush()

4.3 Release 2.3 (2007/03/12)

Cache control API.
More efficient evaluation of queries which return large results. To achieve this, some MDX functions now have multiple implementations, and can return their results as iterators in addition to the usual list format.
More control over queries which run for long periods of time, return large numbers of members or cells, or which use excessive amounts of memory. Under such conditions, queries throw a ResultLimitExceeded exception.
JDK 1.5 is now the primary development and delivery platform. You can continue to run mondrian on JDK 1.4 using the provided backwards-compatibility JARs mondrian-jdk14.jar and retroweaver-rt-1.2.4.jar created by retroweaver.
Added support for Ingres and LucidDB
JOLAP (JSR-069) support removed.

API changes which may impact existing applications:

Rename ResultLimitExceeded to ResultLimitExceededException;
Remove packages javax.olap, mondrian.jolap, org.omg.java.cwm;
In mondrian.olap.Axis, change 'Position[] getPositions()' to 'List<Position> getPositions()';
In mondrian.olap.Position, replace data member 'Member[] members' with methods 'Member get(int ordinal)' and 'int size()' (both inherited from List<Member>).

4.4 Release 2.2 (2006/10/??)

Mondrian-2.2 implements a host of new functions and operators: In, Matches, Cast, ValidMeasure, CurrentDateMember, CurrentDateString. Also the NULL literal.
Parameters. Formerly you could only specify parameters in a query. Now they can also be specified at system, schema or session level. Since parameters can be specified using an MDX expression, this is a great way to define constants and calculations in just one place, and share them throughout your application.
Query timeout and cancel. We have added timeout and a cancel facility to deal with long-running queries.
There's now the ability to flush the schema cache. See mondrian.olap.MondrianServer for more details.
Internationalization just got a lot easier. Mondrian now supports a 'Locale' parameter to the connect string. Formatting information comes from Java rather than from MondrianResource.properties, which means that Mondrian should work out of the box for any locale Java supports.
Performance improvements. The Level.approxRowCount schema attribute saves mondrian the effort of executing queries to count levels solely for XML/A's purposes. There are also performance improvements in the LastNonEmpty function, and crossjoin can be evaluated in SQL even for virtual cubes.
Lastly, we moved mondrian's website to http://mondrian.pentaho.org. Same content as before, but better formatted, and more integrated with the rest of the Pentaho family of projects.

4.5 Release 2.1 (2006/04/01)

Finally, a separate distribution mondrian-*-embedded.zip, including an embedded Derby database in the WAR. This can be deployed to Tomcat on any platform by simply exploding the WAR into TOMCAT/webapps, allowing folks "kicking the tires" to easily try out Mondrian/JPivot. See how to deploy and run the embedded web app.
XML/A bug fixes, functionality and test suite improvements.
Compilation of MDX expressions. This is an architectural change to allow Mondrian to analyze queries at the start of execution, and trade off various techniques such as expression-caching and pushing predicates into the generated SQL. It involves some API changes (see below).
Allow distinct-count measures to be rolled up over attributes which are functionally dependent on the key of the measure (e.g. "gender" is functionally dependent on the key "customer_id" of the measure "Customer Count"). This yields performance improvements when using distinct-count aggregates.
Improved integration of User-Defined Functions.
Implemented VisualTotals, LastPeriods, AddCalculatedMembers, StripCalculatedMembers MDX functions.
Support for comments in MDX (/* ... */, -- [rest of line], // [rest of line]).
Includes recent, compatible version of JPivot.
Interbase 6 support.
Many bug fixes and extensions to the test suite.
Documentation improvements.

4.5.1 API changes in release 2.1

FunCall and UnresolvedFunCall. It used to be possible to create a FunCall with the name of a function but no function definition. This complicated the validation process, because we would discover at runtime that a function call had no definition. Now you should use the new class UnresolvedFunCall.
Category methods. Renamed a few of the methods concerning types and categories:
- Exp.getType() used to return int, now returns Type
- Old usages of Exp.getType() should use Exp.getCategory()
- int[] FunDef.getParameterTypes() is renamed to int[] FunCall.getParameterCategories()
- int FunCall.getReturnType() is renamed to int FunCall.getReturnCategory()
- Removed the Exp.getTypeX() method; old usages of this method should now use Exp.getType().
OLAP element types. OLAP elements Cube, Dimension, Hierarchy, Level and Member no longer implement the Exp interface. If you want to use these in expressions, there are wrapper classes: DimensionExpr, HierarchyExpr, LevelExpr, MemberExpr. These are in a new package, mondrian.mdx. Some other parse tree classes (Query, Literal) will move to this package at some time in the future.

4.6 Release 2.0 (2005/12/19)

Aggregate tables.
Calculated sets defined in the schema, and WITH SET syntax to define sets within an MDX query.
Cached set expressions. The WITH SET feature and functions such as RANK cause the same expression to be evaluated many times within the course of a single MDX statement. The set-expression cache improves the performance of such queries.
User-defined functions.
Enhanced support for parent-child hierarchies.
Enhanced XML for Analysis (XML/A) support.
Enhanced support for internationalized/localized (I18N/L10N) applications.
Pushdown SQL. To improve performance, Mondrian automatically translates filters and aggregations into SQL which can be executed on the underlying RDBMS.
Support for the Apache Derby pure Java embedded RDBMS.

4.7 Release 1.1 (2005/04/06)

Numerous improvements in functionality, performance, and stability.

4.8 Release 1.0 (2003/08/18)

First production release.
Distinct-count aggregations.
JDBC connection-pooling.
Support for XML for Analysis (XML/A).

4.9 Release 0.6 (2003/05/24)

Parent-child hierarchies.
Partial support for XML for Analysis (XML/A).

4.10 Release 0.5 (2003/02/20)

Partial support for JOLAP.
Implement Hierarchize, ":", Aggregate, and statistical functions.

4.11 Release 0.4 (2002/11/10)

Integration with JPivot.
Improved thread-safety.

4.12 Release 0.3 (2002/08/09)

First public release.
JSP page generates a static table from an MDX query.
Support for several SQL dialects.

Author: Julian Hyde; last modified February 2008.
Version: $Id: //open/mondrian-release/3.0/doc/roadmap.html#2 $
Copyright (C) 2002-2008 Julian Hyde and others
