Mondrian Roadmap

Contents

  1. Introduction
    1. Purpose of this document
    2. Mondrian's goals
    3. Scope
    4. Sponsored development and co-development
  2. Upcoming releases
    1. olap4j release 1.0
    2. Mondrian release 3.1
    3. Aggregation designer release x.x
    4. Schema workbench release x.x
  3. Feature list
    1. Partitioned cubes
    2. Cold start
    3. Rollup in cache
    4. Compound slicer
    5. Schema and query validation
    6. Name-resolution
    7. Standard functions
    8. Bridge to CWM
    9. User-defined aggregate functions
    10. Further work on aggregate tables
  4. Release history
    1. Release 3.0
    2. Release 2.4
    3. Release 2.3
    4. Release 2.2
    5. Release 2.1
    6. Release 2.0
    7. Release 1.1
    8. Release 1.0
    9. Release 0.6
    10. Release 0.5
    11. Release 0.4
    12. Release 0.3

1. Introduction 

This is a list of features we propose to deliver in future releases of Mondrian. Each feature is linked to a high-level description. Complex features will have more detailed specifications in a separate document.

1.1 Purpose of this document 

This document has several goals. First, it lets the Mondrian community know what features we are thinking about implementing. There may be better ways of delivering the same functionality that we haven't thought of.

Second, since there is always more work than time, it allows us to prioritize. If we hear that a particular feature is important to a lot of people, we will try to get to it sooner.

Third, it allows us to attract resources. If there are features in this roadmap which are important to your organization, consider sponsoring Mondrian's development.

1.2 Mondrian's goal 

Mondrian's goal is to bring multidimensional analysis to the masses.

To do this it needs to be:

Mondrian is an open-source OLAP server written in pure Java, and we feel that it meets these goals. We can't anticipate all of our customers' requirements, but open source combined with Java keeps Mondrian flexible. It's easy to add functionality or to integrate third-party tools, and Mondrian can be integrated into a variety of environments.

Mondrian is part of the Pentaho Open Source BI Suite. Pentaho aims to deliver the best possible user experience by integrating Mondrian with other open-source components such as Kettle, Pentaho Reporting, and Weka. While building this integration, Pentaho is committed to keeping Mondrian independent from other components, and available under a commercial-friendly open-source license.

1.3 Scope 

Mondrian can't do everything. If it did everything, it would be a huge download, difficult to install, and even more difficult to integrate with other software; and we'd never finish writing it. But the good news is, this is open source. If a feature is missing, it's often easy to add the feature to Mondrian or to integrate with another open-source product that provides the feature.

JPivot is Mondrian's sister project. It provides an excellent user-interface, and shows off what Mondrian can do. But we have been careful to keep the two projects separate. (You can use another user-interface to Mondrian, and you can also use JPivot with other data-sources.) If you've run Mondrian's demo and you have suggestions on how to improve the web interface, please make your suggestion to the JPivot project directly.

1.4 Sponsored development and co-development 

Pentaho encourages companies to sponsor development of features which are important to them. Sponsorship allows Mondrian developers to spend more time adding features to Mondrian, rather than having to find other ways to pay the rent. The results are always contributed back to the project as open source.

Another way companies can help Mondrian is to assign employees to co-develop features. We can help specify and design these features, provided that the resulting code is contributed to the project.

If your organization would like to sponsor development of features, please contact Julian Hyde.

2. Upcoming releases 

2.1 olap4j release 1.0 

Targeted release timeframe: Q3 2008.

olap4j is a proposed standard API for access to any OLAP data source from Java. See www.olap4j.org.

As of mondrian-3.0, olap4j is the primary API to Mondrian; Mondrian's driver is based on olap4j-0.9.4 (beta). olap4j release 1.0 will be the first production release of the olap4j specification. It will include a full Test Compatibility Kit (TCK) and incorporate bug fixes and feedback from the drivers and applications built using the olap4j beta.

2.2 Mondrian Release 3.1 

Targeted release timeframe: Q3 2008

Feature | Effort | Importance
Remove support for old API | low | medium
3.8 Bridge to CWM. Integration with Pentaho Metadata. Could be incubator project. Note that someone has already implemented a bridge in one way. | high | high
3.10 Further work on aggregate tables. To support the aggregation designer, Mondrian release 3.1 will probably include utilities (2) DDL generation and (3) Utility (maybe graphical, maybe text-based) to recommend a set of aggregate tables. | high | high
TBD | |

2.3 Aggregation Designer Release x.x 

Targeted release timeframe: Q2 2008

Effort: high, Importance: high, Priority: high

Release Highlights:

2.4 Schema Workbench Release x.x (cube designer) 

Targeted release timeframe: not specified

Effort: high, Importance: high, Priority: high

Release Highlights:

3. Feature list 

3.1 Partitioned cubes 

Effort: medium; importance: medium; priority: medium.

Whereas a regular cube has a single fact table, a partitioned cube has several fact tables, which are unioned together. The fact tables must have the same column names.

Each fact table can have a range (similar to 'cache ranges', above) which describes what data ranges are found in each. When looking for a particular cell, Mondrian scans the tables' criteria to determine which table to look in. For example, T1 holds data for Texas, 2005 onwards; T2 holds data for 2004 onwards; T3 holds all other data.  The cell (Oklahoma, January 2005) would be found in T2.
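The lookup in the T1/T2/T3 example can be sketched as follows. This is only an illustration: the class and method names are hypothetical (nothing here exists in Mondrian), time is reduced to a year, and it assumes that partitions are scanned in declaration order and that the first partition whose criteria cover the cell wins.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these classes exist in Mondrian.
final class PartitionRoutingSketch {

    /** One fact table plus the criteria describing which cells it holds. */
    static final class Partition {
        final String table;
        final String state;  // null means "any state"
        final int fromYear;  // Integer.MIN_VALUE means "any year"

        Partition(String table, String state, int fromYear) {
            this.table = table;
            this.state = state;
            this.fromYear = fromYear;
        }

        /** True if a cell for the given state and year falls in this partition. */
        boolean covers(String cellState, int cellYear) {
            return (state == null || state.equals(cellState))
                && cellYear >= fromYear;
        }
    }

    /** The partitions from the example, in the order they are scanned (assumed). */
    static final List<Partition> PARTITIONS = Arrays.asList(
        new Partition("T1", "Texas", 2005),
        new Partition("T2", null, 2004),
        new Partition("T3", null, Integer.MIN_VALUE));

    /** Returns the first table whose criteria cover the cell. */
    static String tableFor(String state, int year) {
        for (Partition p : PARTITIONS) {
            if (p.covers(state, year)) {
                return p.table;
            }
        }
        throw new IllegalStateException("no partition covers the cell");
    }
}
```

With these assumptions, `tableFor("Oklahoma", 2005)` skips T1 (wrong state) and returns T2, matching the example.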

Partitioned tables are useful for real-time analysis. For example, one partition might contain today's data, while another might hold historical data. The 'hot' partition with today's data would typically have fewer or no aggregation tables and have caching disabled; its fact table might have different physical options in the RDBMS, say fewer indexes to maximize insert performance.

Example schema:

<Cube name="Sales">
    <Partitions>
        <Partition name="partition1" cache="false">
            <Table name="sales_fact_this_month"/>
            <Ranges>
                <Range dimension="[Time]">
                    <RangeMember bound="lower" member="[Time].[2005].[9]"/>
                </Range>
                <Range dimension="[Store]">
                    <RangeMember member="[Store].[USA].[CA]"/>
                    <RangeMember member="[Store].[USA].[WA].[Seattle]"/>
                </Range>
            </Ranges>
        </Partition>
        <Partition name="partition2" cache="true">
            <Table name="sales_fact"/>
            <Ranges/>
        </Partition>
    </Partitions>
</Cube>

3.2 Cold start 

Effort: medium; importance: medium; priority: low.

When Mondrian initializes and starts to process the first queries, it makes SQL calls to get member lists and determine cardinality, and then to load segments into the cache. When Mondrian is closed and restarted, it has to do that work again. This can take a significant amount of time, depending on the cube size. For example, in one test an 8GB cube (55M-row fact table) took 15 minutes (mostly doing a GROUP BY) before it returned results from its first query and, absent any caching on the database server, would take another 15 minutes if you closed and reopened the application. That cube held just one month of data; imagine the time if there were 5 years' worth.

What ideas and designs can you come up with to speed that up, in other words to do anything time consuming only once and reuse it between instances?
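As one possible direction (not a design from the Mondrian developers), level cardinalities could be written to a local file the first time they are computed and read back on the next start. The sketch below assumes a hypothetical `CardinalityStoreSketch` class and a caller-supplied function standing in for the slow SQL query; it ignores the question of when the saved values become stale.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.function.ToIntFunction;

// Hypothetical sketch; not part of Mondrian.
final class CardinalityStoreSketch {
    private final Path file;
    private final Properties saved = new Properties();

    /** Loads cardinalities saved by a previous instance, if the file exists. */
    CardinalityStoreSketch(Path file) {
        this.file = file;
        if (Files.exists(file)) {
            try (Reader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                saved.load(r);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Returns the saved cardinality, running the slow query only on a miss. */
    int cardinality(String level, ToIntFunction<String> slowQuery) {
        String value = saved.getProperty(level);
        if (value != null) {
            return Integer.parseInt(value);
        }
        int computed = slowQuery.applyAsInt(level);
        saved.setProperty(level, Integer.toString(computed));
        // Rewrite the whole file so that the next instance can reuse the value.
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            saved.store(w, "level cardinalities");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return computed;
    }
}
```

A second instance created on the same file answers from the saved value without calling the query function. A real implementation would also need a way to invalidate the file when the underlying data changes.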

Gang Chen: If it's possible, can we calculate the real levels of a parent-child hierarchy? This would bring Mondrian's metadata closer to MS AS's.

Julian Hyde: Can you give me more details on how that would work? Start a discussion forum or feature request on SourceForge.

Other options for cold start:

3.3 Rollup in cache 

Effort: medium; importance: medium; priority: low.

If the cache contains aggregates for all children of a member, then Mondrian would be able to compute the aggregate for the parent member by rolling up.
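A minimal sketch of the idea, assuming an additive measure such as a sum and a cache keyed by member unique name. The class `CacheRollupSketch` and its method are hypothetical and do not reflect Mondrian's actual cache structures.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Mondrian's real cache is organized differently.
final class CacheRollupSketch {
    /** Cached values of an additive measure, keyed by member unique name. */
    final Map<String, Double> cache = new HashMap<>();

    /**
     * Returns the parent's aggregate if it is cached or can be derived from
     * the cached children; returns null if a child is missing, in which case
     * the value would have to be fetched with SQL.
     */
    Double rollUp(String parent, List<String> children) {
        Double cached = cache.get(parent);
        if (cached != null) {
            return cached;
        }
        if (children.isEmpty()) {
            return null;
        }
        double sum = 0;
        for (String child : children) {
            Double value = cache.get(child);
            if (value == null) {
                return null; // at least one child is not in the cache
            }
            sum += value;
        }
        cache.put(parent, sum);
        return sum;
    }
}
```

Summation is only valid for measures that roll up additively; a measure such as distinct-count could not be handled this way.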

See the email thread "grouper in Mondrian".

3.4 Compound slicer 

Effort: medium; importance: low; priority: low.

3.5 Schema and query validation 

Process to validate a schema.

Process to validate a set of queries. Maybe an option to ignore errors due to specific members not existing because the data hasn't been loaded yet.

Expose validation via Eclipse plugin.

3.6 Name-resolution 

Mondrian's name resolution is not always compatible with other MDX implementations such as MSAS and SAS.

  1. Support abbreviated member names. For example, [Products].[Boston Lager] seems to be valid in MSAS if product names are unique, whereas Mondrian currently requires [Products].[Beverages].[Beer].[Samuel Adams].[Boston Lager].
  2. Change scheme for generating unique names, omitting the 'all' member name; current [Customers].[(All customers)].[USA] would become [Customers].[USA]. Mondrian would still understand names of the previous form.
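Both proposals can be illustrated with a small sketch. The member paths below are made-up sample data, and `NameResolutionSketch` is a hypothetical class, not Mondrian's resolver; it assumes an abbreviated name is accepted only when exactly one member of the hierarchy has that name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not Mondrian's name-resolution code.
final class NameResolutionSketch {
    /** Full member paths of a made-up [Products] hierarchy. */
    static final List<List<String>> MEMBERS = Arrays.asList(
        Arrays.asList("Products", "Beverages", "Beer", "Samuel Adams", "Boston Lager"),
        Arrays.asList("Products", "Beverages", "Beer", "Samuel Adams", "Summer Ale"),
        Arrays.asList("Products", "Beverages", "Wine", "Good", "Chardonnay"));

    /** Resolves [hierarchy].[name] when exactly one member has that name. */
    static String resolveAbbreviated(String hierarchy, String name) {
        String found = null;
        for (List<String> path : MEMBERS) {
            boolean matches = path.get(0).equals(hierarchy)
                && path.get(path.size() - 1).equals(name);
            if (matches) {
                if (found != null) {
                    throw new IllegalArgumentException("ambiguous name: " + name);
                }
                found = toMdx(path);
            }
        }
        if (found == null) {
            throw new IllegalArgumentException("no such member: " + name);
        }
        return found;
    }

    /** Builds a unique name, leaving out the 'all' member if it is present. */
    static String uniqueNameWithoutAll(List<String> path, String allMemberName) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < path.size(); i++) {
            boolean isAllMember = i == 1 && path.get(i).equals(allMemberName);
            if (!isAllMember) {
                kept.add(path.get(i));
            }
        }
        return toMdx(kept);
    }

    /** Formats path segments as a bracketed, dot-separated MDX name. */
    static String toMdx(List<String> path) {
        StringBuilder sb = new StringBuilder();
        for (String segment : path) {
            if (sb.length() > 0) {
                sb.append('.');
            }
            sb.append('[').append(segment).append(']');
        }
        return sb.toString();
    }
}
```

To keep understanding names of the previous form, a resolver along these lines would also need to accept a path that still contains the 'all' member.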

3.7 Standard functions 

Implement standard MDX functions:

3.8 Bridge to CWM 

CWM (Common Warehouse Model) is a standard model for defining data warehouse and multidimensional schemas. It allows interoperability with tools such as UML diagrams, relational report design tools, and ETL tools.

This feature will add:

3.9 User-defined aggregate functions 

The standard aggregate functions are sum, count, distinct-count, min, max and avg. This feature will provide an SPI by which application developers can write their own aggregate functions.

The SPI will include:

The SPI will support functions which map to a SQL expression rather than a SQL aggregate function. The "avg" function is an example of this: it works by expanding itself to sum / count.

The SPI will support functions which can be computed from unaggregate fact table data, but cannot be rolled up. The "distinct-count" function is an example of this.

You will be able to include user-defined aggregate functions in aggregate tables.
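To make the two cases above concrete, here is a rough sketch of what such an SPI might look like. The interface and its method names are assumptions made for illustration; the actual SPI has not been specified in this document.

```java
// Hypothetical sketch; the real SPI is not yet defined.
final class AggregatorSpiSketch {

    /** A possible shape for a user-defined aggregate function. */
    interface UserDefinedAggregator {
        /** SQL expression that computes the aggregate over fact table rows. */
        String toSql(String column);

        /** True if values for child members can be combined into the parent's value. */
        boolean canRollUp();
    }

    /** Maps directly to a SQL aggregate function and rolls up by adding. */
    static final UserDefinedAggregator SUM = new UserDefinedAggregator() {
        public String toSql(String column) {
            return "sum(" + column + ")";
        }
        public boolean canRollUp() {
            return true;
        }
    };

    /** Maps to a SQL expression rather than a single SQL aggregate function. */
    static final UserDefinedAggregator AVG = new UserDefinedAggregator() {
        public String toSql(String column) {
            return "sum(" + column + ") / count(" + column + ")";
        }
        public boolean canRollUp() {
            // An average of averages is not the overall average; only the
            // underlying sum and count could be rolled up.
            return false;
        }
    };

    /** Computable from unaggregated fact data, but cannot be rolled up. */
    static final UserDefinedAggregator DISTINCT_COUNT = new UserDefinedAggregator() {
        public String toSql(String column) {
            return "count(distinct " + column + ")";
        }
        public boolean canRollUp() {
            return false;
        }
    };
}
```

For example, `AggregatorSpiSketch.AVG.toSql("unit_sales")` yields the expression `sum(unit_sales) / count(unit_sales)`.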

3.10 Further work on aggregate tables 

1. Data population

Utility to populate (or generate INSERT statements to populate) the agg tables. (For extra credit: populate the tables in topological order, so that higher level aggregations can be built from lower level aggregations.)

2. DDL generation

Utility to generate a script containing CREATE TABLE and CREATE INDEX statements for all possible aggregate tables (including indexes), XML for these tables, and comments indicating the estimated number of rows in these tables. Clearly this will be a huge script, and it would be ridiculous to create all of these tables. The person designing the schema could copy/paste from this file to create their own schema.

3. Utility (maybe graphical, maybe text-based) to recommend a set of aggregate tables

This is essentially an optimization algorithm, and it is described in the academic literature. Constraints on the optimization process are the amount of storage required and the estimated time to populate the agg tables. The algorithm could also take into account usage information.

4. Allow aggregate tables to be taken offline/online while Mondrian is still running

I'm thinking of these being utilities, not part of the core runtime engine. There's plenty of room to wrap these utilities in nice graphical interfaces and to make them smarter.
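The topological ordering mentioned under item 1 (Data population) could look roughly like this. The sketch assumes each aggregate table is populated from exactly one source table, either the fact table or a lower-level aggregate; the class name and the table names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not an existing Mondrian utility.
final class AggPopulationOrderSketch {

    /**
     * Orders aggregate tables so that each one appears after the table it is
     * populated from. Keys of sourceOf are aggregate tables; values are their
     * source tables. Tables that are not keys (the fact table) need no work.
     */
    static List<String> populationOrder(Map<String, String> sourceOf) {
        List<String> order = new ArrayList<>();
        Set<String> visiting = new HashSet<>();
        for (String table : sourceOf.keySet()) {
            visit(table, sourceOf, visiting, order);
        }
        return order;
    }

    private static void visit(String table, Map<String, String> sourceOf,
            Set<String> visiting, List<String> order) {
        if (order.contains(table) || !sourceOf.containsKey(table)) {
            return; // already scheduled, or a base fact table
        }
        if (!visiting.add(table)) {
            throw new IllegalArgumentException("cyclic dependency at " + table);
        }
        // Schedule the source first, then the table that is built from it.
        visit(sourceOf.get(table), sourceOf, visiting, order);
        visiting.remove(table);
        order.add(table);
    }
}
```

Given a map in which agg_year is built from agg_month, agg_month from agg_day, and agg_day from sales_fact, the method returns agg_day, agg_month, agg_year, so each INSERT can read from the smaller table populated just before it.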

4. Release history 

4.1 Release 3.0 (2008/3/22) 

API changes in release 3.0

Removed methods that were deprecated in 2.4, plus:

4.2 Release 2.4 (2007/08/31) 

API changes in release 2.4

Methods to look up multi-part identifiers, which are deprecated in mondrian-2.4 and will be removed in mondrian-3.0:

Other deprecated methods to be removed in mondrian-3.0:

4.3 Release 2.3 (2007/03/12) 

API changes which may impact existing applications:

4.4 Release 2.2 (2006/10/??) 

4.5 Release 2.1 (2006/04/01) 

  1. Finally, a separate distribution mondrian-*-embedded.zip, including an embedded Derby database in the WAR. This can be deployed to Tomcat on any platform by simply exploding the WAR into TOMCAT/webapps, allowing folks "kicking the tires" to easily try out Mondrian/JPivot. See how to deploy and run the embedded web app.
  2. XML/A bug fixes, functionality and test suite improvements.
  3. Compilation of MDX expressions. This is an architectural change to allow Mondrian to analyze queries at the start of execution, and trade off various techniques such as expression-caching and pushing predicates into the generated SQL. It involves some API changes (see below).
  4. Allow distinct-count measures to be rolled up over attributes which are functionally dependent on the key of the measure (e.g. "gender" is functionally dependent on the key "customer_id" of the measure "Customer Count"). This yields performance improvements when using distinct-count aggregates.
  5. Improved integration of User-Defined Functions.
  6. Implemented VisualTotals, LastPeriods, AddCalculatedMembers, StripCalculatedMembers MDX functions.
  7. Support for comments in MDX (/* ... */, -- [rest of line], // [rest of line]).
  8. Includes recent, compatible version of JPivot.
  9. Interbase 6 support.
  10. Many bug fixes and extensions to the test suite.
  11. Documentation improvements.

4.5.1 API changes in release 2.1 

  1. FunCall and UnresolvedFunCall. It used to be possible to create a FunCall with the name of a function but no function definition. This complicated the validation process, because we would discover at runtime that a function call had no definition. Now you should use the new class UnresolvedFunCall.
  2. Category methods. Renamed a few of the methods concerning types and categories.
  3. OLAP element types. OLAP elements Cube, Dimension, Hierarchy, Level and Member no longer implement the Exp interface. If you want to use these in expressions, there are wrapper classes: DimensionExpr, HierarchyExpr, LevelExpr, MemberExpr. These are in a new package, mondrian.mdx. Some other parse tree classes (Query, Literal) will move to this package at some time in the future.

4.6 Release 2.0 (2005/12/19) 

4.7 Release 1.1 (2005/04/06) 

4.8 Release 1.0 (2003/08/18) 

4.9 Release 0.6 (2003/05/24) 

4.10 Release 0.5 (2003/02/20) 

4.11 Release 0.4 (2002/11/10) 

4.12 Release 0.3 (2002/08/09) 


Author: Julian Hyde; last modified February 2008.
Version: $Id: //open/mondrian-release/3.0/doc/roadmap.html#2 $
Copyright (C) 2002-2008 Julian Hyde and others