Mondrian and OLAP

Mondrian is an OLAP engine written in Java. It executes queries written in the MDX language, reading data from a relational database (RDBMS), and presents the results in a multidimensional format via a Java API. Let's go into what that means.

Online Analytical Processing

Online Analytical Processing (OLAP) means analysing large quantities of data in real time. Unlike Online Transaction Processing (OLTP), where typical operations read and modify individual records or small numbers of records, OLAP deals with data in bulk, and operations are generally read-only. The term 'online' implies that even though huge quantities of data are involved (typically many millions of records, occupying several gigabytes), the system must respond to queries fast enough to allow an interactive exploration of the data. As we shall see, that presents considerable technical challenges.

OLAP employs a technique called Multidimensional Analysis. Whereas a relational database stores all data in the form of rows and columns, a multidimensional dataset consists of axes and cells. Consider the following dataset:

Year                            2000                        2001                       Growth
Product               Dollar sales    Unit sales  Dollar sales    Unit sales  Dollar sales    Unit sales
Total                       $7,073         2,693        $7,636         3,008            8%           12%
  Books                     $2,753           824        $3,331           966           21%           17%
    Fiction                 $1,341           424        $1,202           380          -10%          -10%
    Non-fiction             $1,412           400        $2,129           586           51%           47%
  Magazines                 $2,753           824        $2,426           766          -12%           -7%
  Greetings cards           $1,567         1,045        $1,879         1,276           20%           22%

The rows axis consists of the members 'Total' (all products), 'Books', 'Fiction', and so forth. The columns axis is the cartesian product of two sets: the years '2000' and '2001' together with the calculation 'Growth', and the measures 'Dollar sales' and 'Unit sales'. Each cell holds the value of one measure for one product category in one year, or its growth between the two years; for example, the dollar sales of Magazines in 2001 were $2,426.
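To make the idea of axes and cells concrete, here is a minimal Java sketch. It is not Mondrian's API: the class `AxesAndCells` and its methods are hypothetical names chosen for illustration, and only a few cells from the table are loaded. It builds the columns axis as a cartesian product and looks up a cell by its coordinates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not part of Mondrian.
public class AxesAndCells {
    static final List<String> PERIODS = Arrays.asList("2000", "2001", "Growth");
    static final List<String> MEASURES = Arrays.asList("Dollar sales", "Unit sales");

    // Cells keyed by their coordinates; a handful of values copied from the table.
    static final Map<String, String> CELLS = new HashMap<>();
    static {
        put("Magazines", "2001", "Dollar sales", "$2,426");
        put("Magazines", "2001", "Unit sales", "766");
        put("Books", "Growth", "Dollar sales", "21%");
    }

    /** Columns axis: every (period, measure) pair, with the period varying slowest. */
    static List<String[]> columnsAxis() {
        List<String[]> axis = new ArrayList<>();
        for (String period : PERIODS) {
            for (String measure : MEASURES) {
                axis.add(new String[] {period, measure});
            }
        }
        return axis;
    }

    static String key(String row, String period, String measure) {
        return row + "|" + period + "|" + measure;
    }

    static void put(String row, String period, String measure, String value) {
        CELLS.put(key(row, period, measure), value);
    }

    /** A cell is addressed by one member from each axis dimension. */
    static String cell(String row, String period, String measure) {
        return CELLS.get(key(row, period, measure));
    }

    public static void main(String[] args) {
        System.out.println(columnsAxis().size());                       // prints 6
        System.out.println(cell("Magazines", "2001", "Dollar sales"));  // prints $2,426
    }
}
```

The point of the sketch is that a cell has no row number or column number of its own; it is identified by the members that intersect at it.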

This is a richer view of the data than would be presented by a relational database. The members of a multidimensional dataset are not always values from a relational column. 'Total', 'Books' and 'Fiction' are members at successive levels in a hierarchy, each of which is rolled up to the next. And even though it is alongside the years '2000' and '2001', 'Growth' is a calculated member, which introduces a formula for computing cells from other cells.
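The two ideas in this paragraph, roll-up through a hierarchy and a calculated member, can be sketched in a few lines of Java. This is an illustration under stated assumptions, not how Mondrian implements either: the class `RollupAndGrowth` and its methods are hypothetical, and the leaf values are the dollar sales from the table above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not part of Mondrian.
public class RollupAndGrowth {
    // Parent member -> child members, mirroring the product hierarchy in the table.
    static final Map<String, List<String>> CHILDREN = new HashMap<>();
    // Leaf member -> dollar sales for {2000, 2001}, copied from the table.
    static final Map<String, int[]> LEAF_DOLLARS = new HashMap<>();
    static {
        CHILDREN.put("Total", Arrays.asList("Books", "Magazines", "Greetings cards"));
        CHILDREN.put("Books", Arrays.asList("Fiction", "Non-fiction"));
        LEAF_DOLLARS.put("Fiction", new int[] {1341, 1202});
        LEAF_DOLLARS.put("Non-fiction", new int[] {1412, 2129});
        LEAF_DOLLARS.put("Magazines", new int[] {2753, 2426});
        LEAF_DOLLARS.put("Greetings cards", new int[] {1567, 1879});
    }

    /** Dollar sales of a member: stored for leaves, summed over children otherwise. */
    static int dollars(String member, int yearIndex) {
        List<String> children = CHILDREN.get(member);
        if (children == null) {
            return LEAF_DOLLARS.get(member)[yearIndex];
        }
        int sum = 0;
        for (String child : children) {
            sum += dollars(child, yearIndex);
        }
        return sum;
    }

    /** 'Growth' as a calculated member: a formula over the 2000 and 2001 cells. */
    static long growthPercent(String member) {
        int y2000 = dollars(member, 0);
        int y2001 = dollars(member, 1);
        return Math.round(100.0 * (y2001 - y2000) / y2000);
    }

    public static void main(String[] args) {
        System.out.println(dollars("Books", 0));       // prints 2753
        System.out.println(dollars("Total", 1));       // prints 7636
        System.out.println(growthPercent("Books"));    // prints 21
    }
}
```

Note that 'Books' and 'Total' are never stored; they are derived from their children, just as 'Growth' is derived from the two year cells.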

The dimensions used here (products, time, and measures) are just three of many dimensions by which the dataset can be categorized and filtered. The collection of dimensions, hierarchies and measures is called a cube.

Conclusion

I hope I have demonstrated that multidimensional analysis is above all a way of presenting data. Although some multidimensional databases store the data in multidimensional format, I shall argue that it is simpler to store the data in relational format.
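As a rough illustration of that claim, the following Java sketch keeps the data as flat, relational-style rows and derives multidimensional cells from them on demand. The row layout (`family`, `category`, `year`, `dollars`) is an assumption made for this example, the class `RelationalToCells` is hypothetical, and the aggregation is done in memory rather than by an RDBMS; the values are the leaf dollar sales from the table above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: hypothetical names and row layout, not part of Mondrian.
public class RelationalToCells {
    /** One row of an assumed relational sales table. */
    static final class Row {
        final String family;
        final String category;
        final int year;
        final int dollars;

        Row(String family, String category, int year, int dollars) {
            this.family = family;
            this.category = category;
            this.year = year;
            this.dollars = dollars;
        }
    }

    static final List<Row> ROWS = Arrays.asList(
        new Row("Books", "Fiction", 2000, 1341),
        new Row("Books", "Fiction", 2001, 1202),
        new Row("Books", "Non-fiction", 2000, 1412),
        new Row("Books", "Non-fiction", 2001, 2129),
        new Row("Magazines", "Magazines", 2000, 2753),
        new Row("Magazines", "Magazines", 2001, 2426),
        new Row("Greetings cards", "Greetings cards", 2000, 1567),
        new Row("Greetings cards", "Greetings cards", 2001, 1879));

    /**
     * Similar in spirit to:
     *   SELECT family, year, SUM(dollars) FROM sales GROUP BY family, year
     * Each entry of the result is one cell, keyed by "family/year".
     */
    static Map<String, Integer> dollarsByFamilyAndYear(List<Row> rows) {
        Map<String, Integer> cells = new TreeMap<>();
        for (Row r : rows) {
            cells.merge(r.family + "/" + r.year, r.dollars, Integer::sum);
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, Integer> cells = dollarsByFamilyAndYear(ROWS);
        System.out.println(cells.get("Books/2000"));  // prints 2753
        System.out.println(cells.get("Books/2001"));  // prints 3331
    }
}
```

The stored form stays relational; the multidimensional view exists only in the result of the aggregation.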

Now it's time to look at the architecture of an OLAP system. See Mondrian architecture.



Author: Julian Hyde; last modified August 2006.
Version: $Id: //open/mondrian-release/3.0/doc/olap.html#2 $
Copyright (C) 2002-2006 Julian Hyde