Pentaho Analysis Services: How to Design a Mondrian Schema

Mondrian CmdRunner

What is CmdRunner?
Building
Usage
Properties File
Command line arguments
CmdRunner Commands
- help
- set
- log
- file
- list
- func
- param
- cube
- error
- echo
- expr
- =
- ~
- exit
- run an MDX query
AggGen: Aggregate SQL Generator

What is CmdRunner?

CmdRunner is a command line interpreter for Mondrian. From within the command interpreter or in a command file: properties can be set and values displayed, logging levels changed, built-in function usages displayed, parameter values displayed and set, per-cube attributes displayed and set, results and errors from the previous MDX command displayed and, of course, MDX queries evaluated.

For Mondrian developers new features can be quickly tested with CmdRunner. As an example, to test a new user-defined function all one need to is add it to the schema, add the location of the function's java class to the class path, point CmdRunner at the schema and execute a MDX query that uses the new function.

For MDX developers, CmdRunner lets one test a new MDX query or Mondrian schema without having to run Mondrian in a Webserver using JPivot. Rather, one can have the new MDX query in a file and point CmdRunner at it. Granted, the output is a list, possibly long, of row and column entries; but sometimes all one needs from CmdRunner is to know that the query runs and other times one can always post process the output into excel or gnuplot, etc.

Building

There are two ways to run the command interpreter. The first is to have a script create a class path with all of the needed mondrian and support jars in it and then have java execute the CmdRunner main method. The second is to build a jar that contains all of the needed classes and simply have java reference the jar using the -jar argument.

To build the CmdRunner combined jar from the shell command line execute the following build command:

mondrian> ./build.sh cmdrunner

This will create the jar cmdrunner.jar in the MONDRIAN_HOME/lib directory. For this build to create a jar that can actually be used it is important that the JDBC jar for your database be placed in the MONDRIAN_HOME/testlib directory prior to executing the build command.

What is useful about the cmdrunner.jar is that it can be executed without having to have the MONDRIAN_HOME directory around since it bundles up everything that is needed (other than the properties and schema files).

Usage

There are two ways to invoke CmdRunner: using the cmdrunner.jar or using a script that builds a class path of the required jars and then executes java with that class path. The former is an easy "canned" solution but requires building the cmdrunner.jar while the later is quicker if you are in a code, compile and test cycle.

To run CmdRunner using the cmdrunner.jar from the shell prompt execute:

somedir> java -jar cmdrunner.jar -p foodmart.properties

In the MONDRIAN_HOME/bin directory there are the shell scripts cmdrunner.sh and cmdrunner.cmd that can be used duplicating the above command:

mondrian> ./bin/cmdrunner.sh -p foodmart.properties

To run CmdRunner without first building the cmdrunner.jar there is the run.sh in the MONDRIAN_HOME/bin directory. This script creates a class path and includes all jars in the MONDRIAN_HOME/testlib directory where the jdbc jars are located.

mondrian> ./bin/run.sh -p foodmart.properties

Properties File

Below is an example properties file:

############################################################################## # # Example properties file # # $Id: //open/mondrian-release/3.0/doc/cmdrunner.html#2 $ ############################################################################## # Environment mondrian.catalogURL=file:///home/madonna/mondrian/FoodMartSchema.xml # mysql mondrian.test.jdbcURL=jdbc:mysql://localhost/foodmart?user=foodmart&password=foodmart # to specify the jdbc username and password: # mondrian.test.jdbcUser=foodmart # mondrian.test.jdbcPassword=foodmart mondrian.jdbcDrivers=com.mysql.jdbc.Driver # Use MD5 based caching for the RolapSchema instance mondrian.catalog.content.cache.enabled=true # both read and use aggregate tables mondrian.rolap.aggregates.Use=true mondrian.rolap.aggregates.Read=true # generate aggregate sql (for every mdx query) #mondrian.rolap.aggregates.generateSql=true # pretty print sql (if log level for mondrian.rolap.RolapUtil is DEBUG) mondrian.rolap.generate.formatted.sql=true # by default the aggregate table with the smallest number of rows # (rather than rows times size of each row) is used #mondrian.rolap.aggregates.ChooseByVolume=true

Command line arguments

CmdRunner has the following command line options:

Option
Description

-h Print help, the list of command line options.

-d Enable CmdRunner debugging. This does not change this log level.

-t Time each mdx query's execution.

-nocache Regardless of the settings in the Schema file, set each Cube to no in-memory aggregate caching (caching is turned off so each query goes to the database).

-rc Do not reload the connection after each query (the default is to reload the connection. Its safe to just ignore this.

-p property-file Specify the Mondrian property file. This argument is basically required for any but the most trivial command interpreter commands. To execute a MDX query or request information about a function, the property file must be supplied. On the other hand, to have the CmdRunner print out its internal help, then the property file is not needed.

-f filename+ Specify the name of one or more files that contains CmdRunner commands. If this argument is not supplied, then the interpreter starting in the command entry mode. After the -f is seen, all subsequent arguments are interpreted as filenames.

-x xmla_filename+ Specify the name of one or more files that contains XMAL request that has no SOAP wrapper. After the -x is seen, all subsequent arguments are interpreted as XMLA filenames.

-xs soap_xmla_filename+ Specify the name of one or more files that contains XMAL request with a SOAP wrapper. After the -xs is seen, all subsequent arguments are interpreted as SOAP XMLA filenames.

-vt Validate the XMLA response using XSLT transform. This can only be used with the -x or -xs flags.

-vx Validate the XMLA response using XPaths. This can only be used with the -x or -xs flags.

mdx_command A string representing one or more CmdRunner commands.

CmdRunner Commands

The command interpreter has a fixed set of built in commands. When a line is read, if the first word of the line matches one of the commands, then the rest of the line is assumed to be arguments to that command. On the other hand, if the first word does not match a built in command, then all text until a ';' is seen or until a '=' is entered by itself on a command continuation line is seen will be passed to the Mondrian query engine.

help

> help <cr>
Prints help for all commands.

set

> set [ property[=value ] ] <cr>
With no args, prints all mondrian properties and values.

With "property" prints property's value.

With "property=value" set property to that value.

log

> log [ classname[=level ] ] <cr>
With no args, prints the current log level of all classes.

With "classname" prints the current log level of the class.

With "classname=level" set log level to new value.

file

> file [ filename | '=' ] <cr>
With no args, prints the last filename executed.

With "filename", read and execute filename.

With "=" character, re-read and re-execute previous filename.

list

> list [ cmd | result ] <cr>
With no arguments, list previous cmd and result

With "cmd" argument, list the last mdx query cmd.

With "result" argument, list the last mdx query result.

func

> func [ name ] <cr>
With no arguments, list all defined function names.

With "name" argument, display the functions: name, description, and syntax.

param

> param [ name[=value ] ] <cr>
With no arguments, all param name/value pairs are printed.

With "name" argument, the value of the param is printed.

With "name=value" sets the parameter with name to value. If name is null, then unsets all parameters. If value is null, then unsets the parameter associated with value.

cube

> cube [ cubename [ name [=value | command] ] ] <cr>
With no arguments, all cubes are listed by name.

With "cubename" argument, cube attribute name/values for: fact table (readonly) aggregate caching (readwrite) are printed.

With "cubename name=value", sets the readwrite attribute with name to value.

With "cubename command", executes the commands: clearCache.

error

> error [ msg | stack ] <cr>
With no arguments, both message and stack are printed.

With "msg" argument, the Error message is printed.

With "stack" argument, the Error stack trace is printed.

echo

> echo text <cr>
Prints text to standard out.

expr

> expr cubename expression <cr>
Evaluates an expression against a cube

=

> = <cr>
Re-executes previous MDX query.

~

> ~ <cr>
Clears any text entered so far for the current command.

exit

> exit <cr>
Exits the MDX command interpreter.

run an MDX query

> <mdx query> ( [ ';' ] <cr> | <cr> ( '=' | '~' ) <cr> )
Executes or cancels an MDX query.

An MDX query may span one or more lines. The continuation prompt is a '?'.

After the last line of the query has been entered, on the next line a single execute character, '=', may be entered followed by a carriage return. The lone '=' informs the interpreter that the query has has been entered and is ready to execute.

At anytime during the entry of a query the cancel character, '~', may be entered alone on a line. This removes all of the query text from the the command interpreter.

Queries can also be ended by using a semicolon ';' at the end of a line.

During general operation, Mondrian Property triggers are disabled. If you enable Mondrian Property triggers for a CmdRunner session, either in the property file read on starup or by explicitly using the set property command

> set mondrian.olap.triggers.enable=true <cr>

then one can force a re-scanning of the database for aggregate tables by disabling and then re-enabling the use of aggregates:

> set mondrian.olap.aggregates.Read=false <cr> > set mondrian.olap.aggregates.Read=true <cr>

In fact, as long as one does not use the -rc command line argument so that a new connection is gotten every time a query is executed, one can edit the Mondrian schema file between MDX query execute. This allows one to not only change what aggregates tables are in seen by Mondrian but also the definitions of the cubes within a given CmdRunner session.

Similarly, one can change between aggregate table partial ordering algorithm by changing the value of the associated property, mondrian.olap.aggregates.ChooseByVolume thus triggering internal code to reorder the aggregate table lookup order.

Within the command interpreter there is no ability to edit a previously entered MDX query. If you wish to iteratively edit and run a MDX query, put the query in a file, tell the CmdRunner to execute the file using the file command, re-execute the file using the = command, and in separate window edit/save MDX in the file.

There is also no support for a command history (other than the '=' command).

AggGen: Aggregate SQL Generator

Mondrian release 1.2 introduces Aggregate Tables as a means of improving performance, but aggregate tables are difficult to use without tools to support them.

CmdRunner includes a utility called AggGen, the Aggregate Table Generator. With it, you can issue an MDX query, and generate a script to create and populate the appropriate aggregate tables to support that MDX query. (The query does not actually return a result.)

In the property file provided to the CmdRunner at startup add the line:

mondrian.rolap.aggregates.generateSql=true

or from the CmdRunner command line enter:

> set mondrian.rolap.aggregates.generateSql=true <cr>

This instructs Mondrian whenever an MDX query is executed (and the cube associated with the query is not virtual) to output to standard out the Sql associated with the creation and population of both the "lost" dimension aggregate table and the "collapsed" dimension aggregate table which would be best suited to optimize the given MDX query. This Sql has to be edited to change the "l_XXX" in the "lost" dimension statements or "c_XXX" in the "collapsed" dimension statements to more appropriate table names (remembering to make sure that the new names can still be recognized by Mondrian as aggregates of the particular fact table).

As an example, if the following MDX is run against a MySql system:

WITH MEMBER
    [Store].[Nat'l Avg] AS
   'AVG(  { [Store].[Store Country].Members}, [Measures].[Units Shipped])'
SELECT
    { [Store].[Store Country].Members, [Store].[Nat'l Avg] } ON COLUMNS,
    { [Product].[Product Family].[Non-Consumable].Children } ON ROWS
FROM
   [Warehouse]
WHERE 
    [Measures].[Units Shipped];

Then the following is written to standard output:

WARN  [main] AggGen For RolapStar: "inventory_fact_1997" measure with
name, "warehouse_sales"-"inventory_fact_1997"."warehouse_cost", is not a column
name. The measure's column name may be an expression and currently AggGen does
not handle expressions. You will have to add this measure to the aggregate table
definition by hand.

CREATE TABLE agg_l_XXX_inventory_fact_1997 (
    time_id INT,
    product_id INT NOT NULL,
    store_id INT,
    store_invoice DECIMAL(10,4),
    supply_time SMALLINT,
    warehouse_cost DECIMAL(10,4),
    warehouse_sales DECIMAL(10,4),
    units_shipped INT,
    units_ordered INT,
    fact_count INTEGER NOT NULL);

INSERT INTO agg_l_XXX_inventory_fact_1997 (
    time_id,
    product_id,
    store_id,
    store_invoice,
    supply_time,
    warehouse_cost,
    warehouse_sales,
    units_shipped,
    units_ordered,
    fact_count)
SELECT
    `inventory_fact_1997`.`time_id` AS `time_id`,
    `inventory_fact_1997`.`product_id` AS `product_id`,
    `inventory_fact_1997`.`store_id` AS `store_id`,
    SUM(`inventory_fact_1997`.`store_invoice`) AS `store_invoice`,
    SUM(`inventory_fact_1997`.`supply_time`) AS `supply_time`,
    SUM(`inventory_fact_1997`.`warehouse_cost`) AS `warehouse_cost`,
    SUM(`inventory_fact_1997`.`warehouse_sales`) AS `warehouse_sales`,
    SUM(`inventory_fact_1997`.`units_shipped`) AS `units_shipped`,
    SUM(`inventory_fact_1997`.`units_ordered`) AS `units_ordered`,
    COUNT(*) AS `fact_count`
FROM 
    `inventory_fact_1997` AS `inventory_fact_1997`
GROUP BY 
    `inventory_fact_1997`.`time_id`,
    `inventory_fact_1997`.`product_id`,
    `inventory_fact_1997`.`store_id`;

CREATE TABLE agg_c_XXX_inventory_fact_1997 (
    product_family VARCHAR(30),
    product_department VARCHAR(30),
    store_country VARCHAR(30),
    the_year SMALLINT,
    store_invoice DECIMAL(10,4),
    supply_time SMALLINT,
    warehouse_cost DECIMAL(10,4),
    warehouse_sales DECIMAL(10,4),
    units_shipped INT,
    units_ordered INT,
    fact_count INTEGER NOT NULL);

INSERT INTO agg_c_XXX_inventory_fact_1997 (
    product_family,
    product_department,
    store_country,
    the_year,
    store_invoice,
    supply_time,
    warehouse_cost,
    warehouse_sales,
    units_shipped,
    units_ordered,
    fact_count)
SELECT
    `product_class`.`product_family` AS `product_family`,
    `product_class`.`product_department` AS `product_department`,
    `store`.`store_country` AS `store_country`,
    `time_by_day`.`the_year` AS `the_year`,
    SUM(`inventory_fact_1997`.`store_invoice`) AS `store_invoice`,
    SUM(`inventory_fact_1997`.`supply_time`) AS `supply_time`,
    SUM(`inventory_fact_1997`.`warehouse_cost`) AS `warehouse_cost`,
    SUM(`inventory_fact_1997`.`warehouse_sales`) AS `warehouse_sales`,
    SUM(`inventory_fact_1997`.`units_shipped`) AS `units_shipped`,
    SUM(`inventory_fact_1997`.`units_ordered`) AS `units_ordered`,
    COUNT(*) AS `fact_count`
FROM 
    `inventory_fact_1997` AS `inventory_fact_1997`,
    `product_class` AS `product_class`,
    `product` AS `product`,
    `store` AS `store`,
    `time_by_day` AS `time_by_day`
WHERE 
    `product`.`product_class_id` = `product_class`.`product_class_id` and
    `inventory_fact_1997`.`product_id` = `product`.`product_id` and
    `inventory_fact_1997`.`store_id` = `store`.`store_id` and
    `inventory_fact_1997`.`time_id` = `time_by_day`.`time_id`
GROUP BY 
    `product_class`.`product_family`,
    `product_class`.`product_department`,
    `store`.`store_country`,
    `time_by_day`.`the_year`;

There are a couple of things to notice about the output.

First, is the WARN log message. This appears because the inventory_fact_1997 table has a measure with a column attribute "warehouse_sales"-"inventory_fact_1997"."warehouse_cost" that is not a column name, its an expression. The AggGen code does not currently know what to do with such an expression, so it issues a warning. A user would have to take the generated aggregate table Sql scripts and alter them to accommodate this measure.

There are two aggregate tables, agg_l_XXX_inventory_fact_1997 the "lost" dimension case and agg_c_XXX_inventory_fact_1997 the "collapsed" dimension case. The "lost" dimension table, keeps the foreign keys for those dimension used by the MDX query and discards the other foreign keys, while the "collapsed" dimension table also discards the foreign keys that are not needed but, in addition, rolls up or collapses the remaining dimensions to just those levels needed by the query.

There are no indexes creation Sql statements for the aggregate tables. This is because not all databases require indexes to achive good performance against star schemas - your mileage may vary so do some testing. (With MySql indexes are a good idea).

If one is creating a set of aggregate tables, there are cases where it is more efficient to create the set of aggregates that are just above the fact tables and then create each subsequent level of aggregates from one of the preceeding aggregate tables rather than always going back to the fact table.

There are many possible aggregate tables for a given set of fact tables. AggGen just provides example Sql scripts based upon the MDX query run. Judgement has to be used when creating aggregate tables. There are tradeoffs such as which are the MDX queries that are run the most often? How much space does each aggregate table take? How long does it take to create the aggregate tables? How often does the set of MDX queries change? etc.

During normal Mondrian operation, for instance, with JPivot, it is recommended that the above AggGen property not be set to true as it will slow down Mondrian and generate a lot of text in the log file.

Author: Richard Emberson; last updated July, 2005.
Version: $Id: //open/mondrian-release/3.0/doc/cmdrunner.html#2 $ (log )
Copyright (C) 2005-2007 Julian Hyde and others

Option	Description
`-h`	Print help, the list of command line options.
`-d`	Enable CmdRunner debugging. This does not change this log level.
`-t`	Time each mdx query's execution.
`-nocache`	Regardless of the settings in the Schema file, set each Cube to no in-memory aggregate caching (caching is turned off so each query goes to the database).
`-rc`	Do not reload the connection after each query (the default is to reload the connection. Its safe to just ignore this.
`-p property-file`	Specify the Mondrian property file. This argument is basically required for any but the most trivial command interpreter commands. To execute a MDX query or request information about a function, the property file must be supplied. On the other hand, to have the CmdRunner print out its internal help, then the property file is not needed.
`-f filename+`	Specify the name of one or more files that contains CmdRunner commands. If this argument is not supplied, then the interpreter starting in the command entry mode. After the `-f` is seen, all subsequent arguments are interpreted as filenames.
`-x xmla_filename+`	Specify the name of one or more files that contains XMAL request that has no SOAP wrapper. After the `-x` is seen, all subsequent arguments are interpreted as XMLA filenames.
`-xs soap_xmla_filename+`	Specify the name of one or more files that contains XMAL request with a SOAP wrapper. After the `-xs` is seen, all subsequent arguments are interpreted as SOAP XMLA filenames.
`-vt`	Validate the XMLA response using XSLT transform. This can only be used with the `-x` or `-xs` flags.
`-vx`	Validate the XMLA response using XPaths. This can only be used with the `-x` or `-xs` flags.
`mdx_command`	A string representing one or more CmdRunner commands.

Contents