$Revision: 3643 $
$Date: 2010-06-06 23:04:17 -0400 (Sun, 06 Jun 2010) $
Table of Contents
SQL-invoked routines are functions and procedures called from SQL. HyperSQL 2.0 supports routines conforming to two parts of the SQL Standard. Routines written in the SQL language are supported in conformance to SQL/PSM (Persistent Stored Modules) specification. Routines written in Java are supported in (loose) conformance to SQL/JRT specification. In addition, HyperSQL’s previous non-standard support for calling Java routines without prior method definition is retained and enhanced in the latest version by extending the SQL/JRT specification.
HyperSQL also supports user defined aggregate functions written in the SQL language. This feature is an extension to the SQL Standard.
SQL-invoked routines are schema objects. Naming and referencing follows conventions common to all schema objects. The same routine name can be defined in two different schemas and used with schema-qualified references.
A routine is either a procedure or a function.
A function:
is defined with CREATE FUNCTION
always returns a value
does not modify the data in the database
is called as part of an SQL statement
can have parameters
can be polymorphic
A procedure:
is defined with CREATE PROCEDURE
can return one or more values
can modify the data in the database
is called separately, using the CALL statement
can have parameters
can be polymorphic
Definition of routine signature and characteristics, name resolution and invocation are all implemented uniformly for routines written in SQL or Java.
The PSM (Persistent Stored Module) specification extends the SQL language to allow definition of both SQL Function and SQL procedure bodies with the same structure and the same control statements (such as conditional and loop statements) with minor exceptions.
The routine body is a SQL statement. In its simplest form, the body is a single SQL statement. A simple example of a function is given below:
CREATE FUNCTION an_hour_before (t TIMESTAMP) RETURNS TIMESTAMP RETURN t - 1 HOUR
An example of the use of the function in an SQL statement is given below:
SELECT an_hour_before(event_timestamp) AS notification_timestamp, event_name FROM events;
A simple example of a procedure is given below:
CREATE PROCEDURE new_customer(firstname VARCHAR(50), lastname VARCHAR(50)) MODIFIES SQL DATA INSERT INTO CUSTOMERS VALUES (DEFAULT, firstname, lastname, CURRENT_TIMESTAMP)
The procedure inserts a row into an existing table with the definition given below:
CREATE TABLE customers(id INTEGER GENERATED BY DEFAULT AS IDENTITY, firstname VARCHAR(50), lastname VARCHAR(50), added TIMESTAMP);
An example of the use of the procedure is given below:
CALL new_customer('JOHN', 'SMITH');
The routine body is often a compound statement. A compound statement can contain one or more SQL statements, which can include control statements, as well as nested compound statements.
The following SQL Statements can be used only in routines.
<handler declaration>
<assignment statement>
<compound statement>
<case statement>
<if statement>
<while statement>
<repeat statement>
<for statement>
<loop statement>
<iterate statement
<leave statement>
<signal statement>
<resignal statement>
<return statement>
<select statement: single
row>
The following SQL Statements can be used in procedures but not in functions.
<call statement>
<delete statement>
<insert statement>
<update statement>
<merge statement>
As shown in the examples below, the formal parameters and the variables of the routine can be used in statements, similar to the way a column reference is used.
A compound statement is enclosed in a BEGIN / END block with
optional labels. It can contain one or more <SQL variable
declaration>
or <handler
declaration>
before at least one SQL statement. The BNF is
given below:
<compound statement> ::= [ <beginning
label> <colon> ] BEGIN [[NOT] ATOMIC] [{<SQL variable
declaration> <semicolon>} ...] [{<handler declaration>
<semicolon>}...] {<SQL procedure statement>
<semicolon>} ... END [ <ending label> ]
An example of a simple compound statement body is given below. It performs the common task of inserting related data into two table. The IDENTITY value that is automatically inserted in the first table is retrieved using the IDENTITY() function and inserted into the second table.
CREATE PROCEDURE new_customer(firstname VARCHAR(50), lastname VARCHAR(50), address VARCHAR(100)) MODIFIES SQL DATA BEGIN ATOMIC INSERT INTO customers VALUES (DEFAULT, firstname, lastname, CURRENT_TIMESTAMP); INSERT INTO addresses VALUES (DEFAULT, IDENTITY(), address); END
A <variable declaration>
defines the name
and data type of the variable and, optionally, its default value. In the
next example, a variable is used to hold the IDENTITY value. In
addition, the formal parameters of the procedure are identified as input
parameters with the use of the optional IN keyword. This procedure does
exactly the same job as the procedure in the previous example.
CREATE PROCEDURE new_customer(IN firstname VARCHAR(50), IN lastname VARCHAR(50), IN address VARCHAR(100)) MODIFIES SQL DATA BEGIN ATOMIC DECLARE temp_id INTEGER; INSERT INTO CUSTOMERS VALUES (DEFAULT, firstname, lastname, CURRENT_TIMESTAMP); SET temp_id = IDENTITY(); INSERT INTO ADDRESSES VALUES (DEFAULT, temp_id, address); END
The BNF for variable declaration is given below:
<SQL variable declaration> ::= DECLARE
<variable name list> <data type> [DEFAULT <default
value>]
<variable name list> ::= <variable name> [
{ <comma> <variable name> }... ]
Examples of variable declaration are given below. Note that in a DECLARE statement with multiple comma-separated variable names, the type and the default value applies to all the variables in the list:
BEGIN ATOMIC DECLARE temp_zero DATE; DECLARE temp_one, temp_two INTEGER DEFAULT 2; DECLARE temp_three VARCHAR(20) DEFAULT 'no name'; -- more statements ... SET temp_zero = DATE '2010-03-18'; SET temp_two = 5; -- more statements ... END
A <handler declaration>
defines the
course of action when an exception or warning is raised during the
execution of the compound statement. A compound statement may have one
or more handler declarations. These handlers become active when code
execution enters the compound statement block and remain active in any
sub-block and statement within the block. The handlers become inactive
when code execution leaves the block.
In the previous example, if an exception is thrown during the execution of either SQL statement, the execution of the compound statement is terminated and the exception is propagated and thrown by the CALL statement for the procedure. A handler declaration can resolve the thrown exception within the compound statement without propagating it, and allow the execution of the <compound statement> to continue.
In the example below, the UNDO handler declaration catches any exception that is thrown during the execution of the compound statement inside the BEGIN / END block. As it is an UNDO handler, all the changes to data performed within the compound statement (BEGIN / END) block are rolled back. The procedure then returns without throwing an exception.
CREATE PROCEDURE NEW_CUSTOMER(IN firstname VARCHAR(50), IN lastname VARCHAR(50), IN address VARCHAR(100)) MODIFIES SQL DATA label_one: BEGIN ATOMIC DECLARE temp_id INTEGER; DECLARE UNDO HANDLER FOR SQLEXCEPTION LEAVE label_one; INSERT INTO CUSTOMERS VALUES (DEFAULT, firstname, lastname, CURRENT_TIMESTAMP); SET temp_id = IDENTITY(); INSERT INTO ADDRESSES VALUES (DEFAULT, temp_id, address); END
Other types of hander are CONTINUE and EXIT handlers. A CONTINUE handler ignores any exception and proceeds to the next statement in the block. An EXIT handler terminates execution without undoing the data changes performed by the previous (successful) statements.
The conditions can be general conditions, or specific conditions.
Among general conditions that can be specified, SQLEXCEPTION covers all
exceptions, SQLWARNING covers all warnings, while NOT FOUND covers the
not-found condition, which is raised when a DELETE, UPDATE, INSERT or
MERGE statement completes without actually affecting any row.
Alternatively, one or more specific conditions can be specified
(separated with commas) which apply to specific exceptions or warnings
or classes or exceptions or warnings. A specific condition is specified
with SQLSTATE <value>
, for example SQLSTATE
'W_01003' specifies the warning raised after a SQL statement is executed
which contains an aggregate function which encounters a null value
during execution. An example is given below which activates the handler
when either of the two warnings is raised:
DECLARE UNDO HANDLER FOR SQLSTATE 'W_01003', 'W_01004' LEAVE label_one;
The BNF for <handler declaration>
is
given below:
<handler declaration> ::= DECLARE {UNDO |
CONTINUE | EXIT} HANDLER FOR {SQLEXCEPTION | SQLWARNING | NOT FOUND} | {
SQL_STATE <state value> [, ...]} [<SQL procedure
statement>];
A handler declaration may specify an SQL procedure statement to be performed when the handler is activated. When an exception occurs, the example below performs the UNDO as in the previous example, then inserts the (invalid) data into a separate table.
DECLARE UNDO HANDLER FOR SQLEXCEPTION INSERT INTO invalid_customers VALUES(firstanme, lastname, address);
The <SQL procedure statement>
is required
by the SQL Standard but is optional in HyperSQL. If the execution of the
<SQL procedure statement>
specified in the
handler declaration throws an exception itself, then it is handled by
the handlers that are currently active. The <SQL procedure
statement>
can itself be a compound statement with its own
handlers.
The SET statement is used for assignment. It can be used flexibly with rows or single values. The BNF is given below:
<assignment statement> ::= <singleton
variable assignment> | <multiple variable
assignment>
<singleton variable assignment> ::= SET
<assignment target> <equals operator> <assignment
source>
<multiple variable assignment> ::= SET
(<variable or parameter>, ...) = <row value
expression>
In the example below, the result of the SELECT is assigned to two OUT or INOUT arguments. The SELECT must return one row. If it returns more than one, an exception is raised. If it returns no row, no change is made to ARG1 and ARG2.
SET (arg1, arg2) = (SELECT col1, col2 FROM atable WHERE id = 10);
In the example below, the result of a function call is assigned to VAR1.
SET var1 = SQRT(var2);
A special form of SELECT can also be used for assigning values from a query to one or more arguments or variables. This works similar to a SET statement that has a SELECT statement as the source.
SELECT : SINGLE ROW
select statement: single row
<select statement: single row> ::= SELECT [
<set quantifier> ] <select list> INTO <select target
list> <table expression>
<select target list> ::= <target
specification> [ { <comma> <target specification> }...
]
Retrieve values from a specified row of a table and assign the fields to the specified targets. The example below has an identical effect to the example of SET statement given above.
SELECT col1, col2 INTO arg1, arg2 FROM atable WHERE id = 10;
Each parameter of a procedure can be defined as IN, OUT or INOUT. An IN parameter is an input to the procedure and is passed by value. The value cannot be modified inside the procedure body. An OUT parameter is a reference for output. An INOUT parameter is a reference for both input and output. An OUT or INOUT parameter argument is passed by reference, therefore only a dynamic parameter argument or a variable within an enclosing procedure can be passed for it. The assignment statement is used to assign a value to an OUT or INOUT parameter.
In the example below, the procedure is declared with an OUT parameter.
CREATE PROCEDURE new_customer(OUT newid INT, IN firstname VARCHAR(50), IN lastname VARCHAR(50), IN address VARCHAR(100)) MODIFIES SQL DATA BEGIN ATOMIC DECLARE temp_id INTEGER; INSERT INTO CUSTOMERS VALUES (DEFAULT, firstname, lastname, CURRENT_TIMESTAMP); SET temp_id = IDENTITY(); INSERT INTO ADDRESSES VALUES (DEFAULT, temp_id, address); SET newid = temp_id; END
In the SQL session, or in the body of another stored procedure, a variable must be assigned to the OUT parameter. After the procedure call, this variable will hold the new identity value that was generated inside the procedure.
In the example below, a session variable,
the_new_id
is declared. After the call to
new_customer
, the value for the identity is stored in
the_new_id
variable. This is returned via the next
CALL statement. Alternatively, the_new_id
can be used
as an argument to another CALL statement.
DECLARE the_new_id INT DEFAULT NULL; CALL new_customer(the_new_id, 'John', 'Smith', '10 Parliament Square'); CALL the_new_id;
Various iterated statements can be used in routines. In these
statements, the <SQL statement list>
consists
of one or more SQL statements. The <search
condition>
can be any valid SQL expression of BOOLEAN
type.
<loop statement> ::= [ <beginning label>
<colon> ] LOOP <SQL statement list> END LOOP [ <ending
label> ]
<while statement> ::= [ <beginning label>
<colon> ] WHILE <search condition> DO <SQL statement
list> END WHILE [ <ending label> ]
<repeat statement> ::= [ <beginning label>
<colon> ]
REPEAT <SQL statement list> UNTIL <search
condition> END REPEAT [ <ending label>
In the example below, a multiple rows are inserted into a table in a WHILE loop:
loop_label: WHILE my_var > 0 DO INSERT INTO CUSTOMERS VALUES (DEFAULT, my_var); SET my_var = my_var - 1; IF my_var = 10 THEN SET my_var = 8; END IF; IF my_var = 22 THEN LEAVE loop_label; END IF; END WHILE loop_label;
There are two types of CASE ... WHEN statement and the IF ... THEN statement.
CASE WHEN
case when statement
The simple case statement uses a <case
operand>
as the predicand of one or more predicates. For
the right part of each predicate, it specifies one or more SQL
statements to execute if the predicate evaluates TRUE. If the ELSE
clause is not specified, at least one of the search conditions must be
true, otherwise an exception is raised.
<simple case statement> ::= CASE <case
operand> <simple case statement when clause>... [ <case
statement else clause> ] END CASE
<simple case statement when clause> ::= WHEN
<when operand list> THEN <SQL statement
list>
<case statement else clause> ::= ELSE <SQL
statement list>
A skeletal example is given below. The variable var_one is first tested for equality with 22 or 23 and if the test evaluates to TRUE, then the INSERT statement is performed and the statement ends. If the test does not evaluate to TRUE, the next condition test, which is an IN predicate, is performed with var_one and so on. The statement after the ELSE clause is performed if none the previous tests returns TRUE.
CASE var_one WHEN 22, 23 THEN INSERT INTO t_one ...; WHEN IN (2, 4, 5) THEN DELETE FROM t_one WHERE ...; ELSE UPDATE t_one ...; END CASE
The searched case statement uses one or more search conditions, and for each search condition, it specifies one or more SQL statements to execute if the search condition evaluates TRUE. An exception is raised if there is no ELSE clause and none of the search conditions evaluates TRUE.
<searched case statement> ::= CASE <searched
case statement when clause>... [ <case statement else clause> ]
END CASE
<searched case statement when clause> ::= WHEN
<search condition> THEN <SQL statement
list>
The example below is partly a rewrite of the previous example, but a new condition is added:
CASE WHEN var_one = 22 OR var_one = 23 THEN INSERT INTO t_one ...; WHEN var_one IN (2, 4, 5) THEN DELETE FROM t_one WHERE ...; WHEN var_two IS NULL THEN UPDATE t_one ...; ELSE UPDATE t_one ...; END CASE
IF
if statement
The if statement is very similar to the searched case statement. The difference is that no exception is raised if there is no ELSE clause and no search condition evaluates TRUE.
<if statement> ::= IF <search condition>
<if statement then clause> [ <if statement elseif clause>...
] [ <if statement else clause> ] END IF
<if statement then clause> ::= THEN <SQL
statement list>
<if statement elseif clause> ::= ELSEIF
<search condition> THEN <SQL statement
list>
<if statement else clause> ::= ELSE <SQL
statement list>
The RETURN statement is required and used only in functions. The body of a function is either a RETURN statement, or a compound statement that contains a RETURN statement.
RETURN
return statement
<return statement> ::= RETURN <return
value>
<return value> ::= <value expression> |
NULL
Return a value from an SQL function. If the function is defined as RETURNS TABLE, then the value is a TABLE expression such as RETURN TABLE(SELECT ...) otherwise, the value expression can be any scalar expression. In the examples below, the same function is written with or without a BEGIN END block. In both versions, the RETURN value is a scalar expression.
CREATE FUNCTION an_hour_before_max (e_type INT) RETURNS TIMESTAMP RETURN (SELECT MAX(event_time) FROM atable WHERE event_type = e_type) - 1 HOUR CREATE FUNCTION an_hour_before_max (e_type INT) RETURNS TIMESTAMP BEGIN ATOMIC DECLAR max_event TIMESTAMP; SET max_event = SELECT MAX(event_time) FROM atable WHERE event_type = e_type; RETURN max_event - 1 HOUR; END
In addition to the RETURN statement, the following statements can be used in specific contexts.
ITERATE STATEMENT
The ITERATE statement can be used to cause the next iteration of a labeled iterated statement (a WHILE, REPEAT or LOOP statement). It is similar to the "continue" statement in C and Java.
<iterate statement> ::= ITERATE <statement
label>
LEAVE STATEMENT
The LEAVE statement can be used to leave a labeled block. When used in an iterated statement, it is similar to the "break" statement is C and Java. But it can be used in compound statements as well.
<leave statement> ::= LEAVE <statement
label>
Signal and Resignal Statements
The SIGNAL statement is used to throw an exception (or force an exception). When invoked, any exception handler for the given exception is in turn invoked. If there is no handler, the exception is propagated to the enclosing context.
<signal statement> ::= SIGNAL SQL_STATE
<state value>
The RESIGNAL statement is used to throw an exception from an
exception handler’s <SQL procedure statement>
,
in effect propagating the exception to the enclosing context without
further action by the currently active handlers.
<resignal statement> ::= RESIGNAL SQL_STATE
<state value>
More than one version of a routine can be created.
For procedures, the different versions must have different parameter counts. When the procedure is called, the parameter count determines which version is called.
For functions, the different versions can have the same or different parameter counts. When the parameter count of two versions of a function is the same, the type of parameters must be different. The best matching version of the function is called, according to both the parameter count and parameter types.
Two versions of an overloaded function are given below. One version accepts TIMESTAMP while the other accepts TIME arguments.
CREATE FUNCTION an_hour_before_or_now(t TIMESTAMP) RETURNS TIMESTAMP IF t > CURRENT_TIMESTAMP THEN RETURN CURRENT_TIMESTAMP; ELSE RETURN t - 1 HOUR; END IF CREATE FUNCTION an_hour_before_or_now(t TIME) RETURNS TIME CASE t WHEN > CURRENT_TIME THEN RETURN CURRENT_TIME; WHEN >= TIME'01:00:00' THEN RETURN t - 1 HOUR; ELSE RETURN CURRENT_TIME; END CASE
more ..
The OUT or INOUT parameters of a procedure are used to assign simple values to dynamic parameters or to variables in the calling context. In addition, a SQL/PSM procedure may return result sets to the calling context. These result sets are dynamic in the sense that a procedure may return a different number of result sets or none at all in different invocations.
The SQL Standard uses a mechanism called CURSORS for accessing and modifying rows of a result set one by one. This mechanism is absolutely necessary when the database is accessed from an external application program. The JDBC ResultSet interface allows this method of access from Java programs and is supported by HyperSQL.
The SQL Standard uses cursors within the body of a procedure to return result sets. It specifies a somewhat complex mechanism to allow access to these cursors from the calling contexts. HyperSQL does not support access to such result sets within a calling SQL/PSM procedure. This is considered redundant as all operations on data can be performed with non-cursor SQL statements.
(feature to be implemented) HyperSQL will support returning single or multiple result sets from SQL/PSM procedures only via the JDBC CallableStatement interface. Cursors are declared and opened within the body of the procedure. No further operation is performed on the cursors within the procedure. When the execution of the procedure is complete, the cursors become available as Java ResultSet objects via the CallableStatement instance that called the SQL/PSM procedure.
Currently, a single result can be returned from FUNCTION routines, when the function is defined as RETURNS TABLE ( .. )
To return a table from a SELECT statement, you should use a return statement such as RETURN TABLE( SELECT ...); in a SQL/PSM function. A Java function should return a JDBCResultSet instance. For an example of how to construct a JDBCResultSet for this purpose, see the source code for the org.hsqldb.jdbc.JDBCArray class.
The JDBC CallableStatement class is used with the SQL statement
CALL <routine name> ( <argument 1>, ... )
to call both functions and procedures. The getXXX()
methods can be used to retrieve INOUT or OUT arguments after the call.
The getResultSet()
call can be used to access the
ResultSet returned from a function that returns a result set.
The body of a Java language routine is a static method of a Java class, specified with a fully qualified method name in the routine definition.
In the example below, the static method named
toZeroPaddedString
is specified to be called when
the function is invoked.
CREATE FUNCTION zero_pad(x BIGINT, digits INT, maxsize INT) RETURNS CHAR VARYING(100) NO SQL LANGUAGE JAVA PARAMETER STYLE JAVA EXTERNAL NAME 'CLASSPATH:org.hsqldb.lib.StringUtil.toZeroPaddedString'
The signature of the Java method (used in the Java code but not in SQL code to create the function) is given below:
public static String toZeroPaddedString(long value, int precision, int maxSize)
The parameter and return types and of the SQL routine definition must match those of the Java method according to the table below:
SMALLINT |
Short or Short |
INT |
int or Integer |
BIGINT |
long or Long |
NUMERIC or DECIMAL |
BigDecimal |
FLOAT or DOUBLE |
Double or Double |
CHAR or VARCHAR |
String |
DATE |
java.sql.Date |
TIME |
java.sql.Time |
TIMESTAMP |
java.sql.Timestamp |
BINARY |
Byte[] |
BOOLEAN |
boolean or Boolean |
ARRAY of any type | java.sql.Array |
TABLE |
java.sql.ResultSet |
If the specified Java method is not found or its parameters and
return types do not match the definition, an exception is raised. If more
than one version of the Java method exist, then the one with matching
parameter and return types is found and registered. If two “equivalent”
methods exist, the first one is registered. (This situation arises only
when a parameter is a primitive in one version and an Object in another
version, e.g. long
and
java.lang.Long
.).
When the Java method of an SQL/JRT routine returns a value, it
should be within the size and precision limits defined in the return type
of the SQL-invoked routine, otherwise an exception is raised. The scale
difference are ignored and corrected. For example, in the above example,
the RETURNS CHAR VARYING(100)
clause limits the length
of the strings returned from the Java method to 100. But if the number of
digits after the decimal point (scale) of a returned BigDecimal value is
larger than the scale specified in the RETURNS clause, the decimal
fraction is silently truncated and no exception of warning is
raised.
If two versions of the same SQL invoked routine with different parameter types are required, they can be defined to point to the same method name or different method names, or even methods in different classes. In the example below, the first two definitions refer to the same method name in the same class. In the Java class, the two static methods are defined with corresponding method signatures.
In the third example, the Java function returns a result set and the SQL declaration includes RETURNS TABLE.
CREATE FUNCTION an_hour_before_or_now(t TIME) RETURNS TIME NO SQL LANGUAGE JAVA PARAMETER STYLE JAVA EXTERNAL NAME 'CLASSPATH:org.npo.lib.nowLessAnHour' CREATE FUNCTION an_hour_before_or_now(t TIMESTAMP) RETURNS TIMESTAMP NO SQL LANGUAGE JAVA PARAMETER STYLE JAVA EXTERNAL NAME 'CLASSPATH:org.npo.lib.nowLessAnHour' CREATE FUNCTION testquery(INTEGER) RETURNS TABLE(n VARCHAR(20), i INT) READS SQL DATA LANGUAGE JAVA EXTERNAL NAME 'CLASSPATH:org.hsqldb.test.TestJavaFunctions.getQueryResult'
In the Java class:
public static java.sql.Time nowLessAnHour(java.sql.Time value) { ... } public static java.sql.Timestamp nowLessAnHour(java.sql.Timestamp value) ... } public static ResultSet getQueryResult(Connection connection, int i) throws SQLException { Statement st = connection.createStatement(); return st.executeQuery("SELECT * FROM T WHERE I < " + i); }
Java procedures are defined similarly to functions. The differences are:
The return type of the Java static method must be void.
If a parameter is defined as OUT or INOUT, the corresponding Java static method parameter must be defined as an array of the JDBC non-primitive type.
When the Java static method is invoked, the OUT and INOUT arguments are passed as a single-element array.
The static method can modify the OUT or INOUT param by assigning a value to the sole element of the argument array.
If the procedure contains SQL statements, only statements for data access and manipulation are allowed. The java method should not perform commit or rollback. The SQL statements should not change the session settings and should not include statements at create or modify tables definitions or other database objects. These rules are generally enforced by the engine, but additional enforcement may be added in future versions
An example of a procedure definition is given below:
CREATE PROCEDURE new_customer(firstname VARCHAR(50), lastname VARCHAR(50), address VARCHAR(100)) MODIFIES SQL DATA LANGUAGE JAVA EXTERNAL NAME 'CLASSPATH:org.hsqldb.test.Test01.newCustomerProcedure'
The legacy HyperSQL statement, CREATE ALIAS <name>
FOR <fully qualified Java method name>
is no longer
supported directly. It is supported when importing databases and
translates to a special CREATE FUNCTION <name>
statement that creates the function in the PUBLIC schema.
The direct use of a Java method as a function is still supported
but deprecated. It is internally translated to a special CREATE
FUNCTION
statement where the name of the function is the
double quoted, fully qualified name of the Java method used.
HyperSQL adds an extension to the SQL Standard to allow user defined aggregate functions. A user defined aggregate function has a single parameter when it is used in SQL statements. Unlike the predefined aggregate functions, the keyword DISTINCT cannot be used when a user defined aggregate function is invoked. Like all user defined functions, an aggregate function belongs to a schema and can be polymorphic.
A user defined aggregate function can be used in SQL statements where a predefined aggregate function is allowed.
An aggregate function is always defined with 4 parameters. The first parameter is the parameter that is used when the function is invoked in SQL statements, the rest of the parameter are invisible to the invoking statement. The type of the first parameter is user defined. The type of the second parameter must be BOOLEAN. The third and fourth parameters have user defined types and must be defined as INOUT parameters. The defined return type of the function determines the type of the value returned when the function is invoked.
CREATE AGGREGATE FUNCTION
user defined aggregate function definition
Aggregate function definition is similar to normal function
definition and has the mandatory <returns
clause>
. The BNF is given below.
<user defined aggregate function> ::= CREATE
AGGREGATE FUNCTION <schema qualified routine name> <SQL
aggregate parameter declaration list> <returns clause>
<routine characteristics> <routine body>
The parameter declaration list BNF is given below. The type of the first parameter is used when the function is invoked as part of an SQL statement. When multiple versions of a function are required, each version will have the first parameter of a different type.
<SQL aggregate declaration list> ::= <left
paren> [IN] [ <SQL parameter name> ] <parameter type>
<comma> [IN] [ <SQL parameter name> ] BOOLEAN <comma>
INOUT [ <SQL parameter name> ] <parameter type>
<comma> INOUT [ <SQL parameter name> ] <parameter
type> <right paren>
The return type is user defined. This is the type of the resulting value when the function is called. Usually an aggregate function is defined with CONTAINS SQL, as it normally does not read the data in database tables, but it is possible to define the function with READS SQL DATA and access the database tables.
HyperSQL invokes the aggregate function, with all the arguments set, once per each row in order to compute the values. Finally, it invokes the function once more to return the final result.
In the computation phase, the first argument is the value of the user argument as specified in the SQL statement, computed for the current row. The second argument is the boolean FALSE. The third and fourth argument values are initially null, but they can be updated in the body of the function during each invocation. The third and fourth arguments act as registers and hold their values between invocations. The return value of the function is ignored during the computation phase (when the second parameter is FALSE).
After the computation phase, the function is invoked once more to get the final result. In this invocation, the first argument is NULL and the second argument is boolean TRUE. The third and fourth arguments hold the values they held at the end of the last invocation. The value returned by the function in this invocation is used as the result of the aggregate function computation in the invoking SQL statement. In SQL queries with GROUP BY, the call sequence is repeated for each separate group.
The example below features a user defined version of the Standard
AVG(<value expression>)
aggregate function for
INTEGER input and output types. This function behaves differently from
the Standard AVG function as it returns 0 when all the input values are
null.
CREATE AGGREGATE FUNCTION udavg(IN x INTEGER, IN flag BOOLEAN, INOUT addup BIGINT, INOUT counter INT) RETURNS INTEGER CONTAINS SQL BEGIN ATOMIC IF flag THEN RETURN addup / counter; ELSE SET counter = COALESCE(counter, 0) + 1; SET addup = COALESCE(addup, 0) || COALESCE(x, 0); RETURN NULL; END IF; END
The user defined aggregate function is used in a select statement in the example below. Only the first parameter is visible and utilised in the select statement.
SELECT udavg(id) FROM customers GROUP BY lastname;
In the example below, the function returns an array that contains all the values passed for the aggregated column. For use with longer arrays, you can optimise the function by defining a larger array in the first iteration, and using the TRIM_ARRAY function on the RETURN to cut the array to size :
CREATE AGGREGATE FUNCTION array_aggregate(IN val VARCHAR(100), IN flag boolean, INOUT buffer VARCHAR(100) ARRAY, INOUT counter INT) RETURNS VARCHAR(100) ARRAY CONTAINS SQL BEGIN ATOMIC IF flag THEN RETURN buffer; ELSE IF val IS NULL THEN RETURN NULL; END IF; IF counter IS NULL THEN SET counter = 0; END IF; SET counter = counter + 1; IF counter = 1 THEN SET buffer = ARRAY[val]; ELSE SET buffer[counter] = val; END IF; RETURN NULL; END IF; END
The tables and data for the select statement below are created with the DatabaseManager or DatabaseManagerSwing GUI apps. Part of the output is shown. Each row of the output includes an array containing the values for the invoices for each customer.
SELECT ID, FIRSTNAME, LASTNAME, ARRAY_AGGREGATE(CAST(INVOICE.TOTAL AS VARCHAR(100))) FROM customer JOIN INVOICE ON ID =CUSTOMERID GROUP BY ID, FIRSTNAME, LASTNAME 11 Susanne Karsen ARRAY['3988.20'] 12 John Peterson ARRAY['2903.10','4382.10','4139.70','3316.50'] 13 Michael Clancy ARRAY['6525.30'] 14 James King ARRAY['3665.40','905.10','498.00'] 18 Sylvia Clancy ARRAY['634.20','4883.10'] 20 Bob Clancy ARRAY['3414.60','744.60']
A Java aggregate function is defined similarly to PSM functions,
apart from the routine body, which is defined as EXTERNAL NAME
...
The Java function signature must follow the rules for both
nullable and INOUT parameters, therefore:
No agrument is defined as a primitive or primitive array type. This allows nulls to be passed to the function. The second and third arguments must be defined as arrays of the JDBC non-primitive types listed in the table in the previous section.
In the example below, a user-defined aggregate function for geometric mean is defined.
CREATE AGGREGATE FUNCTION geometric_mean(IN val DOUBLE, IN flag BOOLEAN, INOUT register DOUBLE, INOUT counter INT) RETURNS DOUBLE NO SQL LANGUAGE JAVA EXTERNAL NAME 'CLASSPATH:org.hsqldb.test.Test01.geometricMean'
The Java function definition is given below:
public static Double geometricMean(Double in, Boolean flag, Double[] register, Integer[] counter) { if (flag) { if (register[0] == null) { return null; } double a = register[0].doubleValue(); double b = 1 / (double) counter[0]; return Double.valueOf(java.lang.Math.pow(a, b)); } if (in == null) { return null; } if (in.doubleValue() == 0) { return null; } if (register[0] == null) { register[0] = in; counter[0] = Integer.valueOf(1); } else { register[0] = Double.valueOf(register[0].doubleValue() * in.doubleValue()); counter[0] = Integer.valueOf(counter[0].intValue() + 1); } return null; }
In a select statement, the function is used like built in aggregate functions:
SELECT geometric_mean(age) FROM FROM customer
As discussed in the previous pages, routine definition has several mandatory or optional clauses. The complete BNF supported by HyperSQL and the remaining clauses are documented in this section.
CREATE FUNCTION
CREATE PROCEDURE
routine definition
Routine definition is similar for procedures and functions. A
function definition has the mandatory <returns
clause>
which is discussed later. The description given so
far covers the essential elements of the specification with the BNF given
below.
<schema procedure> ::= CREATE PROCEDURE <schema
qualified routine name> <SQL parameter declaration list>
<routine characteristics> <routine body>
<schema function> ::= CREATE FUNCTION <schema
qualified routine name> <SQL parameter declaration list>
<returns clause> <routine characteristics> <routine
body>
Parameter declaration list has been described above. For SQL/JRT
routines, the <SQL parameter name>
is optional
while for SQL/PSM routines, it is required. If the <parameter
mode>
of a parameter is OUT or INOUT, it must be specified.
The BNF is given below:
<SQL parameter declaration list> ::= <left
paren> [ <SQL parameter declaration> [ { <comma> <SQL
parameter declaration> }... ] ] <right paren>
<SQL parameter declaration> ::= [ <parameter
mode> ] [ <SQL parameter name> ] <parameter
type>
<parameter mode> ::= IN | OUT |
INOUT
<parameter type> ::= <data
type>
Return Value and Table Functions
RETURNS
returns clause
The <returns clause>
specifies the type of
the return value of a function. For all SQL/PSM functions and ordinary
SQL/JRT functions, this is simply a type definition which can be a
built-in type, a DOMAIN type or a DISTINCT type, or alternatively, a TABLE
definition. For example, RETURNS INTEGER.
For a SQL/JRT function, it is possible to define a
<returns table type>
for a Java method that
returns a java.sql.ResultSet
object. Such SQL/JRT
functions are called table functions. Table
functions are used differently from normal functions. A table function can
be used in an SQL query expression exactly where a normal table or view is
allowed. At the time of invocation, the Java method is called and the
returned ResultSet is transformed into an SQL table. The column types of
the declared TABLE must match those of the ResultSet, otherwise an
exception is raised at the time of invocation.
If a <returns table type>
is defined for an
SQL/PSM function, the following expression is used inside the function to
return a table: RETURN TABLE ( <query expression>
);
In the example blow, a table with two columns is
returned.
RETURN TABLE ( SELECT a, b FROM atable WHERE e = 10 );
If a JDBC CallableStatement
is used to CALL the
function, the table returned from the function call is returned and can be
accessed with the getResultSet()
method of the
CallableStatement
.
<returns clause> ::= RETURNS <returns
type>
<returns type> ::= <returns data type> |
<returns table type>
<returns table type> ::= TABLE <table function
column list>
<table function column list> ::= <left
paren> <table function column list element> [ { <comma>
<table function column list element> } ... ] <right
paren>
<table function column list element> ::=
<column name> <data type>
<returns data type> ::= <data
type>
routine body
routine body
Routine body is either one or more SQL statements or a Java
reference, as described. The user that defines the routine by issuing the
CREATE FUNCTION or CREATE SCHEMA command must have the relevant access
rights to all tables, sequences, routines, etc. that are accessed by the
routine. If another user is given EXECUTE privilege on the routine, then
there are two possibilities, depending on the <rights
clause>
. This clause refers to the access rights that are
checked when a routine is invoked. The default is SQL SECURITY
DEFINER
, which means access rights of the definer are used;
therefore no extra checks are performed when the other user invokes the
routine. The alternative SQL SECURITY INVOKER
means
access rights on all the database objects referenced by the routine are
checked for the invoker. This alternative is not supported by
HyperSQL.
<routine body> ::= <SQL routine spec> |
<external body reference>
<SQL routine spec> ::= [ <rights clause> ]
<SQL routine body>
<rights clause> ::= SQL SECURITY INVOKER | SQL
SECURITY DEFINER
SQL routine body
SQL routine body
The routine body of a an SQL routine consists of an statement.
<SQL routine body> ::= <SQL procedure
statement>
EXTERNAL NAME
external body reference
External name specifies the qualified name of the Java method
associated with this routine. Early releases of HyperSQL 2.0 only supports
Java methods within the classpath. The <external Java
reference string>
is a quoted string which starts with
CLASSPATH: and is followed by the Java package, class and method names
separated with dots. HyperSQL does not currently support the optional
<Java parameter declaration list>
.
<external body reference> ::= EXTERNAL NAME
<external Java reference string>
<external Java reference string> ::= <jar and
class name> <period> <Java method name> [ <Java
parameter declaration list> ]
The <routine characteristics>
clause
covers several sub-clauses
<routine characteristics> ::= [ <routine
characteristic>... ]
<routine characteristic> ::= <language
clause> | <parameter style clause> | SPECIFIC <specific
name> | <deterministic characteristic> | <SQL-data access
indication> | <null-call clause> | <returned result sets
characteristic> | <savepoint level
indication>
LANGUAGE
language clause
The <language clause>
refers to the
language in which the routine body is written. It is either SQL or Java.
The default is SQL, so JAVA must be specified for SQL/JRT
routines.
<language clause> ::= LANGUAGE <language
name>
<language name> ::= SQL |
JAVA
The parameter style is not allowed for SQL routines. It is optional for Java routines and, in HyperSQL, the only value allowed is JAVA.
<parameter style> ::= JAVA
SPECIFIC NAME
specific name
The SPECIFIC <specific name>
clause is
optional but the engine will creates an automatic name if it is not
present. When there are several versions of the same routine, the
<specific name>
is used in schema manipulation
statements to drop or alter a specific version. The
<specific name>
is a user-defined name. It
applies to both functions and procedures. In the examples below, a
specific name is specified for each function.
CREATE FUNCTION an_hour_before_or_now(t TIMESTAMP) RETURNS TIMESTAMP NO SQL LANGUAGE JAVA PARAMETER STYLE JAVA SPECIFIC an_hour_before_or_now_with_timestamp EXTERNAL NAME 'CLASSPATH:org.npo.lib.nowLessAnHour' CREATE FUNCTION an_hour_before_max (e_type INT) RETURNS TIMESTAMP SPECIFIC an_hour_before_max_with_int RETURN (SELECT MAX(event_time) FROM atable WHERE event_type = e_type) - 1 HOUR
DETERMINISTIC
deterministic characteristic
The <deterministic characteristic>
clause
indicates that a routine is deterministic or not. Deterministic means
the routine does not reference random values, external variables, or
time of invocation. The default is NOT DETERMINISTIC
.
It is essential to declare this characteristics correctly for an SQL/JRT
routine, as the engine does not know the contents of the Java code,
which could include calls to methods returning random or time sensitive
values.
<deterministic characteristic> ::= DETERMINISTIC
| NOT DETERMINISTIC
SQL DATA access
SQL DATA access characteristic
The <SQL-data access indication>
clause
indicates the extent to which a routine interacts with the database or
the data stored in the database tables (SQL data). NO SQL means no SQL
command is issued in the routine body and can be used only for SQL/JRT
functions. CONTAINS SQL
means some SQL commands are
used, but they do not read or modify the SQL data. READS SQL
DATA
and MODIFIES SQL DATA
are self
explanatory.
<SQL-data access indication> ::= NO SQL |
CONTAINS SQL | READS SQL DATA | MODIFIES SQL DATA
NULL INPUT
null call clause
Null Arguments
The <null-call clause>
is used only for
functions. If a function returns NULL when any of the calling arguments
is null, then by specifying RETURNS NULL ON NULL
INPUT
, calls to the function are known to be redundant and do
not take place when an argument is null. This simplifies the coding of
the SQL/JRT Java methods and improves performance at the same
time.
<null-call clause> ::= RETURNS NULL ON NULL
INPUT | CALLED ON NULL INPUT
SAVEPOINT LEVEL
transaction impact
The <savepoint level indication>
is used
only for procedures and refers to the visibility of existing savepoints
within the body of the procedure. If NEW SAVEPOINT
LEVEL
is specified, savepoints that have been declared prior
to calling the procedure become invisible within the body of the
procedure. HyperSQL’s implementation accepts only NEW SAVEPOINT
LEVEL
, which must be specified.
<savepoint level indication> ::= NEW SAVEPOINT
LEVEL | OLD SAVEPOINT LEVEL
DYNAMIC RESULT SETS
returned result sets characteristic
The <returned result sets characteristic>
is used only for SQL/PSM procedures. The maximum number of result sets
that a procedure may return can be specified with the clause below. The
default is zero. Details are discussed in the previous sections.
<returned result sets characteristic> ::=
DYNAMIC RESULT SETS <maximum returned result
sets>
$Revision: 3601 $