Summary of Built-in Functions |

Greenplum Database Function Types

Greenplum Database evaluates functions and operators used in SQL expressions. Some functions and operators are only allowed to execute on the master since they could lead to inconsistencies in Greenplum Database segment instances. This table describes the Greenplum Database Function Types.

Table 1. Functions in Greenplum Database
Function Type	Greenplum Support	Description	Comments
IMMUTABLE	Yes	Relies only on information directly in its argument list. Given the same argument values, always returns the same result.
STABLE	Yes, in most cases	Within a single table scan, returns the same result for same argument values, but results change across SQL statements.	Results depend on database lookups or parameter values. `current_timestamp` family of functions is `STABLE`; values do not change within an execution.
VOLATILE	Restricted	Function values can change within a single table scan. For example: `random()`, `currval()`, `timeofday()`.	Any function with side effects is volatile, even if its result is predictable. For example: `setval()`.

In Greenplum Database, data is divided up across segments — each segment is a distinct PostgreSQL database. To prevent inconsistent or unexpected results, do not execute functions classified as VOLATILE at the segment level if they contain SQL commands or modify the database in any way. For example, functions such as setval() are not allowed to execute on distributed data in Greenplum Database because they can cause inconsistent data between segment instances.

To ensure data consistency, you can safely use VOLATILE and STABLE functions in statements that are evaluated on and run from the master. For example, the following statements run on the master (statements without a FROM clause):

SELECT setval('myseq', 201);
SELECT foo();

If a statement has a FROM clause containing a distributed table and the function in the FROM clause returns a set of rows, the statement can run on the segments:

SELECT * from foo();

Greenplum Database does not support functions that return a table reference (rangeFuncs) or functions that use the refCursor datatype.

Built-in Functions and Operators

The following table lists the categories of built-in functions and operators supported by PostgreSQL. All functions and operators are supported in Greenplum Database as in PostgreSQL with the exception of STABLE and VOLATILE functions, which are subject to the restrictions noted in Greenplum Database Function Types. See the Functions and Operators section of the PostgreSQL documentation for more information about these built-in functions and operators.

Table 2. Built-in functions and operators
Operator/Function Category	VOLATILE Functions	STABLE Functions	Restrictions
Logical Operators
Comparison Operators
Mathematical Functions and Operators	random setseed
String Functions and Operators	All built-in conversion functions	convert pg_client_encoding
Binary String Functions and Operators
Bit String Functions and Operators
Pattern Matching
Data Type Formatting Functions		to_char to_timestamp
Date/Time Functions and Operators	timeofday	age current_date current_time current_timestamp localtime localtimestamp now
Geometric Functions and Operators
Network Address Functions and Operators
Sequence Manipulation Functions	currval lastval nextval setval
Conditional Expressions
Array Functions and Operators		All array functions
Aggregate Functions
Subquery Expressions
Row and Array Comparisons
Set Returning Functions	generate_series
System Information Functions		All session information functions All access privilege inquiry functions All schema visibility inquiry functions All system catalog information functions All comment information functions
System Administration Functions	set_config pg_cancel_backend pg_reload_conf pg_rotate_logfile pg_start_backup pg_stop_backup pg_size_pretty pg_ls_dir pg_read_file pg_stat_file	current_setting All database object size functions	Note: The function `pg_column_size` displays bytes required to store the value, perhaps with TOAST compression.
XML Functions		xmlagg(xml) xmlexists(text, xml) xml_is_well_formed(text) xml_is_well_formed_document(text) xml_is_well_formed_content(text) xpath(text, xml) xpath(text, xml, text[]) xpath_exists(text, xml) xpath_exists(text, xml, text[]) xml(text) text(xml) xmlcomment(xml) xmlconcat2(xml, xml)

Window Functions

The following built-in window functions are Greenplum extensions to the PostgreSQL database. All window functions are immutable. For more information about window functions, see "Window Expressions" in the Greenplum Database Administrator Guide.

Table 3. Window functions
Function	Return Type	Full Syntax	Description
`cume_dist()`	`double precision`	`CUME_DIST() OVER ( [PARTITION BY` expr `] ORDER BY` expr `)`	Calculates the cumulative distribution of a value in a group of values. Rows with equal values always evaluate to the same cumulative distribution value.
`dense_rank()`	`bigint`	`DENSE_RANK () OVER ( [PARTITION BY` expr `] ORDER BY` expr `)`	Computes the rank of a row in an ordered group of rows without skipping rank values. Rows with equal values are given the same rank value.
`first_value(expr)`	same as input expr type	`FIRST_VALUE(` expr `) OVER ( [PARTITION BY` expr `] ORDER BY` expr `[ROWS\|RANGE` frame_expr `] )`	Returns the first value in an ordered set of values.
`lag(expr [,offset] [,default])`	same as input expr type	`LAG(` expr `[,` offset `] [,` default `]) OVER ( [PARTITION BY` expr `] ORDER BY` expr `)`	Provides access to more than one row of the same table without doing a self join. Given a series of rows returned from a query and a position of the cursor, `LAG` provides access to a row at a given physical offset prior to that position. The default `offset` is 1. default sets the value that is returned if the offset goes beyond the scope of the window. If default is not specified, the default value is null.
`last_valueexpr`	same as input expr type	`LAST_VALUE(expr) OVER ( [PARTITION BY expr] ORDER BY expr [ROWS\|RANGE frame_expr] )`	Returns the last value in an ordered set of values.
`lead(expr [,offset] [,default])`	same as input expr type	`LEAD(expr [,offset] [,exprdefault]) OVER ( [PARTITION BY expr] ORDER BY expr )`	Provides access to more than one row of the same table without doing a self join. Given a series of rows returned from a query and a position of the cursor, `lead` provides access to a row at a given physical offset after that position. If offset is not specified, the default offset is 1. default sets the value that is returned if the offset goes beyond the scope of the window. If default is not specified, the default value is null.
`ntile(expr)`	bigint	`NTILE(expr) OVER ( [PARTITION BY expr] ORDER BY expr )`	Divides an ordered data set into a number of buckets (as defined by expr) and assigns a bucket number to each row.
`percent_rank(`)	`double precision`	`PERCENT_RANK () OVER ( [PARTITION BY expr] ORDER BY expr )`	Calculates the rank of a hypothetical row `R` minus 1, divided by 1 less than the number of rows being evaluated (within a window partition).
`rank()`	bigint	`RANK () OVER ( [PARTITION BY expr] ORDER BY expr )`	Calculates the rank of a row in an ordered group of values. Rows with equal values for the ranking criteria receive the same rank. The number of tied rows are added to the rank number to calculate the next rank value. Ranks may not be consecutive numbers in this case.
`row_number(`)	`bigint`	`ROW_NUMBER () OVER ( [PARTITION BY expr] ORDER BY expr )`	Assigns a unique number to each row to which it is applied (either each row in a window partition or each row of the query).

Advanced Analytic Functions

The following built-in advanced analytic functions are Greenplum extensions of the PostgreSQL database. Analytic functions are immutable.

Table 4. Advanced Analytic Functions
Function	Return Type	Full Syntax	Description
`matrix_add(array[], array[])`	`smallint[], int[], bigint[], float[]`	`matrix_add( array[[1,1],[2,2]], array[[3,4],[5,6]]`)	Adds two two-dimensional matrices. The matrices must be conformable.
`matrix_multiply( array[], array[])`	`smallint[]int[], bigint[], float[]`	`matrix_multiply( array[[2,0,0],[0,2,0],[0,0,2]], array[[3,0,3],[0,3,0],[0,0,3]] )`	Multiplies two, three- dimensional arrays. The matrices must be conformable.
`matrix_multiply( array[], expr)`	`int[], float[]`	`matrix_multiply( array[[1,1,1], [2,2,2], [3,3,3]], 2)`	Multiplies a two-dimensional array and a scalar numeric value.
`matrix_transpose( array[])`	Same as input `array` type.	`matrix_transpose( array [[1,1,1],[2,2,2]])`	Transposes a two-dimensional array.
`pinv(array [])`	`smallint[]int[], bigint[], float[]`	`pinv(array[[2.5,0,0],[0,1,0],[0,0,.5]])`	Calculates the Moore-Penrose pseudoinverse of a matrix.
`unnest (array[])`	set of `anyelement`	`unnest( array['one', 'row', 'per', 'item'])`	Transforms a one dimensional array into rows. Returns a set of `anyelement`, a polymorphic pseudotype in PostgreSQL.

Table 5. Advanced Aggregate Functions
Function	Return Type	Full Syntax	Description
`MEDIAN (expr)`	`timestamp, timestampz, interval, float`	`MEDIAN (expression)` Example: SELECT department_id, MEDIAN(salary) FROM employees GROUP BY department_id;	Can take a two-dimensional array as input. Treats such arrays as matrices.
`PERCENTILE_CONT (expr) WITHIN GROUP (ORDER BY expr [DESC/ASC])`	`timestamp, timestampz, interval, float`	`PERCENTILE_CONT(percentage) WITHIN GROUP (ORDER BY expression)` Example: SELECT department_id, PERCENTILE_CONT (0.5) WITHIN GROUP (ORDER BY salary DESC) "Median_cont"; FROM employees GROUP BY department_id;	Performs an inverse distirbution function that assumes a continuous distribution model. It takes a percentile value and a sort specification and returns the same datatype as the numeric datatype of the argument. This returned value is a computed result after performing linear interpolation. Null are ignored in this calculation.
`PERCENTILE_DISC (expr) WITHIN GROUP (ORDER BY expr [DESC/ASC]`)	`timestamp, timestampz, interval, float`	`PERCENTILE_DISC(percentage) WITHIN GROUP (ORDER BY expression)` Example: SELECT department_id, PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY salary DESC) "Median_desc"; FROM employees GROUP BY department_id;	Performs an inverse distirbution function that assumes a discrete distribution model. It takes a percentile value and a sort specification. This returned value is an element from the set. Null are ignored in this calculation.
`sum(array[])`	`smallint[]int[], bigint[], float[]`	`sum(array[[1,2],[3,4]])` Example: CREATE TABLE mymatrix (myvalue int[]); INSERT INTO mymatrix VALUES (array[[1,2],[3,4]]); INSERT INTO mymatrix VALUES (array[[0,1],[1,0]]); SELECT sum(myvalue) FROM mymatrix; sum --------------- {{1,3},{4,4}}	Performs matrix summation. Can take as input a two-dimensional array that is treated as a matrix.
`pivot_sum (label[], label, expr)`	`int[], bigint[], float[]`	`pivot_sum( array['A1','A2'], attr, value)`	A pivot aggregation using sum to resolve duplicate entries.
`mregr_coef(expr, array[])`	`float[]`	`mregr_coef(y, array[1, x1, x2])`	The four `mregr_*` aggregates perform linear regressions using the ordinary-least-squares method. `mregr_coef` calculates the regression coefficients. The size of the return array for `mregr_coef` is the same as the size of the input array of independent variables, since the return array contains the coefficient for each independent variable.
`mregr_r2 (expr, array[])`	`float`	`mregr_r2(y, array[1, x1, x2])`	The four `mregr_*` aggregates perform linear regressions using the ordinary-least-squares method. `mregr_r2` calculates the r-squared error value for the regression.
`mregr_pvalues(expr, array[])`	`float[]`	`mregr_pvalues(y, array[1, x1, x2])`	The four `mregr_*` aggregates perform linear regressions using the ordinary-least-squares method. `mregr_pvalues` calculates the p-values for the regression.
`mregr_tstats(expr, array[])`	`float[]`	`mregr_tstats(y, array[1, x1, x2])`	The four `mregr_*` aggregates perform linear regressions using the ordinary-least-squares method. `mregr_tstats` calculates the t-statistics for the regression.
`nb_classify(text[], bigint, bigint[], bigint[])`	`text`	`nb_classify(classes, attr_count, class_count, class_total)`	Classify rows using a Naive Bayes Classifier. This aggregate uses a baseline of training data to predict the classification of new rows and returns the class with the largest likelihood of appearing in the new rows.
`nb_probabilities(text[], bigint, bigint[], bigint[])`	text	`nb_probabilities(classes, attr_count, class_count, class_total)`	Determine probability for each class using a Naive Bayes Classifier. This aggregate uses a baseline of training data to predict the classification of new rows and returns the probabilities that each class will appear in new rows.