gpreload

Reloads Greenplum Database table data sorting the data based on specified columns.

Synopsis

gpreload -d database [-p port] {-t | --table-file} path_to_file [-a]

gpreload -h 

gpreload --version

Description

The gpreload utility reloads table data with column data sorted. For tables that were created with the table storage option APPENDONLY=TRUE and compression enabled, reloading the data with sorted data can improve table compression. You specify a list of tables to be reloaded the table column to be sorted in a text file.

Compression is improved by sorting data when the data in the column has a relatively low number of distinct values when compared to the total number of rows.

For a table being reloaded, the order of the columns to be sorted might affect compression. The columns with fewest distinct values should be listed first. For example, listing state then city would generally result in better compression than listing city then state.

public.cust_table: state, city
public.cust_table: city, state

For information about the format of the file used with gpreload, see the --table-file option.

Notes

To improve reload performance, indexes on tables being reloaded should be removed before reloading the data.

Running the ANALYZE command after reloading table data might query performance because of a change in the data distribution of the reloaded data.

Options

-a (do not prompt)
Optional. If specified, the gpreload utility does not prompt the user for confirmation.
-d database
The database that contains the tables to be reloaded. The gpreload utility connects to the database as the user running the utility.
-p port
The Greenplum Database master port. If not specified, the value of the PGPORT environment variable is used. If the value is not available, an error is returned.
{-t | --table-file } path_to_file
The location and name of file containing list of schema qualified table names to reload and the column names to reorder from the Greenplum Database. Only user defined tables are supported. Views or system catalog tables are not supported.
If indexes are defined on table listed in the file, gpreload prompts to continue.
Each line specifies a table name and the list of columns to sort. This is the format of each line in the file:
schema.table_name: column [desc] [, column2 [desc] ... ]
The table name is followed by a colon ( : ) and then at least one column name. If you specify more than one column, separate the column names with a comma. The columns are sorted in ascending order. Specify the keyword desc after the column name to sort the column in descending order.
Wildcard characters are not supported.
If there are errors in the file, gpreload reports the first error and exits. No data is reloaded.
The following example reloads three tables:
public.clients: region, state, rep_id desc
public.merchants: region, state
test.lineitem: group, assy, whse 
In the first table public.clients, the data in the rep_id column is sorted in descending order. The data in the other columns are sorted in ascending order.
--version (show utility version)
Displays the version of this utility.
-? (help)
Displays the online help.

Example

This example command reloads the tables in the database mytest that are listed in the file data-tables.txt.

gpreload -d mytest --table-file data-tables.txt

See Also

CREATE TABLE in the Greenplum Database Reference Guide