gpdbrestore

Restores a database from a set of dump files generated by gpcrondump.

Synopsis

gpdbrestore { -t timestamp_key { [-L] |
   [--netbackup-service-host netbackup_server 
   [--netbackup-block-size size] ] } |
   -b YYYYMMDD | -R hostname:path_to_dumpset | -s database_name } 
   [--noplan] [--noanalyze] [-u backup_directory] [--list-backup]
   [--prefix prefix_string] [--report-status-dir report_directory]
   [-T schema.table [,...]] [--table-file file_name] [--truncate] [-e] [-G] 
   [-B  parallel_processes] [-d master_data_directory] [-a] [-q] 
   [-l logfile_directory] [-v] [--ddboost] 
   [--redirect database_name ]

gpdbrestore -? 

gpdbrestore --version

Description

The gpdbrestore utility recreates the data definitions (schema) and user data in a Greenplum database using the script files created by gpcrondump operations.

When you restore from an incremental backup, the gpdbrestore utility assumes the complete backup set is available. The complete backup set includes the following backup files:

  • The last full backup before the specified incremental backup
  • All incremental backups created between the time of the full backup and the specified incremental backup

The gpdbrestore utility provides the following functionality:

  • Automatically reconfigures for compression.
  • Validates that the number of dump files is correct (for primary only, mirror only, primary and mirror, or a subset consisting of some mirror and primary segment dump files).
  • If a failed segment is detected, restores to active segment instances.
  • Except when restoring data from a NetBackup server, you do not need to know the complete timestamp key (-t) of the backup set to restore. Additional options are provided to instead give just a date (-b), backup set directory location (-R), or database name (-s) to restore.
  • The -R option restores from a backup set located on a host outside of the Greenplum Database array (archive host). The utility ensures that the correct dump file goes to the correct segment instance.
  • Identifies the database name automatically from the backup set.
  • Allows you to restore particular tables only (-T option) instead of the entire database. Note that single tables are not automatically dropped or truncated prior to restore.

    Performs an ANALYZE operation on the tables that are restored. You can disable the ANALYZE operation by specifying the option --noanalyze.

  • Can restore global objects such as roles and tablespaces (-G option).
  • Detects if the backup set is primary segments only or primary and mirror segments and performs the appropriate restore operation.
  • Allows you to drop the target database before a restore in a single operation.

Restoring a Database from NetBackup

Greenplum Database must be configured to communicate with the Symantec NetBackup master server that is used to restore database data. See the Greenplum Database System Administrator Guide for information about configuring Greenplum Database and NetBackup.

When restoring from a NetBackup server, you must specify the timestamp of the backup with the -t option.

NetBackup is not compatible with DDBoost. NetBackup and DDBoost cannot both be used in a single backup operation.

Restoring a Database with Named Pipes

If you used named pipes when you backed up a database with gpcrondump, named pipes with the backup data must be available when restoring the database from the backup.

Error Reporting

gpdbrestore does not report errors automatically. After the restore is completed, check the report status files to verify that there are no errors. The restore status files are stored in the db_dumps/date/ directory by default.
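
As a rough sketch of that post-restore check, the shell function below lists the files under a report directory that mention the word "error". The function name scan_restore_reports and the case-insensitive match are assumptions made for illustration; the file names and message wording of the report status files are not specified here, so adjust the directory and pattern to what your installation actually writes.

```shell
# Hypothetical helper: list files under a report directory that mention "error".
# The match pattern is an assumption, not a documented message format.
scan_restore_reports() {
    report_dir=$1
    if [ ! -d "$report_dir" ]; then
        echo "no such directory: $report_dir" >&2
        return 2
    fi
    # -r: recurse, -i: ignore case, -l: print only the names of matching files
    if grep -ril 'error' "$report_dir"; then
        return 1    # at least one file mentions an error
    fi
    return 0        # no file matched
}
```

For example, scan_restore_reports "$MASTER_DATA_DIRECTORY/db_dumps/20130530" would check the default location on the master host, where the date is only a placeholder.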

Options

-a (do not prompt)
Do not prompt the user for confirmation.
-b YYYYMMDD
Looks for dump files in the segment data directories on the Greenplum Database array of hosts in db_dumps/YYYYMMDD. If --ddboost is specified, the system looks for dump files on the Data Domain Boost host.
-B parallel_processes
The number of segments to check in parallel for pre/post-restore validation. If not specified, the utility will start up to 60 parallel processes depending on how many segment instances it needs to restore.
-d master_data_directory
Optional. The master host data directory. If not specified, the value set for $MASTER_DATA_DIRECTORY will be used.
--ddboost
Use Data Domain Boost for this restore, if the --ddboost option was passed when the data was dumped. Before using Data Domain Boost, make sure the one-time Data Domain Boost credential setup is complete. See "Backing Up and Restoring Databases" in the Greenplum Database Administrator Guide for details.
If you backed up Greenplum Database configuration files with the gpcrondump option -g and specified the --ddboost option, you must manually restore the backup from the Data Domain system. The configuration files must be restored for the Greenplum Database master and all the hosts and segments. The backup location on the Data Domain system is the directory GPDB/backup_directory/date. The backup_directory is set when you specify the Data Domain credentials with gpcrondump.
This option is not supported if --netbackup-service-host is specified.
-e (drop target database before restore)
Drops the target database before doing the restore and then recreates it.
-G (restore global objects)
Restores global objects such as roles and tablespaces if the global object dump file db_dumps/date/gp_global_1_1_timestamp is found in the master data directory.
-l logfile_directory
The directory to write the log file. Defaults to ~/gpAdminLogs.
--list-backup
Lists the set of full and incremental backup sets required to perform a restore based on the timestamp_key specified with the -t option and the location of the backup set.
This option is supported only if the timestamp_key is for an incremental backup.
-L (list tablenames in backup set)
When used with the -t option, lists the table names that exist in the named backup set and exits. Does not perform a restore.
--netbackup-block-size size
Specify the block size, in bytes, of data being transferred from the Symantec NetBackup server. The default is 512 bytes.
NetBackup options are not supported if DDBoost backup options are specified.
--netbackup-service-host netbackup_server
The NetBackup master server that Greenplum Database connects to when restoring from NetBackup. If you specify this option, you must specify the timestamp of the backup with the -t option.
This option is not supported with any of these options: -R, -s, -b, -L, or --ddboost.
NetBackup options are not supported if DDBoost backup options are specified.
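
The restrictions above can be checked before a restore is started. The following is a minimal sketch, not part of gpdbrestore: the function name check_netbackup_args is hypothetical, and it only recognizes options written as separate words, so combined short options would not be detected.

```shell
# Hypothetical pre-flight check: returns non-zero if the argument list
# combines --netbackup-service-host with an option it does not support,
# or omits the required -t option.
check_netbackup_args() {
    has_nbu=0
    has_t=0
    conflict=""
    for arg in "$@"; do
        case $arg in
            --netbackup-service-host|--netbackup-service-host=*) has_nbu=1 ;;
            -t) has_t=1 ;;
            -R|-s|-b|-L|--ddboost) conflict="$conflict $arg" ;;
        esac
    done
    # The rules below apply only to NetBackup restores.
    [ "$has_nbu" -eq 1 ] || return 0
    if [ -n "$conflict" ]; then
        echo "not supported with --netbackup-service-host:$conflict" >&2
        return 1
    fi
    if [ "$has_t" -eq 0 ]; then
        echo "--netbackup-service-host requires -t timestamp_key" >&2
        return 1
    fi
    return 0
}
```

A wrapper script could call check_netbackup_args "$@" and invoke gpdbrestore only when the check succeeds.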
--noanalyze
The ANALYZE command is not run after a successful restore. The default is to run the ANALYZE command on restored tables. This option is useful if running ANALYZE on tables in your database requires a significant amount of time.
If this option is specified, you should run ANALYZE manually on restored tables. Failure to run ANALYZE following a restore might result in poor database performance.
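
One possible way to run ANALYZE manually afterwards is to generate the statements from the list of restored tables. This is a sketch under stated assumptions: the function name emit_analyze is hypothetical, the input is a file with one schema.table name per line, and the names are assumed to need no SQL quoting.

```shell
# Hypothetical helper: turn a list of schema.table names (one per line)
# into ANALYZE statements. Assumes the names need no SQL quoting.
emit_analyze() {
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r table || [ -n "$table" ]; do
        # Skip blank lines in the list.
        [ -n "$table" ] || continue
        printf 'ANALYZE %s;\n' "$table"
    done < "$1"
}
```

The output could then be piped to psql, for example emit_analyze tables.txt | psql -d sales, where the file and database names are placeholders.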
--noplan
Restores only the data backed up during the incremental backup specified by the timestamp_key. No other data from the complete backup set are restored. The full backup set containing the incremental backup must be available.
If the timestamp_key specified with the -t option does not reference an incremental backup, an error is returned.
--prefix prefix_string
If you specified the gpcrondump option --prefix prefix_string to create the backup, you must specify this option with the prefix_string when restoring the backup.
If you created a full backup of a set of tables with gpcrondump and specified a prefix, you can use gpcrondump with the options --list-filter-tables and --prefix prefix_string to list the tables that were included or excluded for the backup.
-q (no screen output)
Run in quiet mode. Command output is not displayed on the screen, but is still written to the log file.
-R hostname:path_to_dumpset
Allows you to provide a hostname and full path to a set of dump files. The host does not have to be in the Greenplum Database array of hosts, but must be accessible from the Greenplum master.
--redirect database_name
Specify the name of the database where the data is restored. Specify this option to restore data to a database that is different from the database specified during backup. If database_name does not exist, it is created.
--report-status-dir report_directory
Specifies the absolute path to the directory on each Greenplum Database host (master and segment hosts) where gpdbrestore writes report status files for a restore operation. If report_directory does not exist or is not writable, gpdbrestore returns an error and stops.
If this option is not specified and the -u option is specified, report status files are written to the location specified by the -u option if the -u location is writable. If the location specified by -u option is not writable, the report status files are written to segment data directories.
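
The selection order described above can be summarized in a short sketch. This is an illustration of the documented rules, not the utility's own code; the function name resolve_report_dir and its three positional arguments are hypothetical.

```shell
# Sketch of the documented selection order for report status files.
# Arguments: report_dir (may be empty), backup_dir (may be empty), data_dir.
resolve_report_dir() {
    report_dir=$1
    backup_dir=$2
    data_dir=$3
    if [ -n "$report_dir" ]; then
        # --report-status-dir must exist and be writable, otherwise stop.
        if [ -d "$report_dir" ] && [ -w "$report_dir" ]; then
            echo "$report_dir"
            return 0
        fi
        echo "report directory missing or not writable: $report_dir" >&2
        return 1
    fi
    if [ -n "$backup_dir" ] && [ -w "$backup_dir" ]; then
        echo "$backup_dir"      # -u location, when writable
    else
        echo "$data_dir"        # fall back to the segment data directory
    fi
}
```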
-s database_name
Looks for the latest set of dump files for the given database name in the db_dumps directory of the segment data directories on the Greenplum Database array of hosts.
-t timestamp_key
The 14-digit timestamp key that uniquely identifies a backup set of data to restore. It is of the form YYYYMMDDHHMMSS. Looks for dump files matching this timestamp key in the db_dumps directory of the segment data directories on the Greenplum Database array of hosts.
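
Because the key has a fixed form, it can be validated before it is passed to -t. The sketch below uses a hypothetical function named timestamp_to_date; it checks for exactly 14 digits and prints the leading YYYYMMDD part, which is assumed here to match the date used in the db_dumps/YYYYMMDD directory name.

```shell
# Hypothetical helper: check that a key has the form YYYYMMDDHHMMSS and
# print its YYYYMMDD part.
timestamp_to_date() {
    key=$1
    case $key in
        ''|*[!0-9]*) echo "not numeric: $key" >&2; return 1 ;;
    esac
    if [ "${#key}" -ne 14 ]; then
        echo "expected 14 digits, got ${#key}" >&2
        return 1
    fi
    # Keep the first 8 characters by stripping the trailing 6 (HHMMSS).
    echo "${key%??????}"
}
```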
-T schema.table_name
A comma-separated list of specific table names to restore. The named table(s) must exist in the backup set of the database being restored. Existing tables are not automatically truncated before data is restored from backup. If your intention is to replace existing data in the table from backup, truncate the table prior to running gpdbrestore -T.
To restore the tables for a specific schema, you can specify the schema name with the * wildcard character. For example -T mytest.* restores the tables for the schema mytest. System catalog schemas are not supported.
If you specify the --truncate option with the -T schema.* option, all existing tables within the database schema are truncated before tables are restored from the backup.
--table-file file_name
Specify a file file_name that contains a list of table names to restore. The file contains any number of table names, listed one per line. See the -T option for information about restoring specific tables.
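
As an illustration of the expected file layout, the following sketch writes a two-line table list and prints the matching command rather than running it. The file location, the table names, and the timestamp key are placeholders chosen for this example.

```shell
# Hypothetical example: write a table list, one schema.table name per line.
table_file="${TMPDIR:-/tmp}/restore_tables.txt"
printf '%s\n' 'public.orders' 'public.customers' > "$table_file"

# Print the command instead of running it so the list can be reviewed first.
echo gpdbrestore -t 20130530090000 --table-file "$table_file"
```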
--truncate
Truncate table data before restoring data to the table from the backup. If this option is not specified, existing table data is not removed before data is restored to the table.
This option is supported only when restoring a set of tables with the option -T or --table-file.
This option is not supported with the -e option.
-u backup_directory
Specifies the absolute path to the directory containing the db_dumps directory on each host. If not specified, defaults to the data directory of each instance to be backed up. Specify this option if you specified a backup directory with the gpcrondump option -u when creating a backup set.
If backup_directory is not writable, backup operation report status files are written to segment data directories. You can specify a different location where report status files are written with the --report-status-dir option.
Note: This option is not supported if --ddboost is specified.
-v | --verbose
Specifies verbose mode.
--version (show utility version)
Displays the version of this utility.
-? (help)
Displays the online help.

Examples

Restore the sales database from the latest backup files generated by gpcrondump (assumes backup files are in the segment data directories in db_dumps):

gpdbrestore -s sales

Restore a database from backup files that reside on an archive host outside the Greenplum Database array (command issued on the Greenplum master host):

gpdbrestore -R archivehostname:/data_p1/db_dumps/20080214

Restore global objects only (roles and tablespaces):

gpdbrestore -G

Note: The -R option is not supported when restoring a backup set that includes incremental backups.

If you restore from a backup set that contains an incremental backup, all the files in the backup set must be available to gpdbrestore. For example, the following timestamp keys specify a backup set. 20120514054532 is the full backup and the others are incremental.

20120514054532 
20120714095512 
20120914081205 
20121114064330 
20130114051246

The following gpdbrestore command specifies the timestamp key 20121114064330. The incremental backups with the timestamps 20120714095512 and 20120914081205 and the full backup must be available to perform a restore.

gpdbrestore -t 20121114064330
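
The selection rule in this example can be sketched in a few lines of shell. The function name required_backups is hypothetical, and the sketch assumes the keys passed to it belong to a single backup chain that starts with the full backup; the --list-backup option remains the way to ask gpdbrestore itself which backup sets are required.

```shell
# Hypothetical helper: given a target key followed by the keys of one backup
# chain (full backup first, then incrementals), print every key up to and
# including the target.
required_backups() {
    target=$1
    shift
    for key in "$@"; do
        # Keys are fixed-width digit strings, so numeric order is time order.
        if [ "$key" -le "$target" ]; then
            echo "$key"
        fi
    done
}

required_backups 20121114064330 \
    20120514054532 20120714095512 20120914081205 20121114064330 20130114051246
```

With the keys listed above, the call prints the full backup, the two earlier incremental backups, and the target key, and omits 20130114051246.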

The following gpdbrestore command uses the --noplan option to restore only the data that was backed up during the incremental backup with the timestamp key 20121114064330. Data in the previous incremental backups and the data in the full backup are not restored.

gpdbrestore -t 20121114064330 --noplan

This gpdbrestore command restores Greenplum Database data from the data managed by NetBackup master server nbu_server1. The option -t 20130530090000 specifies the timestamp generated by gpcrondump when the backup was created. The -e option specifies that the target database is dropped before it is restored.

gpdbrestore -t 20130530090000 -e --netbackup-service-host=nbu_server1

See Also

gpcrondump