Preparing and Adding Nodes
Verify your new nodes are ready for integration into the existing Greenplum system.
To prepare new system nodes for expansion, install the Greenplum Database software binaries, exchange the required SSH keys, and run performance tests.
Pivotal recommends running performance tests first on the new nodes alone and then on all nodes together. Run the tests on all nodes with the system offline so user activity does not distort results.
In general, Pivotal recommends running performance tests whenever an administrator modifies node networking or other special conditions in the system. For example, if you will run the expanded system on two network clusters, run tests on each cluster.
Adding New Nodes to the Trusted Host Environment
New nodes must exchange SSH keys with the existing nodes to enable Greenplum administrative utilities to connect to all segments without a password prompt. Pivotal recommends performing the key exchange process twice.
First perform the process as root, for administration convenience, and then as the user gpadmin, for management utilities. Perform the following tasks in order:
To exchange SSH keys as root
- Create a host file with the existing host names in your array and a separate host file with the new expansion host names. For existing hosts, you can use the same host file used to set up SSH keys in the system. In each file, list all hosts (master, backup master, and segment hosts) with one name per line and no extra lines or spaces. If you use a multi-NIC configuration, exchange SSH keys using every configured host name for a given host. In this example, mdw is configured with a single NIC, and sdw1, sdw2, and sdw3 are each configured with 4 NICs:
mdw
sdw1-1
sdw1-2
sdw1-3
sdw1-4
sdw2-1
sdw2-2
sdw2-3
sdw2-4
sdw3-1
sdw3-2
sdw3-3
sdw3-4
- Log in as root on the master host, and source the greenplum_path.sh file from your Greenplum installation.
$ su -
# source /usr/local/greenplum-db/greenplum_path.sh
- Run the gpssh-exkeys utility referencing the host list files. Use -e for the existing hosts file and -x for the new hosts file. For example:
# gpssh-exkeys -e /home/gpadmin/existing_hosts_file -x /home/gpadmin/new_hosts_file
- gpssh-exkeys checks the remote hosts and performs the key exchange between all hosts. Enter the root user password when prompted. For example:
***Enter password for root@hostname: <root_password>
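The host file rules in the first step (one name per line, no extra lines or spaces) can be checked before running gpssh-exkeys. The following is a minimal sketch and not part of Greenplum; validate_host_file is a hypothetical helper name, and the duplicate check is an extra assumption of this example rather than a documented requirement.

```shell
# Hypothetical helper (not a Greenplum utility): check that a host file
# has one name per line, with no blank lines, whitespace, or duplicates.
validate_host_file() {
    local file="$1"
    local dups

    if [ ! -s "$file" ]; then
        echo "ERROR: $file is missing or empty" >&2
        return 1
    fi

    # Reject blank lines and any line containing a space or tab.
    # Offending lines are printed with their line numbers.
    if grep -n -E '^$|[[:space:]]' "$file" >&2; then
        echo "ERROR: $file contains blank lines or whitespace" >&2
        return 1
    fi

    # Reject duplicate host names (an assumption of this sketch).
    dups=$(sort "$file" | uniq -d)
    if [ -n "$dups" ]; then
        echo "ERROR: duplicate entries in $file: $dups" >&2
        return 1
    fi

    # Every remaining line is a non-empty host name, so count them.
    echo "OK: $file lists $(grep -c . "$file") hosts"
}
```

Calling validate_host_file with the path of a host file prints an OK line with the host count, or returns a non-zero status and reports the problem on standard error.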
To create the gpadmin user
- Use gpssh to create the gpadmin user on all the new segment hosts (if it does not exist already). Use the list of new hosts you created for the key exchange. For example:
# gpssh -f new_hosts_file '/usr/sbin/useradd gpadmin -d /home/gpadmin -s /bin/bash'
- Set a password for the new gpadmin user. On Linux, you can do this on all segment hosts simultaneously using gpssh. For example:
# gpssh -f new_hosts_file 'echo gpadmin_password | passwd gpadmin --stdin'
- Verify the gpadmin user has been created by looking for its home directory:
# gpssh -f new_hosts_file ls -l /home
To exchange SSH keys as the gpadmin user
- Log in as gpadmin and run the gpssh-exkeys utility referencing the host list files. For example:
$ gpssh-exkeys -e /home/gpadmin/existing_hosts_file -x /home/gpadmin/new_hosts_file
- gpssh-exkeys checks the remote hosts and performs the key exchange between all hosts. Enter the gpadmin user password when prompted. For example:
***Enter password for gpadmin@hostname: <gpadmin_password>
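After the key exchange, it can be useful to confirm that each host accepts a connection without a password prompt. The following is a minimal sketch using standard OpenSSH client options; check_passwordless_ssh and the SSH_CMD override are hypothetical names introduced for this example, not Greenplum utilities.

```shell
# Hypothetical helper: confirm passwordless SSH works for every host in a file.
# SSH_CMD is an assumption of this sketch so the ssh client can be swapped out.
SSH_CMD="${SSH_CMD:-ssh}"

check_passwordless_ssh() {
    local hostfile="$1"
    local host
    local failed=0

    while IFS= read -r host; do
        [ -n "$host" ] || continue
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        # stdin is redirected so ssh does not consume the host list.
        if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "ok      $host"
        else
            echo "FAILED  $host"
            failed=1
        fi
    done < "$hostfile"

    return "$failed"
}
```

Run as gpadmin on the master host, for example check_passwordless_ssh /home/gpadmin/new_hosts_file. Any host reported as FAILED still prompts for a password or is unreachable, and the function returns a non-zero status.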
Verifying OS Settings
Use the gpcheck utility to verify all new hosts in your array have the correct OS settings to run Greenplum Database software.
To run gpcheck
- Log in on the master host as the user who will run your Greenplum Database system (for example, gpadmin).
$ su - gpadmin
- Run the gpcheck utility using your host file for new hosts. For example:
$ gpcheck -f new_hosts_file
Validating Disk I/O and Memory Bandwidth
Use the gpcheckperf utility to test disk I/O and memory bandwidth.
To run gpcheckperf
- Run the gpcheckperf utility using the host file for new hosts. Use the -d option to specify the file systems you want to test on each host. You must have write access to these directories. For example:
$ gpcheckperf -f new_hosts_file -d /data1 -d /data2 -v
- The utility may take a long time to perform the tests because it is copying very large files between the hosts. When it is finished, you will see the summary results for the Disk Write, Disk Read, and Stream tests.
For a network divided into subnets, repeat this procedure with a separate host file for each subnet.
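Repeating the test per subnet can be scripted as a simple loop. The following is a minimal sketch that reuses the gpcheckperf options from the example above; run_perf_per_subnet, the DRY_RUN switch, and the host file names in the usage note are assumptions of this example.

```shell
# Sketch: run gpcheckperf once for each subnet host file given as an argument.
# run_perf_per_subnet and DRY_RUN are hypothetical names for this example.
run_perf_per_subnet() {
    local hostfile
    for hostfile in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of running it.
            echo "gpcheckperf -f $hostfile -d /data1 -d /data2 -v"
        else
            echo "=== $hostfile ==="
            # Stop at the first subnet whose test run fails.
            gpcheckperf -f "$hostfile" -d /data1 -d /data2 -v || return 1
        fi
    done
}
```

For example, DRY_RUN=1 run_perf_per_subnet hostfile_subnet1 hostfile_subnet2 prints the two commands that would be run, one per host file. Adjust the -d directories to the file systems on your hosts.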
Integrating New Hardware into the System
Before initializing the system with the new segments, shut down the system with gpstop to prevent user activity from skewing performance test results. Then, repeat the performance tests using host files that include all nodes, existing and new: