GT 4.0: System Administrator's Guide

1. Introduction

This guide contains advanced configuration information for system administrators working with the Community Scheduler Framework (CSF). It provides references to information on procedures typically performed by system administrators, including installing, configuring, deploying, and testing the installation.

Important

This information is in addition to the basic Globus Toolkit prerequisite, overview, installation, and security configuration instructions in the GT 4.0 System Administrator's Guide. Read through that guide before continuing!

2. Building and installing

If you are going to submit jobs to a local batch scheduler, make sure that the scheduler is installed according to the instructions provided with the scheduling software. For LSF, you will need to install and run the gabd service that is used by CSF's RM Adapter to communicate with LSF clusters. The gabd is part of the LSF WebGUI package, or can be acquired by contacting Platform Computing's support department.

Make sure that there is a full installation of the Globus Toolkit, as detailed in the GT 4.0 System Administrator's Guide. If you are going to run jobs through the WS GRAM interface, make sure to follow all the instructions for setting up WS GRAM, including running the test cases.

CSF is distributed as a gzipped tar file that can be installed using gpt-build. Since CSF is built from Java source, you must make sure that your environment is set up properly for building GT 4.0 services.

Set up your GT 4.0 build environment as follows:

  • Make sure JAVA_HOME and ANT_HOME are set to the correct installation locations.
  • Add JAVA_HOME/bin and ANT_HOME/bin to your PATH environment variable.
  • Set GLOBUS_LOCATION to the top level directory of your GT 4.0 installation.
  • Source either globus-user-env.csh or globus-user-env.sh depending on your shell.
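
As a rough sketch, the steps above might look like this in a Bourne-style shell. The three installation paths are placeholders (assumptions, not values from this guide), so substitute the locations on your own system. The globus-user-env.sh script is assumed to live under $GLOBUS_LOCATION/etc and is only sourced if it exists there.

```shell
# Placeholder paths: adjust to your own installation (assumed values).
export JAVA_HOME="${JAVA_HOME:-/usr/java/default}"
export ANT_HOME="${ANT_HOME:-/usr/local/ant}"
export GLOBUS_LOCATION="${GLOBUS_LOCATION:-/usr/local/globus-4.0}"

# Put the Java and Ant tools on the PATH.
export PATH="$JAVA_HOME/bin:$ANT_HOME/bin:$PATH"

# Load the Globus user environment (sh/bash variant), if present.
if [ -f "$GLOBUS_LOCATION/etc/globus-user-env.sh" ]; then
    . "$GLOBUS_LOCATION/etc/globus-user-env.sh"
fi
```

Users of csh or tcsh would use setenv and source globus-user-env.csh instead.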

To install CSF, run the following commands as the Globus container user (i.e. the user who owns the Globus installation in GLOBUS_LOCATION) from the directory containing the CSF contribution (under contrib/ in the installer directory):

globus$ gpt-build csf-4.0-src.tar.gz
globus$ gpt-postinstall

3. Configuring

In order to make CSF operational, you must configure the following files:

  • resourcemanager-config.xml
  • metascheduler-config.xml

The files contain some commented-out settings as examples.

3.1. Configuring the Resource Manager Factory Service

Edit $GLOBUS_LOCATION/etc/metascheduler/resourcemanager-config.xml and specify a "cluster" section for each resource manager you will be accessing through the RM Adapter.

You must define the following elements:

  • name: the name by which this cluster can be referenced.
  • type: the type of resource manager. Currently, only the type "LSF" is supported.
  • host: the host where the resource manager can be contacted. For LSF, this is the host which is running the gabd.
  • port: the port number where the resource manager can be contacted. For LSF, this is specified in $LSF_ENVDIR/ga.conf.

3.2. Configuring CSF

Edit $GLOBUS_LOCATION/etc/metascheduler/metascheduler-config.xml and specify:

  • GISHandle: the endpoint of the local Index Service.
  • registryHandle: the endpoint of the local container registry.

Note

Do not use 'localhost' or 127.0.0.1 within the endpoint. Use the actual host IP address or fully qualified host name.
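
A quick way to catch this mistake is to check the host part of an endpoint before writing it into the configuration file. The helper below is a hypothetical sketch: the function name and the sample URLs are illustrative assumptions, not part of CSF. It rejects endpoints whose host is localhost or a 127.x.x.x loopback address.

```shell
# Hypothetical helper: succeed only if the endpoint URL's host is not a
# loopback name or address.
endpoint_host_ok() {
    # Strip the scheme, then everything after the host (port and path).
    host=${1#*://}
    host=${host%%[:/]*}
    case "$host" in
        localhost|127.*|"") return 1 ;;
        *) return 0 ;;
    esac
}

# Example usage with illustrative endpoints:
endpoint_host_ok "https://grid1.example.org:8443/wsrf/services/DefaultIndexService" \
    && echo "endpoint looks fine"
endpoint_host_ok "https://localhost:8443/wsrf/services/DefaultIndexService" \
    || echo "do not use localhost in the endpoint"
```

On many systems, hostname -f prints the fully qualified host name to use in place of localhost.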

3.3. Configuring the Queuing Service

Edit the "queueConfig" section of $GLOBUS_LOCATION/etc/metascheduler/metascheduler-config.xml. By default, no queues are configured in CSF, so any job submission to a queue will fail until that queue is defined. Each queue has its own configuration section, in which you can specify:

  • name: this attribute on the queueConfig element specifies the name by which this queue is referenced.
  • plugin: the name of the class that implements the com.platform.metascheduler.impl.schedPlugin interface. The default plugin (schedPluginDefault) is always loaded for a queue, regardless of whether it is defined in the queue configuration. If the plugin doesn't exist or does not implement the schedPlugin interface, it will not be loaded. You can specify as many plugins for a queue as you wish.
  • scheduleInterval: the interval in seconds between scheduling cycles. Its value is an integer between 5 and 600. This parameter is optional and defaults to 30 seconds if not defined. This is a parameter of schedPluginDefault.
  • throttle: this is a parameter for the throttle scheduler plugin (com.platform.metascheduler.impl.schedThrottle), and sets the maximum number of jobs that can be dispatched to a back end resource manager in each scheduling cycle. The value is an integer greater than 0.
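
The numeric limits above are easy to get wrong when editing the file by hand. As an illustrative sketch only (these helper functions are hypothetical and not shipped with CSF), the following shell functions check a candidate scheduleInterval or throttle value against the ranges stated above. The range 5 to 600 is treated as inclusive, which is an assumption.

```shell
# Hypothetical helpers mirroring the documented limits.

# Succeeds if $1 is a non-negative decimal integer.
is_integer() {
    case "$1" in
        ""|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# scheduleInterval: integer from 5 to 600 seconds (assumed inclusive).
# An empty value is accepted because the parameter is optional; CSF then
# uses the documented default of 30 seconds.
valid_schedule_interval() {
    [ -z "$1" ] && return 0
    is_integer "$1" && [ "$1" -ge 5 ] && [ "$1" -le 600 ]
}

# throttle: integer greater than 0.
valid_throttle() {
    is_integer "$1" && [ "$1" -gt 0 ]
}

valid_schedule_interval 30 && echo "scheduleInterval 30 is acceptable"
valid_throttle 0 || echo "throttle 0 is rejected"
```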

3.4. Support for multiple GT 4.0 hosting environments

CSF supports the management of jobs and reservations across resource managers hosted in multiple GT 4.0 containers that are part of the same Virtual Organization (VO) (i.e. they trust the same CAs). The multiple container support allows CSF services running in one container to send jobs to the Resource Manager Adapters in another container, and so on. To support this behaviour, there needs to be a central Index Service for storing information about the different RM Adapters in the VO. This Index Service must be hosted in a container that doesn't host the CSF services.

In order to set up this configuration, the following steps should be taken:

  • On the hosts running the CSF services for the VO (1 or more hosts), edit $GLOBUS_LOCATION/etc/metascheduler/metascheduler-config.xml and set the endpoint of the central Index Service for the VO in:

    • The ReservationConfig section, in the CommunityGISHandle element.
    • The queueConfig section of each queue that should use remote RM Adapters, in the communityGisHandle parameter.

  • On the host running the central Index Service, edit $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/hierarchy.xml, and configure the location of the Index Services running in the containers running CSF as downstream elements.

4. Deploying

There are no further deployment instructions beyond what is detailed in the GT 4.0 System Administrator's Guide.

5. Testing

There are currently no tests for Platform CSF.

6. Security considerations

No special security considerations exist at this time.

7. Troubleshooting

7.1. Cannot create a queue

If you can't create a queue, check whether the queue is configured in $GLOBUS_LOCATION/etc/metascheduler/metascheduler-config.xml.

7.2. RM Adapter cannot connect to resource manager

If the container log shows messages that the RM Adapter can't connect to the back end resource manager (i.e. LSF), check that the LSF daemons are started and that the LSF gabd is also running. Also check that the host name and port number in $GLOBUS_LOCATION/etc/metascheduler/resourcemanager-config.xml correspond to the settings in ga.conf.
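
A first check along these lines can be sketched in shell. Both helpers below are hypothetical and not part of CSF or LSF. The process name gabd is assumed to match the daemon name used in this guide, and the port test assumes that bash (for its /dev/tcp feature) and the timeout command are available.

```shell
# Hypothetical helper: succeed if a TCP connection to host $1, port $2 can
# be opened within 5 seconds. Requires bash (for /dev/tcp) and timeout.
port_open() {
    timeout 5 bash -c 'exec 3<>/dev/tcp/"$0"/"$1"' "$1" "$2" 2>/dev/null
}

# Hypothetical helper: succeed if a process named gabd is running on this
# host. Run it on the host named in the "host" element; the process name
# is an assumption.
gabd_running() {
    pgrep -x gabd >/dev/null 2>&1
}
```

For example, port_open HOST PORT || echo "cannot reach gabd", with HOST and PORT replaced by the values from resourcemanager-config.xml, shows whether the configured endpoint accepts connections at all. If it does not, compare those values with the settings in $LSF_ENVDIR/ga.conf.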