Table of Contents
- 1. Introduction
- 2. Building and Installing
- 3. Configuring
- 3.1. Typical Configuration
- 3.2. Non-default Configuration
- 3.3. Locating configuration files
- 3.4. Web service deployment configuration
- 3.5. JNDI application configuration
- 3.6. Security descriptor
- 3.7. GRAM and GridFTP file system mapping
- 3.8. Scheduler-Specific Configuration Files
- 3.9. WS GRAM auto-registration with default WS MDS Index Service
- 3.10. Registering WS GRAM manually with default WS MDS Index Service
- 3.11. Configuring support for SoftEnv
- 3.12. Job Description Document Substitution Variables
- 3.13. Audit Logging
- 4. Deploying
- 5. Job Description Extensions Support
- 6. Testing
- 7. Security Considerations
- 8. Troubleshooting
- 9. Usage statistics collection by the Globus Alliance
This guide contains advanced configuration information for system administrators working with WS GRAM. It provides references to information on procedures typically performed by system administrators, including installation, configuring, deploying, and testing the installation. It also describes additional prerequisites and host settings necessary for WS GRAM operation. Readers should be familiar with the Key Concepts and Implementation Approach for WS GRAM to understand the motivation for and interaction between the various deployed components.
> **Important**
>
> The information in this WS GRAM Admin Guide is in addition to the basic Globus Toolkit prerequisite, overview, installation, and security configuration instructions in the GT 4.0 System Administrator's Guide. Read through that guide before continuing!
WS GRAM is built and installed as part of a default GT 4.0 installation. For basic installation instructions, see the GT 4.0 System Administrator's Guide.
In order to use WS GRAM, the container must be started with Transport Level security. The "-nosec" option should *not* be used with globus-start-container.
WS GRAM requires that the sudo command is installed and functioning on the service host where WS GRAM software will execute.
Authorization rules will need to be added to the sudoers file to allow the WS GRAM service account to execute (without a password) the scheduler adapter in the accounts of authorized GRAM users. For configuration details, see the Configuring sudo section.
Platform Note: On AIX, sudo is not installed by default, but it is available as source and rpm here: AIX 5L Toolbox for Linux Applications
WS GRAM depends on a local mechanism for starting and controlling jobs. Included in the WS GRAM software is a Fork scheduler, which requires no special software installed to execute jobs on the local host. However, to enable WS GRAM to execute and manage jobs to a batch scheduler, the scheduler software must be installed and configured prior to configuring WS GRAM.
WS GRAM depends on scheduler adapters to translate the WS GRAM job description document into commands understood by the local scheduler, as well as monitor the jobs.
Scheduler adapters included in the GT 4.0 release are: PBS, Condor, LSF
Other third party scheduler adapters available for GT 4.0.x releases:
- Sun Grid Engine
- LoadLeveler - as of release 3.3.1 IBM LoadLeveler includes a GRAM Scheduler Adapter. For more information see "What's new" in the LoadLeveler product documentation
- GridWay - installation and configuration guide is here
For configuration details, see the Configuring scheduler adapters section.
Though staging directives are processed by RFT (see next section), RFT uses GridFTP servers underneath to do the actual data movement. As a result, there must be at least one GridFTP server that shares a file system with the execution nodes. There is no special process to get staged files onto the execution node before the job executable is run. See the Non-default GridFTP server section of this admin guide for details on how to configure WS GRAM for the GridFTP servers used in your execution environment.
WS GRAM depends on RFT to perform the file staging and cleanup directives in a job description. For configuration details, see the RFT admin guide. Important: Jobs requesting these functions will fail if RFT is not properly set up.
When the credentials of the service account and the job submitter are different (multi-user mode), GRAM will prepend a call to sudo to the local adapter callout command. Important: If sudo is not configured properly, the command, and thus the job, will fail.
As root, add these two lines to the /etc/sudoers file for each GLOBUS_LOCATION installation, where /opt/globus/GT4.0.0 should be replaced with the GLOBUS_LOCATION for your installation:

```
# Globus GRAM entries
globus ALL=(username1,username2) NOPASSWD: /opt/globus/GT4.0.0/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/GT4.0.0/libexec/globus-job-manager-script.pl *
globus ALL=(username1,username2) NOPASSWD: /opt/globus/GT4.0.0/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/GT4.0.0/libexec/globus-gram-local-proxy-tool *
```
The globus-gridmap-and-execute program is used to ensure that GRAM only runs programs under accounts that are in the grid-mapfile. In the sudo configuration, it is the first program called: it looks up the account in the grid-mapfile and then runs the requested command. It is redundant if sudo is properly locked down, and this tool could be replaced with your own authorization program.
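To make the effect of these rules concrete, here is a sketch (not GT code) of the kind of command line WS GRAM builds in multi-user mode; the service account "globus", the job owner "username1", and the GLOBUS_LOCATION are the illustrative values from the example above:

```shell
# Illustrative only: assemble the sudo-prefixed call chain that the
# sudoers entries above authorize (account names are examples).
GL=/opt/globus/GT4.0.0
job_owner=username1
cmd="sudo -u $job_owner \
$GL/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile \
$GL/libexec/globus-job-manager-script.pl"
echo "$cmd"
```

GRAM runs this under the service account; sudo switches to the job owner only for the exact commands listed in sudoers, which is why the NOPASSWD entries name full paths.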
The WS GRAM scheduler adapters included in the release tarball are: PBS, Condor and LSF. To install, follow these steps (shown for pbs):
```
% cd $GLOBUS_LOCATION/gt4.0.0-all-source-installer
% make gt4-gram-pbs
% make install
```
Using PBS as the example, make sure the scheduler commands are in your path (qsub, qstat, pbsnodes).
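A quick way to verify this is a small PATH check; the helper function below is our own sketch, not part of the toolkit:

```shell
# check_scheduler_cmds: report whether each named command is in PATH.
check_scheduler_cmds() {
  for c in "$@"; do
    if command -v "$c" >/dev/null 2>&1; then
      echo "$c: found"
    else
      echo "$c: MISSING"
    fi
  done
}

# For PBS, the adapter relies on at least these commands:
check_scheduler_cmds qsub qstat pbsnodes
```

Run this as the WS GRAM service account, since that account's PATH is what matters for the adapter.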
For PBS, another setup step is required to configure the remote shell for rsh access:
```
% cd $GLOBUS_LOCATION/setup/globus
% ./setup-globus-job-manager-pbs --remote-shell=rsh
```
The last step is to define the GRAM and GridFTP file system mapping for PBS. A default mapping is created in this file to allow simple jobs to run. However, the actual file system mappings for your compute resource should be entered to ensure that:

- file staging is performed correctly
- jobs with erroneous file path directives are rejected
Done! You have added the PBS scheduler adapters to your GT installation.
Note for future GT builds with scheduler adapters: scheduler adapters can be enabled by adding --enable-wsgram-pbs to the configure line when building the entire toolkit.
```
% configure --prefix=$GLOBUS_LOCATION --enable-wsgram-pbs ...
% make
% make install
```
To run the container using just a user proxy, instead of host credentials, edit the $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml file and either comment out the credentials section...
```
<?xml version="1.0" encoding="UTF-8"?>
<securityConfig xmlns="http://www.globus.org">
    <!--
    <credential>
        <key-file value="/etc/grid-security/containerkey.pem"/>
        <cert-file value="/etc/grid-security/containercert.pem"/>
    </credential>
    -->
    <gridmap value="/etc/grid-security/grid-mapfile"/>
</securityConfig>
```
or replace the credentials section with a proxy file location...
```
<?xml version="1.0" encoding="UTF-8"?>
<securityConfig xmlns="http://www.globus.org">
    <proxy-file value="<PATH TO PROXY FILE>"/>
    <gridmap value="/etc/grid-security/grid-mapfile"/>
</securityConfig>
```
When running in personal mode (user proxy), another GRAM configuration setting is required: for GRAM to authorize the RFT service when performing staging functions, it needs to know the subject DN for verification. Here are the steps:
```
% cd $GLOBUS_LOCATION/setup/globus
% ./setup-gram-service-common --staging-subject="/DC=org/DC=doegrids/OU=People/CN=Stuart Martin 564720"
```
You can get your subject DN by running this command:
% grid-cert-info -subject
By default, the GridFTP server is assumed to run as root on localhost:2811. If this is not true for your site, change the GridFTP host and/or port in the GRAM and GridFTP file system mapping config file: $GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml.
By default, the globus services will assume 8443 is the port the Globus container is using. However, the container can be run on a non-standard port, for example:
% globus-start-container -p 4321
If you wish to specify a non-standard gridmap file in a multi-user installation, two basic configurations need to be changed:

- $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml: as specified in the gridmap config instructions, add a <gridmap value="..."/> element to the file appropriately.
- /etc/sudoers: change the file path after all -g options to -g /path/to/grid-mapfile.
Example: global_security_descriptor.xml
... <gridmap value="/opt/grid-mapfile"/> ...
sudoers
```
...
# Globus GRAM entries
globus ALL=(username1,username2) NOPASSWD: /opt/globus/GT4.0.0/libexec/globus-gridmap-and-execute -g /opt/grid-mapfile /opt/globus/GT4.0.0/libexec/globus-job-manager-script.pl *
globus ALL=(username1,username2) NOPASSWD: /opt/globus/GT4.0.0/libexec/globus-gridmap-and-execute -g /opt/grid-mapfile /opt/globus/GT4.0.0/libexec/globus-gram-local-proxy-tool *
...
```
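If you prefer to script this sudoers change, a sed substitution over each entry can rewrite the path after -g; the paths below are the ones from the example above, and the real edit should of course go through visudo:

```shell
# Rewrite the grid-mapfile path that follows the -g option in a
# sudoers-style entry (paths taken from the example above).
entry='globus ALL=(username1,username2) NOPASSWD: /opt/globus/GT4.0.0/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/GT4.0.0/libexec/globus-job-manager-script.pl *'
echo "$entry" | sed 's|-g /etc/grid-security/grid-mapfile|-g /opt/grid-mapfile|'
```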
RFT is used by GRAM to stage files in and out of the job execution environment. In the default configuration, RFT is hosted in the same container as GRAM and is assumed to have the same service path and standard service names. This need not be the case. For example, the most likely alternative scenario is that RFT would be hosted separately in a container on a different machine. In any case, both the RFT and the Delegation Service endpoints need to be adjustable to allow this flexibility. The following options can be passed to the setup-gram-service-common script to affect these settings:
```
--staging-protocol=<protocol>
--staging-host=<host>
--staging-port=<port>
--staging-service-path=<RFT and Delegation factory service path>
--staging-factory-name=<RFT factory service name>
--staging-delegation-factory-name=<name of Delegation factory service used by RFT>
```
for example:

```
% setup-gram-service-common \
    --staging-protocol=http \
    --staging-host=somemachine.fakedomain.net \
    --staging-port=8444 \
    --staging-service-path=/tomcat/services/ \
    --staging-factory-name=MyReliableFileTransferFactoryService \
    --staging-delegation-factory-name=MyDelegationFactoryServiceForRFT
```
will internally cause the GRAM service code to construct the following EPR addresses:
```
http://somemachine.fakedomain.net:8444/tomcat/services/MyReliableFileTransferFactoryService
http://somemachine.fakedomain.net:8444/tomcat/services/MyDelegationFactoryServiceForRFT
```
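The address construction can be mimicked with plain string concatenation; the values below are the ones from the example above:

```shell
# Compose the RFT factory EPR address from the setup options above.
protocol=http
host=somemachine.fakedomain.net
port=8444
service_path=/tomcat/services/
factory=MyReliableFileTransferFactoryService

echo "${protocol}://${host}:${port}${service_path}${factory}"
# -> http://somemachine.fakedomain.net:8444/tomcat/services/MyReliableFileTransferFactoryService
```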
All the GRAM service configuration files are located in subdirectories of the $GLOBUS_LOCATION/etc directory. The names of the GRAM configuration directories all start with gram-service.
For instance, with a default GRAM installation, the command line:
% ls etc | grep gram-service
gives the following output:
```
gram-service
gram-service-Fork
gram-service-Multi
```
The file $GLOBUS_LOCATION/etc/gram-service/server-config.wsdd
contains information necessary to deploy and instantiate the GRAM
services in the Globus container.
Three GRAM services are deployed:
- ManagedExecutableJobService: service invoked when querying or managing an executable job
- ManagedMultiJobService: service invoked when querying or managing a multijob
- ManagedJobFactoryService: service invoked when submitting a job
The deployment information for each service contains the name of the Java service implementation class, the path to the WSDL service file, the names of the operation providers that the service reuses for its implementation of WSDL-defined operations, etc. More information about the service deployment configuration information can be found here.
The configuration of WSRF resources and application-level service configuration not related to service deployment is contained in JNDI files. The JNDI-based GRAM configuration is of two kinds:
The file $GLOBUS_LOCATION/etc/gram-service/jndi-config.xml
contains
configuration information that is common to every local resource manager.
More precisely, the configuration data it contains pertains to the implementation of the GRAM WSRF resources (factory resources and job resources), as well as initial values of WSRF resource properties that are always published by any Managed Job Factory WSRF resource.
The data is categorized by service because, according to WSRF, in spite of the service/resource separation of concerns, a given service will use only one XML Schema type of resource. In practice it is therefore clearer to categorize the configuration of resource implementations by service, even if theoretically a given resource implementation could be used by several services. For more information, refer to the Java WS Core documentation.
Here is the decomposition, in JNDI objects, of the common configuration data, categorized by service. Each XYZHome object contains the same Globus Core-defined information for the implementation of the WSRF resource, such as the Java implementation class for the resource (the resourceClass datum), the Java class for the resource key (the resourceKeyType datum), etc.
ManagedExecutableJobService
- ManagedExecutableJobHome: configuration of the implementation of resources for the service.
ManagedMultiJobService
- ManagedMultiJobHome: configuration of the implementation of resources for the service
ManagedJobFactoryService
- FactoryServiceConfiguration: this encapsulates configuration information used by the factory service. Currently this identifies the service to associate to a newly created job resource in order to create an endpoint reference and return it.
- ManagedJobFactoryHome: configuration of the implementation of resources for the service.
- FactoryHomeConfiguration: this contains GRAM application-level configuration data i.e. values for resource properties common to all factory resources. For instance, the path to the Globus installation, host information such as CPU type, manufacturer, operating system name and version, etc.
When a SOAP call is made to a GRAM factory service in order to submit a job, the call is actually made to a GRAM service-resource pair, where the factory resource represents the local resource manager to be used to execute the job.
There is one directory gram-service-<manager>/
for each local resource manager supported by the GRAM installation.
For instance, let's assume the command line:
% ls etc | grep gram-service-
gives the following output:
```
gram-service-Fork
gram-service-LSF
gram-service-Multi
```
In this example, the Multi, Fork and LSF job factory resources have been
installed. Multi
is a special kind of local resource manager
which enables the GRAM services to support multijobs.
The JNDI configuration file located under each manager directory contains configuration information for the GRAM support of the given local resource manager, such as the name that GRAM uses to designate the given resource manager. This is referred to as the GRAM name of the local resource manager.
For instance, $GLOBUS_LOCATION/etc/gram-service-Fork/jndi-config.xml
contains the following XML element structure:
```
<service name="ManagedJobFactoryService">
    <!-- LRM configuration: Fork -->
    <resource name="ForkResourceConfiguration"
              type="org.globus.exec.service.factory.FactoryResourceConfiguration">
        <resourceParams>
            [...]
            <parameter>
                <name>localResourceManagerName</name>
                <value>Fork</value>
            </parameter>
            <!-- Site-specific scratchDir
                 Default: ${GLOBUS_USER_HOME}/.globus/scratch
            <parameter>
                <name>scratchDirectory</name>
                <value>${GLOBUS_USER_HOME}/.globus/scratch</value>
            </parameter>
            -->
        </resourceParams>
    </resource>
</service>
```
In the example above, the name of the local resource manager is
Fork
. This value can be used with the GRAM
command line client in order to specify which factory resource to use when
submitting a job. Similarly, it is used to create an endpoint reference to the
chosen factory WS-Resource when using the GRAM client API.
In the example above, the scratchDirectory is set to
${GLOBUS_USER_HOME}/.globus/scratch
. This is
the default setting. It can be configured to point to an alternate file system
path that is common to the compute cluster and is typically less reliable (auto
purging), while offering a greater amount of disk space (thus "scratch").
The file $GLOBUS_LOCATION/etc/gram-service/managed-job-factory-security-config.xml contains the Core security configuration for the GRAM ManagedJobFactory service:

- default security information for all remote invocations, such as:
  - the authorization method, based on a gridmap file (in order to resolve user credentials to local user names)
  - the rejection of limited proxy credentials
- security information for the createManagedJob operation
The file $GLOBUS_LOCATION/etc/gram-service/managed-job-security-config.xml
contains the Core security configuration for the GRAM job resources:
- The default is to only allow the identity that called the createManagedJob operation to access the resource.
Note that by default two gridmap checks are done during an invocation of WS-GRAM:

- One gridmap check is done by the container, as configured by the gridmap element in $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml.
- Another check is done by WS-GRAM when it calls the Perl modules used for job submission to the underlying local resource manager, as configured by the authz element, which is by default set to gridmap in $GLOBUS_LOCATION/etc/gram-service/managed-job-factory-security-config.xml and $GLOBUS_LOCATION/etc/gram-service/managed-job-security-config.xml. This check is done for additional security, to make sure that a potentially compromised globus user account can still only act on behalf of the users defined in a grid-mapfile.
The second gridmap check can be avoided by removing the authz element from both WS-GRAM security descriptors. This does not mean that no authorization check is done: the container still checks whether the client is authorized as defined in $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml, but there is no further authorization check before calling the Perl modules. It is up to the GT4 container administrator to decide whether to keep that additional authorization check.
Note: GRAM does not override the container security credentials defined in $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml. These are the credentials used to authenticate all service requests.
The file $GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml
contains information to associate local resource managers with GridFTP servers. GRAM uses the GridFTP server (via RFT) to perform all file staging directives.
Since the GridFTP server and the Globus service container can be run on separate hosts, a mapping is needed between the common file system paths of these 2 hosts.
This enables the GRAM services to resolve file:/// staging directives to the local GridFTP URLs.
Below is the default Fork entry. Mapping a jobPath of / to ftpPath of / will allow any file staging directive to be attempted.
```
<map>
    <scheduler>Fork</scheduler>
    <ftpServer>
        <protocol>gsiftp</protocol>
        <host>myhost.org</host>
        <port>2811</port>
    </ftpServer>
    <mapping>
        <jobPath>/</jobPath>
        <ftpPath>/</ftpPath>
    </mapping>
</map>
```
For a scheduler, where jobs will typically run on a compute node, a default entry is not provided. This means staging directives will fail until a mapping is entered. Here is an example for a compute cluster with PBS installed that has two common mount points between the front-end host and the GridFTP server host.
```
<map>
    <scheduler>PBS</scheduler>
    <ftpServer>
        <protocol>gsiftp</protocol>
        <host>myhost.org</host>
        <port>2811</port>
    </ftpServer>
    <mapping>
        <jobPath>/pvfs/mount1/users</jobPath>
        <ftpPath>/pvfs/mount2/users</ftpPath>
    </mapping>
    <mapping>
        <jobPath>/pvfs/jobhome</jobPath>
        <ftpPath>/pvfs/ftphome</ftpPath>
    </mapping>
</map>
```
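The effect of these mappings can be sketched as a prefix translation; the function below is our illustration of the assumed behavior (matching a jobPath prefix and swapping in the ftpPath), not GRAM code:

```shell
# Translate a job path to the corresponding GridFTP path using the two
# PBS mappings from the example above.
map_job_to_ftp() {
  case "$1" in
    /pvfs/mount1/users*) echo "/pvfs/mount2/users${1#/pvfs/mount1/users}" ;;
    /pvfs/jobhome*)      echo "/pvfs/ftphome${1#/pvfs/jobhome}" ;;
    *) echo "no mapping for $1" >&2; return 1 ;;
  esac
}

map_job_to_ftp /pvfs/mount1/users/alice/input.dat
# -> /pvfs/mount2/users/alice/input.dat
```

A job path with no matching mapping fails, which mirrors the guide's point that erroneous file path directives are rejected.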
The file system mapping schema doc is here.
In addition to the service configuration described above, there are scheduler-specific configuration files for the Scheduler Event Generator modules. These files consist of name=value pairs separated by newlines. These files are:
Table 1. Scheduler-Specific Configuration Files

| File | Description |
|---|---|
| $GLOBUS_LOCATION/etc/globus-fork.conf | Configuration for the Fork SEG module implementation. |
| $GLOBUS_LOCATION/etc/globus-condor.conf | Configuration for the Condor SEG module implementation. |
| $GLOBUS_LOCATION/etc/globus-pbs.conf | Configuration for the PBS SEG module implementation. |
| $GLOBUS_LOCATION/etc/globus-lsf.conf | Configuration for the LSF SEG module implementation. |
With a default GT 4.0.1 installation, the WS GRAM service is automatically registered with the default WS MDS Index Service running in the same container for monitoring and discovery purposes.
> **Note**
>
> If you are using GT 4.0.0, we strongly recommend upgrading to 4.0.1 to take advantage of this capability.
However, if you must use GT 4.0.0, or if this registration was turned off and you want to turn it back on, this is how it is configured:
There is a JNDI resource defined in $GLOBUS_LOCATION/etc/gram-service/jndi-config.xml as follows:
```
<resource name="mdsConfiguration"
          type="org.globus.wsrf.impl.servicegroup.client.MDSConfiguration">
    <resourceParams>
        <parameter>
            <name>reg</name>
            <value>true</value>
        </parameter>
        <parameter>
            <name>factory</name>
            <value>org.globus.wsrf.jndi.BeanFactory</value>
        </parameter>
    </resourceParams>
</resource>
```
To configure the automatic registration of WS GRAM to the default WS MDS Index Service, change the value of the reg parameter as follows:

- true turns on auto-registration; this is the default in GT 4.0.1.
- false turns off auto-registration; this is the default in GT 4.0.0.
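One way to flip the flag is a targeted sed substitution; shown here on a one-line sample string rather than the live jndi-config.xml:

```shell
# Flip the mdsConfiguration "reg" value from true to false.
xml='<parameter> <name>reg</name> <value>true</value> </parameter>'
echo "$xml" | sed 's|<value>true</value>|<value>false</value>|'
```

Note that this pattern would also match any other parameter whose value is true, so inspect the file (or anchor the pattern on the surrounding lines) before applying it to the real configuration.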
By default, the GLUECE resource property (which contains GLUE data) is sent to the default Index Service.
You can configure which resource properties are sent in WS GRAM's registration file, $GLOBUS_LOCATION/etc/gram-service/registration.xml.
The following is the relevant section of the file (as it is set by default):
```
<Content xsi:type="agg:AggregatorContent"
         xmlns:agg="http://mds.globus.org/aggregator/types">
    <agg:AggregatorConfig xsi:type="agg:AggregatorConfig">
        <agg:GetResourcePropertyPollType
            xmlns:glue="http://mds.globus.org/glue/ce/1.1">
            <!-- Specifies that the index should refresh information
                 every 60000 milliseconds (once per minute) -->
            <agg:PollIntervalMillis>60000</agg:PollIntervalMillis>
            <!-- Specifies the resource property that should be aggregated,
                 which in this case is the GLUE cluster and scheduler
                 information RP -->
            <agg:ResourcePropertyName>glue:GLUECE</agg:ResourcePropertyName>
        </agg:GetResourcePropertyPollType>
    </agg:AggregatorConfig>
    <agg:AggregatorData/>
</Content>
```
If a third party needs to register a WS GRAM service manually, see Registering with mds-servicegroup-add in the WS MDS Aggregator Framework documentation.
Note: This feature is only available beginning from version 4.0.5 of the toolkit.
SoftEnv is a system designed to make it easier for users to define what applications they want to use, and easier for administrators to make applications available to users. SoftEnv has evolved from the original implementation called Soft designed at Northeastern University in 1994.
In some environments, like TeraGrid, it is desirable to make use of SoftEnv before a job is submitted, so that the job runs in an exactly defined software environment.
Because this feature is very specific and may not be available on many systems, support for SoftEnv is disabled by default in normal job submissions. There is a parameter in the JNDI configuration of WS GRAM to enable SoftEnv support in job submissions.
SoftEnv support must be enabled on a per-scheduler basis because the internal mechanisms to support SoftEnv vary between the different types of schedulers. Currently only the Fork, PBS and LSF schedulers can be configured with SoftEnv support; Condor is not yet supported. To enable this feature, the parameter 'enableDefaultSoftwareEnvironment' in the scheduler-specific JNDI configuration must be set to 'true'. For example, to enable SoftEnv support in the Fork scheduler, set 'enableDefaultSoftwareEnvironment' in $GLOBUS_LOCATION/etc/gram-service-Fork/jndi-config.xml to 'true'.
When SoftEnv support is enabled, a user's default environment is created automatically from the user's .soft file before each job submission. The user does not need to provide extra SoftEnv keys in the extensions element of a job description. This is not done if the SoftEnv feature is disabled.
For more information and examples, please look in the SoftEnv section of the user guide.
By default, only four variables can be used in the job description document; these are resolved to values by the service:
GLOBUS_USER_HOME
GLOBUS_USER_NAME
GLOBUS_SCRATCH_DIR
GLOBUS_LOCATION
To enable communities to define their own system-wide variables and enable their users to use them in their job descriptions, a new generic variable/value config file was added where these variables can be defined. If a job description document contains one of these variables, that file will be used to resolve any matching variables.
A new service parameter in the JNDI container registry defines the path to the variable mapping file. The mapping is done for each scheduler. This file is checked periodically (configurable frequency) to see if it has changed. If so, it is reread and the new content replaces the old.
For example, for the scheduler Fork, the following entries in $GLOBUS_LOCATION/etc/gram-service-Fork/jndi-config.xml can be configured to determine the location and the refresh period of the variable mapping file:
```
<parameter>
    <name>substitutionDefinitionsFile</name>
    <value>/root/vdt-stuff/globus/etc/gram-service-Condor/substitution definition.properties</value>
</parameter>
<parameter>
    <name>substitutionDefinitionsRefreshPeriod</name>
    <value> <!-- MINUTES --> 480 </value>
</parameter>
```
The use of variables in the job description document that are
not defined in the variable mapping file leads to
the following error during job submission:
'No value found for RSL substitution variable <variableName>'
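The resolution the service performs can be approximated as simple text substitution from a name=value properties file; the variable name, value, and file name below are made up for illustration:

```shell
# Build a tiny properties file (the MYAPP_HOME variable is illustrative).
cat > subs.properties <<'EOF'
MYAPP_HOME=/soft/myapp
EOF

# Substitute ${NAME} occurrences in a job description fragment.
desc='<executable>${MYAPP_HOME}/bin/run</executable>'
while IFS='=' read -r name value; do
  desc=$(printf '%s' "$desc" | sed "s|\${$name}|$value|g")
done < subs.properties
echo "$desc"
# -> <executable>/soft/myapp/bin/run</executable>
```

A variable that appears in the job description but not in the properties file would be left unresolved, which is the situation that triggers the error above.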
Note: This feature is only available beginning from version 4.0.5 of the toolkit.
WS-GRAM provides mechanisms for access to audit and accounting information associated with jobs that are submitted by WS-GRAM to a local resource manager (LRM) such as PBS, LSF or Condor. GRAM is not a local resource manager, but rather a protocol engine for communicating with a range of different local resource managers using a standard message format. In some scenarios it is desirable to get an overview of the usage of the underlying LRM, for example:

- What kinds of jobs have been submitted via GRAM?
- How long did the processing of a job take?
- How many jobs have been submitted by user X?

The following three use cases give a better overview of the meaning and purpose of auditing and accounting:
Group Access. A grid resource provider allows a remote service (e.g., a gateway or portal) to submit jobs on behalf of multiple users. The grid resource provider only obtains information about the identity of the remote submitting service and thus does not know the identity of the users for which the grid jobs are submitted. This group access is allowed under the condition that the remote service stores audit information so that, if and when needed, the grid resource provider can request and obtain information to track a specific job back to an individual user.
Query Job Accounting. A client that submits a job needs to be able to obtain, after the job has completed, information about the resources consumed by that job. In portal and gateway environments where many users submit many jobs against a single allocation, this per-job accounting information is needed soon after the job completes so that client-side accounting can be updated. Accounting information is sensitive and thus should only be released to authorized parties.
Auditing. In a distributed multi-site environment, it can be necessary to investigate various forms of suspected intrusion and abuse. In such cases, we may need to access an audit trail of the actions performed by a service. When accessing this audit trail, it will frequently be important to be able to relate specific actions to the user.
The audit record of each job is stored in a DBMS and contains:
job_grid_id: String representation of the resource EPR
local_job_id: Job/process id generated by the scheduler
subject_name: Distinguished name (DN) of the user
username: Local username
idempotence_id: Job id generated on the client-side
creation_time: Date when the job resource is created
queued_time: Date when the job is submitted to the scheduler
stage_in_grid_id: String representation of the stageIn-EPR (RFT)
stage_out_grid_id: String representation of the stageOut-EPR (RFT)
clean_up_grid_id: String representation of the cleanUp-EPR (RFT)
globus_toolkit_version: Version of the server-side GT
resource_manager_type: Type of the resource manager (Fork, Condor, ...)
job_description: Complete job description document
success_flag: Flag that shows whether the job failed or finished successfully
finished_flag: Flag that shows whether the job is already fully processed or still in progress
While audit and accounting records may be generated and stored by different entities in different contexts, we assume here that audit records are generated by the GRAM service itself and accounting records by the LRM to which the GRAM service submits jobs. Accounting records could contain all information about the duration and the resource-usage of a job. Audit records are stored in a database indexed by a Grid job identifier (GJID), while accounting records are maintained by the LRM indexed by a local job identifier (JID).
GRAM Service GJID creation
The WS-GRAM service returns an EPR that is used to control the job. The EPR is an XML document and cannot effectively be used as a primary key for a database table, so it needs to be converted from an EPR to an acceptable GJID format. A utility class, EPRUtil.java, is included in GT releases beginning with version 4.0.5; it can be used both by the GRAM service before storing the audit record and by the GRAM client before getting audit information from the audit database.
To connect the two sets of records, both audit and accounting, we require that GRAM record the JID in each audit record that it generates. It is then straightforward for an audit service to respond to requests like 'give me the charge of the job with JID x' by first selecting matching record(s) from the audit table and then using the local JID(s) to join to the accounting table of the LRM to access the relevant accounting record(s).
We propose a Web Service interface for accessing audit and accounting information. OGSA-DAI is a WSRF service that can create a single virtual database from two or more remote databases. In the future, other per-job information like job performance data could be stored using the GJID or local JID as an index, and then made available in the same virtual database. The rest of this chapter focuses on how to configure WS-GRAM to enable Audit-Logging. A case study for TeraGrid can be read here
OGSA-DAI is available here: http://www.globus.org/toolkit/docs/4.0/techpreview/ogsadai/
Audit logging in WS-GRAM happens at three points in a job's lifecycle: when processing starts, when the job is submitted to the local resource manager, and when the job is fully processed or fails.
For more information about how to use this data, e.g. to get accounting information for a job or to query the audit database for information via a Web Services interface, please go here.
To enable audit logging, add the following lines to the Log4j configuration in $GLOBUS_LOCATION/etc/container.log4j.properties:

# GRAM AUDIT
log4j.category.org.globus.exec.service.exec.StateMachine.audit=DEBUG, AUDIT
log4j.appender.AUDIT=org.globus.exec.utils.audit.AuditDatabaseAppender
log4j.appender.AUDIT.layout=org.apache.log4j.PatternLayout
log4j.additivity.org.globus.exec.service.exec.StateMachine.audit=false
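As a quick sanity check after editing, a small script can confirm that the required audit keys are present. This is a sketch only: the embedded snippet stands in for the contents of $GLOBUS_LOCATION/etc/container.log4j.properties.

```python
# Sketch: verify the four audit-logging keys exist in a log4j
# properties snippet (stand-in for container.log4j.properties).
snippet = """
log4j.category.org.globus.exec.service.exec.StateMachine.audit=DEBUG, AUDIT
log4j.appender.AUDIT=org.globus.exec.utils.audit.AuditDatabaseAppender
log4j.appender.AUDIT.layout=org.apache.log4j.PatternLayout
log4j.additivity.org.globus.exec.service.exec.StateMachine.audit=false
"""

props = {}
for line in snippet.splitlines():
    line = line.strip()
    if line and not line.startswith("#"):
        key, _, value = line.partition("=")
        props[key] = value

required = [
    "log4j.category.org.globus.exec.service.exec.StateMachine.audit",
    "log4j.appender.AUDIT",
    "log4j.appender.AUDIT.layout",
    "log4j.additivity.org.globus.exec.service.exec.StateMachine.audit",
]
missing = [k for k in required if k not in props]
print("missing:", missing)
```

In a real deployment you would read the properties file from disk instead of embedding the snippet.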
Audit records are stored in a database, which must be set up once. Currently we provide schemas for:

- MySQL: $GLOBUS_LOCATION/share/gram-service/gram_audit_schema_mysql.sql
- Postgres: $GLOBUS_LOCATION/share/gram-service/gram_audit_schema_postgres-8.0.sql
The following describes how to set up the database for audit records in MySQL:

- Create a database inside of MySQL.
- Grant the necessary privileges to the user who is configured in the JNDI registry of WS-GRAM.
- Use the schema to create the table.
host:~ feller$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 16
Server version: 5.0.37 MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create database auditDatabase;
Query OK, 1 row affected (0.09 sec)

mysql> GRANT ALL ON auditDatabase.* to globus@localhost identified by "foo";
Query OK, 0 rows affected (0.32 sec)

mysql> exit
Bye
host:~ feller$ mysql -u globus -p auditDatabase < ${GLOBUS_LOCATION}/share/gram-service/gram_audit_schema_mysql.sql
Enter password:
host:~ feller$
Add or modify the configuration of the database where the audit records are stored in $GLOBUS_LOCATION/etc/gram-service/jndi-config.xml:
<resource name="auditDatabaseConfiguration"
          type="org.globus.exec.service.utils.AuditDatabaseConfiguration">
    <resourceParams>
        <parameter>
            <name>factory</name>
            <value>org.globus.wsrf.jndi.BeanFactory</value>
        </parameter>
        <parameter>
            <name>driverClass</name>
            <value>com.mysql.jdbc.Driver</value>
        </parameter>
        <parameter>
            <name>url</name>
            <value>jdbc:mysql://<host>[:port]/auditDatabase</value>
        </parameter>
        <parameter>
            <name>user</name>
            <value>globus</value>
        </parameter>
        <parameter>
            <name>password</name>
            <value>foo</value>
        </parameter>
        <parameter>
            <name>globusVersion</name>
            <value>4.0.3</value>
        </parameter>
    </resourceParams>
</resource>
WS GRAM is deployed as part of a standard toolkit installation. Please refer to the GT 4.0 System Administrator's Guide for details.
WS GRAM has been tested to work without any additional setup steps when deployed into Tomcat. Please see the Java WS Core admin guide section on deploying GT4 services into Tomcat for instructions. Also, for details on tested containers, see the WS GRAM release notes.
Note: Currently only a single deployment is supported because of a limitation in the execution of the Scheduler Event Generator. One must set GLOBUS_LOCATION before starting Tomcat.
Note: This feature was added in GT 4.0.5. For versions older than 4.0.5, an update package is available to upgrade one's installation. See the downloads page for the latest links.
The WS-GRAM job description schema includes a section for extending the job description with custom elements. To make sense of this in the resource manager adapter Perl scripts, a Perl module named Globus::GRAM::ExtensionsHandler is provided to turn these custom elements into parameters that the adapter scripts can understand.
It should be noted that although only non-GRAM XML elements are allowed in the <extensions> element of the job description, the extensions handler makes no distinction based on namespace. Thus, <foo:myparam> and <bar:myparam> will both be treated as just <myparam>.
Familiarity with the adapter scripts is assumed in the following sub-sections.
Simple string extension elements are converted into single-element arrays, keyed in the Perl job description hash by the unqualified tag name of the extension element. Simple string extension elements can be considered a special case of the string array construct in the next section.
For example, adding the following element to the <extensions> element of the job description:
<extensions>
<myparam>yahoo!</myparam>
</extensions>
will cause $description->myparam() to return the following value:
'yahoo!'
String arrays are a simple extension of the simple string element construct. If you specify more than one simple string element in the job description, these will be assembled into a multi-element array with the unqualified tag name of the extension elements as the array's key name in the Perl job description hash.
For example:
<extensions>
<myparams>Hello</myparams>
<myparams>World!</myparams>
</extensions>
will cause $description->myparams() to return the following value:
[ 'Hello', 'World!' ]
Name/value extension elements can be thought of as string arrays with an XML attribute 'name'. This will cause the creation of a two-dimensional array with the unqualified extension element tag name as the name of the array in the Perl job description hash.
For example:
<extensions>
<myvars name="pi">3.14159</myvars>
<myvars name="mole">6.022 x 10^23</myvars>
</extensions>
will cause $description->myvars() to return the following value:
[ [ 'pi', '3.14159'], ['mole', '6.022 x 10^23'] ]
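The three constructs above can be imitated in a few lines. This Python sketch is not the real Globus::GRAM::ExtensionsHandler (which is Perl), but it applies the same documented rules: a lone simple element becomes a scalar, repeated elements become an array, and elements carrying a name attribute become name/value pairs.

```python
import xml.etree.ElementTree as ET

def handle_extensions(xml_text):
    """Toy analog of the documented conversion rules for <extensions>."""
    root = ET.fromstring(xml_text)
    desc = {}
    for child in root:
        tag = child.tag.split("}")[-1]        # drop any namespace part
        if "name" in child.attrib:            # name/value construct
            entry = [child.get("name"), child.text]
        else:                                 # simple string construct
            entry = child.text
        desc.setdefault(tag, []).append(entry)
    # A single simple string collapses to a scalar, as in the docs.
    return {k: v[0] if len(v) == 1 and not isinstance(v[0], list) else v
            for k, v in desc.items()}

doc = """<extensions>
  <myparam>yahoo!</myparam>
  <myparams>Hello</myparams>
  <myparams>World!</myparams>
  <myvars name="pi">3.14159</myvars>
  <myvars name="mole">6.022 x 10^23</myvars>
</extensions>"""

result = handle_extensions(doc)
print(result)
```

Running this reproduces the scalar, string array, and name/value array shapes described in the three sections above.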
In addition to the globus_gram_job_manager update package, the globus_gram_job_manager_setup_pbs update package is required to take advantage of the PBS node selection extensions.
Node selection constraints in PBS can be specified in two ways: generically, using a construct intended to eventually apply to all resource managers that support node selection, or explicitly, by specifying a simple string element. The former is more portable, but the latter will appeal to those familiar with specifying node constraints for PBS jobs.
To specify PBS node selection constraints explicitly, one can simply construct a single, simple string extension element named nodes with a value that conforms to the #PBS -l nodes=... PBS job description directive. The Globus::GRAM::ExtensionsHandler module makes this available to the PBS adapter script as $description->{nodes}. The updated PBS adapter package checks for this value and will create a directive in the PBS job description using it.
To use the generic construct for specifying node selection constraints, use the resourceAllocationGroup element:
<extensions>
<resourceAllocationGroup>
<!-- Optionally select hosts by type and number... -->
<hostType>...</hostType>
<hostCount>...</hostCount>
<!-- *OR* by host names -->
<hostName>...</hostName>
<hostName>...</hostName>
. . .
<!-- With a total CPU count for this group... -->
<cpuCount>...</cpuCount>
<!-- *OR* an explicit number of CPUs per node... -->
<cpusPerHost>...</cpusPerHost>
. . .
<!-- And a total process count for this group... -->
<processCount>...</processCount>
<!-- *OR* an explicit number of processes per node... -->
<processesPerHost>...</processesPerHost>
</resourceAllocationGroup>
</extensions>
Extension elements specified according to the above pseudo-schema will be converted to an appropriate nodes parameter, which will be treated as if an explicit nodes extension element had been specified. Multiple resourceAllocationGroup elements may be specified; this simply appends the constraints to the nodes parameter with a '+' separator.
Note that one cannot specify both hostType/hostCount and hostName elements.
Similarly, one cannot specify both processCount and processesPerHost elements.
Here are some examples of using resourceAllocationGroup:
<!-- #PBS -l nodes=1:ppn=10 -->
<!-- 10 processes -->
<extensions>
<resourceAllocationGroup>
<cpuCount>10</cpuCount>
<processCount>10</processCount>
</resourceAllocationGroup>
</extensions>
<!-- #PBS -l nodes=activemural:ppn=10+5:ia64-compute:ppn=2 -->
<!-- 1 process (process default) -->
<extensions>
<resourceAllocationGroup>
<hostType>activemural</hostType>
<cpuCount>10</cpuCount>
</resourceAllocationGroup>
<resourceAllocationGroup>
<hostType>ia64-compute</hostType>
<hostCount>5</hostCount>
<cpusPerHost>2</cpusPerHost>
</resourceAllocationGroup>
</extensions>
<!-- #PBS -l nodes=vis001:ppn=5+vis002:ppn=5+comp014:ppn=2+comp015:ppn=2 -->
<!-- 15 total processes -->
<extensions>
<resourceAllocationGroup>
<hostName>vis001</hostName>
<hostName>vis002</hostName>
<cpuCount>10</cpuCount>
<processesPerHost>5</processesPerHost>
</resourceAllocationGroup>
<resourceAllocationGroup>
<hostName>comp014</hostName>
<hostName>comp015</hostName>
<cpusPerHost>2</cpusPerHost>
<processCount>5</processCount>
</resourceAllocationGroup>
</extensions>
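To see how the mapping might work mechanically, here is an illustrative Python reimplementation of the translation. The real logic lives in the Perl ExtensionsHandler and PBS adapter and may differ in corner cases; this sketch merely reproduces the commented examples above.

```python
import xml.etree.ElementTree as ET

def groups_to_nodes(xml_text):
    """Sketch: translate resourceAllocationGroup elements into a PBS
    '-l nodes=...' value, per the documented examples (not the real code)."""
    root = ET.fromstring(xml_text)
    specs = []
    for g in root.findall("resourceAllocationGroup"):
        cpus_per_host = g.findtext("cpusPerHost")
        cpu_count = g.findtext("cpuCount")
        hosts = [h.text for h in g.findall("hostName")]
        if hosts:
            # Named hosts: spread the CPU count unless ppn is explicit.
            ppn = cpus_per_host or str(int(cpu_count) // len(hosts))
            specs.extend(f"{h}:ppn={ppn}" for h in hosts)
        else:
            # Host type/count selection; default to one node.
            host_count = g.findtext("hostCount")
            parts = [p for p in (host_count, g.findtext("hostType")) if p] or ["1"]
            ppn = cpus_per_host or str(int(cpu_count) // int(host_count or 1))
            specs.append(":".join(parts) + f":ppn={ppn}")
    return "+".join(specs)    # multiple groups join with '+'

example = """<extensions>
  <resourceAllocationGroup>
    <hostType>activemural</hostType>
    <cpuCount>10</cpuCount>
  </resourceAllocationGroup>
  <resourceAllocationGroup>
    <hostType>ia64-compute</hostType>
    <hostCount>5</hostCount>
    <cpusPerHost>2</cpusPerHost>
  </resourceAllocationGroup>
</extensions>"""

print(groups_to_nodes(example))   # -> activemural:ppn=10+5:ia64-compute:ppn=2
```

The process-count elements are ignored here because, per the examples, only the host and CPU elements shape the nodes string.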
Two Perl modules must be edited to customize extensions support. The first is ExtensionsHandler.pm; this is where the WS-GRAM job description XML of the extensions element is parsed and entries are added or appended to the Perl job description hash. The second is the particular resource manager adapter module, which will use any new hash entries either to alter its behavior or to create additional parameters in the resource manager job description.
For starters, this module logs various things to the log file specified in the logfile extension element. If you place this element at the start of the extensions you are creating support for, you can look at the specified log file to get some idea of what the handler is doing. You can add new logging lines by using the $self->log() function, which simply takes a string that gets appended to the log file with a prefix of "<date string> EXTENSIONS HANDLER: ".
There are three main subroutines that are used to handle parsing events and process them accordingly: Char(), StartTag(), and EndTag(). More handlers can be specified for other specific events when creating the XML::Parser instance in new() (see the XML::Parser documentation for details). Descriptions of what the three main subroutines currently do are laid out below. Modify the subroutines as necessary to achieve your goal.
Char() doesn't do anything but collect the CDATA found between the current element's start and end tags. You can access the CDATA for the current element via $self->{CDATA}.
StartTag() is responsible for collecting the attributes associated with the element. It also increments the counter that keeps track of the number of child elements of the current extension element, and pushes the current element name onto the @scope queue for later use.
EndTag() is used to take the CDATA collected by Char() and create new Perl job description hash entries. This is most likely where you will need to do most of your work when adding support for new extension elements. Two useful variables are $currentScope and $parentScope, which indicate, respectively, the current element being parsed and its parent; they are useful for establishing a context to work from. The @scope queue is popped at the end of this subroutine.
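Since XML::Parser is a wrapper around expat, the same event flow can be sketched with Python's expat bindings. This toy handler, not the real Perl module, shows the CDATA buffer and the scope bookkeeping that stand behind $currentScope and $parentScope:

```python
import xml.parsers.expat

class ToyExtensionsHandler:
    """Toy analog of ExtensionsHandler's Char/StartTag/EndTag callbacks."""
    def __init__(self):
        self.cdata = ""
        self.scope = []          # analog of the @scope array
        self.entries = {}
        p = xml.parsers.expat.ParserCreate()
        p.StartElementHandler = self.start_tag
        p.EndElementHandler = self.end_tag
        p.CharacterDataHandler = self.char
        self.parser = p

    def char(self, data):
        # Char(): just accumulate CDATA between the current tags.
        self.cdata += data

    def start_tag(self, name, attrs):
        # StartTag(): reset the buffer and record the new scope.
        self.cdata = ""
        self.scope.append(name)

    def end_tag(self, name):
        # EndTag(): current/parent scope are the top two entries.
        current_scope = self.scope[-1]
        parent_scope = self.scope[-2] if len(self.scope) > 1 else None
        if parent_scope == "extensions":
            self.entries.setdefault(current_scope, []).append(self.cdata)
        self.scope.pop()         # popped at the end, as in the docs

h = ToyExtensionsHandler()
h.parser.Parse("<extensions><myparams>Hello</myparams>"
               "<myparams>World!</myparams></extensions>", True)
print(h.entries)   # -> {'myparams': ['Hello', 'World!']}
```

The parent-scope check is what lets the handler treat only direct children of <extensions> as job description entries.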
There is not much to say here; each adapter is different. Spend some time trying to understand what the adapter does, and then make and test your changes. Any new hash entries you created in ExtensionsHandler.pm can be accessed by calling $description->entryname(), where 'entryname' is the name of the entry that was added. See the construct documentation above for more details.
See the WS GRAM User's Guide for information about submitting a test job.
When I submit a streaming or staging job, I get the following error: ERROR service.TransferWork Terminal transfer error: [Caused by: Authentication failed [Caused by: Operation unauthorized (Mechanism level: Authorization failed. Expected "/CN=host/localhost.localdomain" target but received "/O=Grid/OU=GlobusTest/OU=simpleCA-my.machine.com/CN=host/my.machine.com")
- Check $GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml for the use of "localhost" or "127.0.0.1" instead of the public hostname (in the example above, "my.machine.com"). Change these uses of the loopback hostname or IP to the public hostname as necessary.
Fork jobs work fine, but submitting PBS jobs with globusrun-ws hangs at "Current job state: Unsubmitted"
- Make sure that the log_path in $GLOBUS_LOCATION/etc/globus-pbs.conf points to locally accessible scheduler logs that are readable by the user running the container. The Scheduler Event Generator (SEG) will not work without local scheduler logs to monitor. This can also apply to other resource managers, but is most commonly seen with PBS.
- If the SEG configuration looks sane, try running the SEG tests. They are located in $GLOBUS_LOCATION/test/globus_scheduler_event_generator_*_test/. If Fork jobs work, you only need to run the PBS test. Run each test by going to the associated directory and run ./TESTS.pl. If any tests fail, report this to the [email protected] mailing list.
- If the SEG tests succeed, the next step is to figure out the ID assigned by PBS to the queued job. Enable GRAM debug logging by uncommenting the appropriate line in the $GLOBUS_LOCATION/container-log4j.properties configuration file. Restart the container, run a PBS job, and search the container log for a line that contains "Received local job ID" to obtain the local job ID.
- Once you have the local job ID, you can check the latest PBS logs pointed to by the value of "log_path" in $GLOBUS_LOCATION/etc/globus-pbs.conf to make sure the job's status is being logged. If the status is not being logged, check the documentation for your flavor of PBS to see if there's any further configuration that needs to be done to enable job status logging. For example, PBS Pro requires a sufficient -e <bitmask> option added to the pbs_server command line to enable enough logging to satisfy the SEG.
- If the correct status is being logged, try running the
SEG manually to see if it is reading the log file properly. The general
form of the SEG command line is as follows:
$GLOBUS_LOCATION/libexec/globus-scheduler-event-generator -s pbs -t <timestamp>
The timestamp is in seconds since the epoch and dictates how far back in the log history the SEG should scan for job status events. The command should hang after dumping some status data to stdout. If no data appears, change the timestamp to an earlier time. If nothing ever appears, report this to the [email protected] mailing list.
- If running the SEG manually succeeds, try running another job and make sure the job process actually finishes and PBS has logged the correct status before giving up and cancelling globusrun-ws. If things are still not working, report your problem and exactly what you have tried to remedy the situation to the [email protected] mailing list.
The job manager detected an invalid script response
- Check for a restrictive umask. When the service writes the native scheduler job description to a file, an overly restrictive umask will cause the permissions on the file to be such that the submission script run through sudo as the user cannot read the file (bug #2655).
When restarting the container, I get the following error: Error getting delegation resource
- Most likely this is simply a case of the delegated credential expiring. Either refresh it for the affected job or destroy the job resource.
The user's home directory has not been determined correctly
This occurs when the administrator changed the location of a user's home directory and did not restart the GT4 container afterwards. Beginning with version 4.0.3 of the GT, WS-GRAM determines a user's home directory only once in the lifetime of a container (when the user submits their first job). Subsequently submitted jobs will use the cached home directory during job execution.
The following usage statistics are sent by default in a UDP packet (in addition to the GRAM component code, packet version, timestamp, and source IP address) at the end of each job, i.e. when the Done or Failed state is entered.
- job creation timestamp (helps determine the rate at which jobs are submitted)
- scheduler type (Fork, PBS, LSF, Condor, etc...)
- jobCredentialEndpoint present in RSL flag (to determine if server-side user proxies are being used)
- fileStageIn present in RSL flag (to determine if the staging in of files is used)
- fileStageOut present in RSL flag (to determine if the staging out of files is used)
- fileCleanUp present in RSL flag (to determine if the cleaning up of files is used)
- CleanUp-Hold requested flag (to determine if streaming is being used)
- job type (Single, Multiple, MPI, or Condor)
- gt2 error code if job failed (to determine common scheduler script errors users experience)
- fault class name if job failed (to determine general classes of common faults users experience)
If you wish to disable this feature, please see the Java WS Core System Administrator's Guide section on Usage Statistics Configuration for instructions.
Also, please see our policy statement on the collection of usage statistics.