GT 4.0 Migration Guide for WS GRAM

The following sections describe how to migrate to WS GRAM from previous versions of the Globus Toolkit.

1. Migrating from GT2

1.1. WS GRAM - GT2 Migration Guide

1.1.1. Installation / Deployment Differences

In pre-WS GRAM, jobs are submitted to a job manager process started by a Gatekeeper process. The Gatekeeper process is typically started by an inetd server, which forks a new gatekeeper for each job. In WS GRAM, jobs are started by the ManagedExecutableJobService, which is a Java service implementation running within the Globus service container.

The gatekeeper searches the $GLOBUS_LOCATION/etc/grid-services directory to determine which services it will start. Typically there is one job manager service entry file in that directory per scheduler type.

1.1.2. Security Differences

1.1.2.1. Proxies and Delegation

In pre-WS GRAM, the GRAM client is required to delegate a proxy credential to the Gatekeeper so that the job manager can send authenticated job state change messages.

In WS GRAM, delegation is done as needed using the DelegationFactoryService. Jobs may be passed references to delegated credentials as part of the job description.
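
As an illustration, a job description can reference previously delegated credentials roughly as sketched below. The jobCredentialEndpoint and stagingCredentialEndpoint element names and the EPR contents are assumptions based on the GT4 job description schema rather than something stated in this guide, so treat the fragment as illustrative only.

    <!-- Illustrative fragment: referencing credentials delegated via the
         DelegationFactoryService (element names and EPR layout are assumptions) -->
    <jobCredentialEndpoint>
      <wsa:Address>https://host.example.org:8443/wsrf/services/DelegationService</wsa:Address>
      <wsa:ReferenceProperties>
        <!-- key identifying the delegated credential resource -->
      </wsa:ReferenceProperties>
    </jobCredentialEndpoint>
    <stagingCredentialEndpoint>
      <!-- EPR of the credential RFT should use for staging -->
    </stagingCredentialEndpoint>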

1.1.2.2. Network Communication

In pre-WS GRAM, communication between the client and the Gatekeeper is done using GSI-wrapped messages. Communication between the client and the job manager is sent using SSL. The job manager uses the delegated credential for file streaming or staging as well. Mutual authentication is done on all connections. All communications consist of a single request-response pattern. Network connections and security contexts are never cached between messages.

In WS GRAM, communication may be secured using TLS, WS secure messaging, or WS secure conversation, depending on the service configuration. When authenticating, the service uses the container's credential or, for secure message or secure conversation, a service-specific credential. It does not use a delegated credential when communicating with the client.

1.1.2.3. Root / Local Account Access

In pre-WS GRAM, the Gatekeeper process is started as a root service from inetd. It then uses the grid-mapfile to decide which local user it should setuid() to before starting the job manager process, based on the credential used to submit the job request. The user may optionally request a non-default user id by specifying it in the gatekeeper contact string. The job manager process runs entirely under the local user account.

In WS GRAM, the job management service runs within a container shared with other services. The container is run under a non-privileged account. All commands which need to be run as a particular user (such as interactions with the scheduler) are started via sudo. Authorization is done via the globus-gridmap-and-execute program.

1.1.3. Scheduler Interaction Differences

In pre-WS GRAM, all file system and scheduler interactions occur within a Perl module called by the globus-job-manager-script.pl program. Scheduler-specific Perl modules implement a number of methods used by the job manager:

  • submit
  • poll
  • cancel
  • signal
  • make_scratchdir
  • remove_scratchdir
  • stage_in
  • stage_out
  • cache_cleanup
  • remote_io_file_create
  • proxy_relocate
  • proxy_update

Only a small subset of these script methods is used in the WS GRAM implementation:

  • submit
  • poll (called only once per job and only for fork/condor jobs to merge output)
  • cancel
  • cache_cleanup

Some of the functionality has been moved into other services for reliability or performance reasons. Other functions have been removed altogether.

  • poll: SEG (Scheduler Event Generator)
  • signal: dropped
  • make_scratchdir: RFT
  • remove_scratchdir: RFT
  • stage_in: RFT
  • stage_out: RFT
  • remote_io_file_create: RFT or resource property queries
  • proxy_relocate: Delegation Service
  • proxy_update: Delegation Service

1.1.4. Local Node Impact

In pre-WS GRAM, each job submitted would cause the following processes to be created:

  • gatekeeper (short lived)
  • job manager (lives for the duration of the job)
  • Perl script (short lived; 4 or more instances depending on job type)
  • Perl script poll, called periodically

In WS GRAM, each job causes the following processes to be created:

  • sudo + Perl script (typically two invocations: submit and cache_cleanup)
  • for fork jobs, one fork-starter process (blocked waiting for a signal) for the duration of the job

Additionally, there will be a per-scheduler instance of the SEG program, monitoring a log file for job state changes.

1.2. User - Migration Guide

1.2.1. Command Line Tools

Typical interactions with the pre-WS GRAM service were done with either the globusrun command or the globus-job suite of scripts (globus-job-submit, globus-job-run, globus-job-get-output, globus-job-status, globus-job-clean). The main difference between these sets of commands is that globusrun required a job description in RSL format, while the globus-job-submit and globus-job-run scripts would construct one from command line options.

In WS GRAM, the globusrun-ws command implements the functionality of globusrun using the XML Job Description language in place of the RSL format job description of pre-WS GRAM. It also allows specifying parts of the Job Description with simple command line arguments (for executable and arguments), similar to what one would do with globus-job-run. Like globusrun, the globusrun-ws program supports both the interactive and batch submission of GRAM jobs.
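
For orientation, an interactive submission with globusrun-ws typically looks something like "globusrun-ws -submit -F <factory contact> -f job.xml", with -batch added for batch mode and -s for streaming output; the -submit subcommand and the exact argument order shown here are assumptions, so consult the globusrun-ws usage message for the authoritative syntax.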

Table 1. Command Line Option Comparison

Each entry lists the pre-WS GRAM globusrun option followed by the WS GRAM globusrun-ws option.

  • Interactive multirequest control
      globusrun: -i
      globusrun-ws: NO EQUIVALENT
  • Job description file path
      globusrun: -f <rsl filename> | -file <rsl filename>
      globusrun-ws: -f <filename> | -job-description-file <filename>
  • Quiet operation
      globusrun: -q | -quiet
      globusrun-ws: -q | -quiet
  • File streaming of stdout and stderr *see note 1*
      globusrun: -o (implies -q)
      globusrun-ws: -s | -streaming (implies -q, sometimes -staging-delegate)
  • Enable embedded GASS server
      globusrun: -s | -server (implies -o and -q)
      globusrun-ws: NO EQUIVALENT
  • Enable writing to the embedded GASS server
      globusrun: -w | -write-allow (implies -s and -q)
      globusrun-ws: NO EQUIVALENT
  • Specify service contact
      globusrun: -r <resource-manager> | -resource <resource-manager> (specifies the Gatekeeper contact)
      globusrun-ws: -F (factory service contact), -Ft (factory type), or -Ff (factory EPR file)
  • Do not terminate the job when SIGINT is received
      globusrun: -n | -no-interrupt
      globusrun-ws: -n | -no-cleanup
  • Destroy a job based on a job contact
      globusrun: -k <job contact> | -kill <job contact>
      globusrun-ws: -kill -j <filename> | -kill -job-epr-file <filename>
  • Get current job status
      globusrun: -status <job contact>
      globusrun-ws: -status -j <filename> | -status -job-epr-file <filename>
  • Batch mode job submission
      globusrun: -b | -batch or -F | -fast-batch
      globusrun-ws: -batch | -b
  • Refresh proxy
      globusrun: -refresh-proxy <job contact> | -y <job contact>
      globusrun-ws: NO EQUIVALENT
  • Stop a job manager process, saving state
      globusrun: -stop-manager <job contact>
      globusrun-ws: NO EQUIVALENT
  • Validate job description without submitting the job
      globusrun: -p | -parse
      globusrun-ws: -validate
  • Ping job manager
      globusrun: -a | -authenticate-only
      globusrun-ws: NO EQUIVALENT
  • Dryrun
      globusrun: -d | -dryrun
      globusrun-ws: NO EQUIVALENT

Note 1: In pre-WS GRAM, streaming is done using HTTPS connections from the job manager to a GASS server embedded in the globusrun program. In WS GRAM, streaming is implemented by accessing a GridFTP server configured to run alongside the service container.

globusrun-ws has additional options to deal with file streaming, monitoring an existing job, authentication and authorization, HTTP timeouts, default termination time, encryption, and more.

1.3. Developer - API and RSL Migration Guide

The following table describes the migration path for applications that use the C language interface to pre-WS GRAM; it covers the globus_gram_client API.

Table 2. C API Migration Table

Each entry maps a GT2 globus_gram_client API function to its GT4 equivalent.

  • globus_gram_client_callback_allow() -> globus_notification_create_consumer()
  • globus_gram_client_register_job_request() -> ManagedJobFactoryPortType_GetResourceProperty_epr_register()
  • globus_gram_client_job_request() -> ManagedJobFactoryPortType_GetResourceProperty_epr()
  • globus_gram_client_register_job_cancel() -> ManagedExecutableJobPortType_Destroy_epr_register()
  • globus_gram_client_job_cancel() -> ManagedExecutableJobPortType_Destroy_epr()
  • globus_gram_client_job_status() -> ManagedExecutableJobPortType_GetResourceProperty_epr() with the property name {http://www.globus.org/namespaces/2004/10/gram/job/types}state
  • globus_gram_client_register_job_status() -> ManagedExecutableJobPortType_GetResourceProperty_epr_register() with the property name {http://www.globus.org/namespaces/2004/10/gram/job/types}state
  • globus_gram_client_job_refresh_credentials() -> globus_delegation_client_util_delegate_epr()
  • globus_gram_client_register_job_refresh_credentials() -> globus_delegation_client_util_delegate_epr_register()
  • globus_gram_client_register_job_signal() -> ManagedExecutableJobPortType_release_epr_register()
  • globus_gram_client_job_signal() -> ManagedExecutableJobPortType_release_epr()
  • globus_gram_client_register_job_callback_registration() -> ManagedExecutableJobPortType_Subscribe_epr_register()
  • globus_gram_client_job_callback_register() -> ManagedExecutableJobPortType_Subscribe_epr()
  • globus_gram_client_register_job_callback_unregistration() -> SubscriptionManager_Destroy_epr_register()
  • globus_gram_client_job_callback_unregister() -> SubscriptionManager_Destroy_epr()
  • globus_gram_client_callback_disallow() -> globus_notification_destroy_consumer()
  • globus_gram_client_job_contact_free() -> wsa_EndpointReferenceType_destroy()
  • globus_gram_client_error_string() -> globus_error_get(result)
  • globus_gram_client_set_credentials() -> globus_soap_message_handle_set_attr() with the property name GLOBUS_SOAP_MESSAGE_USER_CREDENTIAL_KEY and the gss_cred_id_t as the value
  • globus_gram_client_ping() -> no direct equivalent; querying the factory's resource properties may serve a similar purpose
  • globus_gram_client_register_ping() -> no direct equivalent; querying the factory's resource properties may serve a similar purpose
  • globus_gram_client_debug() -> set the GLOBUS_SOAP_MESSAGE_DEBUG environment variable to MESSAGES to see the XML messages sent and received
  • globus_gram_client_version() -> NO EQUIVALENT
  • globus_gram_client_attr_init() -> globus_soap_message_attr_init()
  • globus_gram_client_attr_destroy() -> globus_soap_message_attr_destroy()
  • globus_gram_client_attr_set_credential() -> globus_soap_message_handle_set_attr() with the property name GLOBUS_SOAP_MESSAGE_USER_CREDENTIAL_KEY and the gss_cred_id_t as the value
  • globus_gram_client_attr_get_credential() -> globus_soap_message_attr_get() with the property name GLOBUS_SOAP_MESSAGE_USER_CREDENTIAL_KEY

Pre-WS GRAM uses a custom language for specifying a job description; WS GRAM uses an XML-based language for the same purpose. In pre-WS GRAM, relations (such as count = 5) can occur in any order within the RSL; in WS GRAM, the elements must appear in the order given in the XML schema definition. The attribute descriptions below are listed in the order defined by the XML schema.
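
As a minimal illustration of the ordering requirement, the fragment below lists elements in the schema order used in Table 3; the values are placeholders.

    <?xml version="1.0"?>
    <!-- Minimal sketch: element order must follow the schema (executable, argument, count, ...) -->
    <job xmlns:ns1="http://www.globus.org/namespaces/2004/10/gram/job/description">
      <ns1:executable>/bin/hostname</ns1:executable>
      <ns1:argument>-f</ns1:argument>
      <ns1:count>2</ns1:count>
    </job>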

Table 3. RSL Migration Table

Each entry maps a GT2 RSL attribute to the corresponding GT4 job description element(s).

  • (username = NAME) -> <localUserId>NAME</localUserId>
  • (two_phase = TWO_PHASE_TIMEOUT) *See Note 1* -> <holdState>Pending</holdState>
  • (executable = EXE) -> <executable>EXE</executable>
  • (directory = DIR) -> <directory>DIR</directory>
  • (arguments = ARG1 ... ARGN) -> <argument>ARG1</argument> ... <argument>ARGN</argument>
  • (environment = (ENV_VAR_1 ENV_VAL_1) ... (ENV_VAR_N ENV_VAL_N)) -> <environment> <name>ENV_VAR_1</name> <value>ENV_VAL_1</value> ... <name>ENV_VAR_N</name> <value>ENV_VAL_N</value> </environment>
  • (stdin = LOCAL_FILE_PATH) *See Note 2* -> <stdin>file:///LOCAL_FILE_PATH</stdin>
  • (stdout = LOCAL_FILE_PATH) *See Note 2* -> <stdout>file:///LOCAL_FILE_PATH</stdout>
  • (stderr = LOCAL_FILE_PATH) *See Note 2* -> <stderr>file:///LOCAL_FILE_PATH</stderr>
  • (count = NUMBER) -> <count>NUMBER</count>
  • (library_path = PATH_ELEMENT_1 ... PATH_ELEMENT_N) -> <libraryPath>PATH_ELEMENT_1</libraryPath> ... <libraryPath>PATH_ELEMENT_N</libraryPath>
  • (host_count = NUMBER) -> <hostCount>NUMBER</hostCount>
  • (project = PROJECT) -> <project>PROJECT</project>
  • (queue = QUEUE) -> <queue>QUEUE</queue>
  • (max_time = MINUTES) -> <maxTime>MINUTES</maxTime>
  • (max_wall_time = MINUTES) -> <maxWallTime>MINUTES</maxWallTime>
  • (max_cpu_time = MINUTES) -> <maxCpuTime>MINUTES</maxCpuTime>
  • (max_memory = MEGABYTES) -> <maxMemory>MEGABYTES</maxMemory>
  • (min_memory = MEGABYTES) -> <minMemory>MEGABYTES</minMemory>
  • (job_type = JOBTYPE) *See Note 3* -> <jobType>JOBTYPE</jobType>
  • (file_stage_in = (REMOTE_GRIDFTP_URL_1 LOCAL_FILE_PATH_1) ... (REMOTE_GRIDFTP_URL_N LOCAL_FILE_PATH_N)) *See Note 4* -> <fileStageIn> <transfer> <sourceUrl>REMOTE_GRIDFTP_URL_1</sourceUrl> <destinationUrl>file:///LOCAL_FILE_PATH_1</destinationUrl> </transfer> ... <transfer> <sourceUrl>REMOTE_GRIDFTP_URL_N</sourceUrl> <destinationUrl>file:///LOCAL_FILE_PATH_N</destinationUrl> </transfer> </fileStageIn>
  • (file_stage_out = (LOCAL_FILE_PATH_1 REMOTE_GRIDFTP_URL_1) ... (LOCAL_FILE_PATH_N REMOTE_GRIDFTP_URL_N)) *See Note 4* -> <fileStageOut> <transfer> <sourceUrl>file:///LOCAL_FILE_PATH_1</sourceUrl> <destinationUrl>REMOTE_GRIDFTP_URL_1</destinationUrl> </transfer> ... <transfer> <sourceUrl>file:///LOCAL_FILE_PATH_N</sourceUrl> <destinationUrl>REMOTE_GRIDFTP_URL_N</destinationUrl> </transfer> </fileStageOut>

Note 1: The globusrun-ws program will automatically release the hold after the job reaches the indicated hold state. To simulate the two-phase commit timeout, an application could set the initial termination time of the job resource. A hold may also be placed on the fileCleanUp state to mark the end of the two-phase commit, but it is not possible to submit a job with both hold states.

Note 2: stdin, stdout, and stderr must be local file URLs. FTP and GridFTP URLs can be handled by using the fileStageIn and fileStageOut elements (described below).

Note 3: Valid job types for WS GRAM are multiple (the default), single, mpi, and condor.

Note 4: The WS GRAM service uses RFT to transfer files, which supports only GridFTP and FTP transfers. The local file path must be mappable by an entry in the file system mapping file.
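
To make Notes 2 and 4 concrete, the following sketch writes standard output to a local file and then stages it out to a GridFTP server; the host name and paths are placeholders, and the local path is assumed to be covered by the file system mapping file.

    <stdout>file:///home/user1/myjob.out</stdout>
    <fileStageOut>
      <transfer>
        <sourceUrl>file:///home/user1/myjob.out</sourceUrl>
        <destinationUrl>gsiftp://host.example.org:2811/tmp/myjob.out</destinationUrl>
      </transfer>
    </fileStageOut>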

The following RSL attributes have no direct equivalent in WS GRAM:

  • dry_run: Similar behavior can be obtained by using a job hold state of Pending and then destroying the job resource without releasing the hold.
  • file_stage_in_shared: No support for the GASS cache, hence this is gone. Applications may use RFT to transfer files before submitting a batch of jobs.
  • gass_cache: GASS cache is not used by WS GRAM, so there is no need for setting the cache path.
  • gram_my_job: Collective operations are enabled for every managed executable job service via rendezvous registration.
  • proxy_timeout: Delegated proxy credentials are handled via the DelegationFactoryService. Resource lifetime is controlled by the wsrl:SetTerminationTime operation.
  • remote_io_url: The WS GRAM service does not use GASS, so there is no equivalent to this.
  • restart: There is no equivalent.
  • rsl_substitution: The WS GRAM service does not support user-defined substitutions. Certain values may be referenced in some RSL values by a similar technique, but these are for system configuration parameters only. See the WS GRAM job description document for description of RSL variable syntax, values, and attributes where they may be used.
  • save_state: All WS GRAM jobs are persistent, so there are no elements related to this.
  • scratch_dir: This is now a deployment configuration option.
  • stderr_position: Standard error streaming is now a feature of the globusrun-ws program instead of part of the WS GRAM service, so there is no equivalent element for restarting error streaming at a specific point.
  • stdout_position: Standard output streaming is now a feature of the globusrun-ws program instead of part of the WS GRAM service, so there is no equivalent element for restarting output streaming at a specific point.

Here are some examples of converting pre-WS GRAM RSLs to WS GRAM job descriptions.

Table 4. RSL Migration Examples

Each example below shows a pre-WS GRAM RSL followed by the equivalent WS GRAM job description.
            (* Simple Job Request With Arguments *)
            &(executable = /bin/echo)
             (arguments = Hello, Grid)
        
            <?xml version="1.0"?>
            <!-- Simple Job Request With Arguments -->
            <job xmlns:ns1="http://www.globus.org/namespaces/2004/10/gram/job/description">
              <ns1:executable>/bin/echo</ns1:executable>
              <ns1:argument>Hello,</ns1:argument>
              <ns1:argument>Grid</ns1:argument>
            </job>
        
            (* Multijob Request *)
            +(
              &(executable = /bin/echo)
               (arguments = Hello, Grid From Subjob 1)
               (resource_manager_name = resource-manager-1.globus.org)
               (count = 1)
             )
             (
              &(executable = mpi-hello)
               (arguments = Hello, Grid From Subjob 2)
               (resource_manager_name = resource-manager-2.globus.org)
               (count = 2)
               (jobtype = mpi)
             )
        
            <?xml version="1.0"?>
            <!-- Multijob Request -->
            <!-- gram: namespace of GRAM resource ID elements -->
            <!-- wsa: namespace of WS-Addressing (EPR) elements -->
            <multiJob
                xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
                xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">

                <factoryEndpoint>
                  <wsa:Address>

                    <!-- URL for the factory service on resource-manager-0.globus.org -->
                    https://resource-manager-0.globus.org:8443/wsrf/services/ManagedJobFactoryService

                  </wsa:Address>
                  <wsa:ReferenceProperties>

                    <!-- ID for the "Multi" factory resource -->
                    <gram:ResourceID>Multi</gram:ResourceID>

                  </wsa:ReferenceProperties>
                  <wsa:ReferenceParameters/>
                </factoryEndpoint>

                <job>
                  <factoryEndpoint>
                    <wsa:Address>

                        <!-- URL for the factory service on resource-manager-1.globus.org -->
                        https://resource-manager-1.globus.org:8443/wsrf/services/ManagedJobFactoryService

                    </wsa:Address>
                    <wsa:ReferenceProperties>

                      <!-- ID for the "Fork" factory resource -->
                      <gram:ResourceID>Fork</gram:ResourceID>

                    </wsa:ReferenceProperties>
                    <wsa:ReferenceParameters/>
                  </factoryEndpoint>

                  <executable>/bin/echo</executable>
                  <argument>Hello,</argument>
                  <argument>Grid</argument>
                  <argument>From</argument>
                  <argument>Subjob</argument>
                  <argument>1</argument>
                  <count>1</count>
                </job>

                <job>
                  <factoryEndpoint>
                    <wsa:Address>

                        <!-- URL for the factory service on resource-manager-2.globus.org -->
                        https://resource-manager-2.globus.org:8443/wsrf/services/ManagedJobFactoryService

                    </wsa:Address>
                    <wsa:ReferenceProperties>

                      <!-- ID for the "Fork" factory resource -->
                      <gram:ResourceID>Fork</gram:ResourceID>

                    </wsa:ReferenceProperties>
                    <wsa:ReferenceParameters/>
                  </factoryEndpoint>

                  <executable>mpi-hello</executable>
                  <argument>Hello,</argument>
                  <argument>Grid</argument>
                  <argument>From</argument>
                  <argument>Subjob</argument>
                  <argument>2</argument>
                  <count>2</count>
                  <jobType>mpi</jobType>
                </job>
            </multiJob>
        

2. Migrating from GT3

Migrating to GT 4.0 from GT version 3.2:

  • The 4.0 protocol has been changed to be WSRF compliant. There is no backward compatibility between 4.0 and 3.2.

API changes since GT 3.2:

  • The MJFS create operation has become createManagedJob and now provides the option to send a uuid. A client can use this uuid to recover a job EPR in the event that the reply message is not received. Given this new scheme, the start operation was removed. The createManagedJob operation also allows a notification subscription request to be specified; this is the only way to reliably receive all job state notifications.
  • The MJS start operation has been removed. Its purpose was to ensure that the client had received the job EPR prior to the job being executed (and thus consuming resources), and is redundant with the uuid functionality.

New GRAM Client Submission Tool:

  • globusrun-ws has replaced managed-job-globusrun as the WS GRAM client submission program. The main reason was performance: the cost of JVM startup for each job submission through managed-job-globusrun was too high. globusrun-ws is written in C and thus avoids the JVM startup cost. It is very similar in functionality to managed-job-globusrun, but you will need to become familiar with its arguments and options.

RSL Schema Changes Since GT 3.2:

  • RSL substitutions: The RSL substitution syntax has changed to allow for a simpler RSL schema that can be parsed by standard tools. In GT 3.2, applications could define arbitrary RSL substitutions within an RSL document and rely on the GRAM service to resolve them. In GT4 WS GRAM, this feature is no longer present. Instead, GT 4.0 provides a fixed set of RSL variables, including ${GLOBUS_USER_HOME}, ${GLOBUS_USER_NAME}, ${GLOBUS_SCRATCH_DIR}, and ${GLOBUS_LOCATION} (see the sketch after this list).
  • executable is now a single local file path. Remote URLs are no longer allowed. If executable staging is desired, it should be added to the fileStageIn directive.
  • stdin is now a single local file path. Remote URLs are no longer allowed. If stdin staging is desired, it should be added to the fileStageIn directive.
  • stdout is now a single local file path, instead of a list of remote URLs. If stdout staging is desired, it should be added to the fileStageOut directive.
  • stderr is now a single local file path, instead of a list of remote URLs. If stderr staging is desired, it should be added to the fileStageOut directive.
  • scratchDirectory has been removed.
  • gramMyJobType has been removed. "Collective" functionality is always available if a job chooses to use it.
  • dryRun has been removed. It is obsolete given the addition of the holdState attribute: setting holdState to "StageIn" should prevent the job from being submitted to the local scheduler, and the job can then be canceled once the StageIn hold state notification is received.
  • remoteIoUrl has been removed. This was a hack for pre-WS GRAM involving staging via GASS, and it has no relevance in the current implementation.
  • File Staging related RSL attributes have been replaced with RFT file transfer attributes/syntax.
  • RSL substitution definitions and substitution references have been removed in order to be able to use standard XML parsing/serialization tools.
  • RSL variables have been added. These are keywords denoted in the form of ${variable name} that can be found in certain RSL attributes.
  • Explicit credential references have been added, which, along with use of the new DelegationFactory service, replace the old implicit delegation model.
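
The following sketch ties together several of the points above: the executable is staged in with fileStageIn and then referenced by its local path, and RSL variables are used for locations. It is illustrative only; the host and paths are placeholders, and which attributes accept RSL variables should be verified against the GT 4.0 job description schema.

    <executable>${GLOBUS_SCRATCH_DIR}/my-app</executable>
    <stdout>file:///${GLOBUS_USER_HOME}/my-app.out</stdout>
    <fileStageIn>
      <transfer>
        <sourceUrl>gsiftp://host.example.org:2811/tools/my-app</sourceUrl>
        <destinationUrl>file:///${GLOBUS_SCRATCH_DIR}/my-app</destinationUrl>
      </transfer>
    </fileStageIn>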

Fault changes since GT version 3.2:

  • CacheFaultType was removed since there is no longer a GASS cache.
  • RepeatedlyStartedFaultType was removed since there is no longer a start operation. Repeated create requests with the same submission ID simply return the existing job EPR.
  • SLAFaultType was changed to ServiceLevelAgreementFaultType for clarification.
  • StreamServiceCreationFaultType was removed since there is no longer a stream service.
  • UnresolvedSubstitutionReferencesFaultType was removed since there is no longer support for substitution definitions and references in the RSL.
  • DatabaseAccessFaultType was removed since a database is no longer used to save job data.