GT 4.0 WS GRAM: User's Guide

1. Introduction

GRAM services provide secure job submission to many types of job schedulers for users who have the right to access a job-hosting resource in a Grid environment. A valid proxy is required for job submission. All GRAM job submission options are supported through the embedded request document input: a job is started by submitting a client-side job description to the GRAM services. This submission can be made by end users with the GRAM command-line tools.

2. New Functionality

2.1. Submission ID

A submission ID may be used in the GRAM protocol for reliability in the face of message faults or other transient errors. It ensures that at most one instance of a job is executed, i.e. it prevents accidental duplication of jobs under rare circumstances when a client retries after a failure. By default, the globusrun-ws program generates a submission ID (a UUID). One can override this behavior by supplying a submission ID as a command-line argument.

If a user is unsure whether a job was submitted successfully, they should resubmit using the same ID as was used for the previous attempt.
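As a minimal host-side sketch, a client can mint its own submission ID up front and reuse it verbatim on every retry. The ID-generation commands below are ordinary Linux tools, not GRAM tools, and the exact globusrun-ws flag for passing the ID is not shown here; check globusrun-ws -help.

```shell
# Mint one submission ID and reuse it verbatim on every retry, so the
# service can recognize the repeated request and start the job at most once.
# (uuidgen / the /proc fallback are host-side assumptions, not GRAM tools.)
SUBMISSION_ID="uuid:$( (uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid) | tr 'A-Z' 'a-z' )"
echo "$SUBMISSION_ID"
# Pass $SUBMISSION_ID as the submission ID argument to globusrun-ws on the
# first attempt and on every resubmission after a suspected failure.
```

Keeping the ID outside the submission command means a wrapper script can safely re-run the same submission after a timeout without risking a duplicate job.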

2.2. Job hold and release

It is possible to specify in a job description that the job be put on hold when it reaches a chosen state (see the GRAM Approach documentation for more information about the executable job state machine, and the job description XML schema documentation for information about how to specify a held state). This is useful, for instance, when a GRAM client wishes to directly access output files written by the job (as opposed to waiting for the stage-out step to transfer files from the job host). The client would request that the file cleanup process be held until released, giving the client an opportunity to fetch all remaining/buffered data after the job completes but before the output files are deleted.

This is used by globusrun-ws to ensure client-side streaming of remote files in batch mode.
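As a sketch, the hold described above can be requested directly in the job description. The holdState element name and the CleanUp value casing are assumptions based on the job state machine and schema documentation referenced above; verify them against the schema before use.

```xml
<job>
    <executable>/bin/echo</executable>
    <argument>output to fetch before cleanup</argument>
    <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
    <!-- Hold the job when it reaches the CleanUp state; the client releases
         the hold after it has fetched all remaining/buffered output.
         (Element name and value casing assumed; check the schema docs.) -->
    <holdState>CleanUp</holdState>
</job>
```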

2.3. MultiJobs

The new job description XML schema allows for the specification of a multijob, i.e. a job that is itself composed of several executable jobs. This is useful for bundling a group of jobs together and submitting them as a whole to a remote GRAM installation.

2.4. Job and process rendezvous

WS GRAM services implement a rendezvous mechanism to synchronize job processes in a multiprocess job and subjobs in a multijob. The job application can register binary information, for instance process information or subjob information, and get notified when all the other processes or subjobs have registered their own information. This is useful, for instance, for parallel jobs that need to rendezvous at a "barrier" before proceeding with computations, when no native application API is available to help do the rendezvous.

3. Changed Functionality

3.1. Independent resource keys

Note: This change was introduced in GT 4.0.5.

WS GRAM enables the client to add a self-generated resource key to the input type when submitting a new job request to the ManagedJobFactoryService (MJFS). This enables the client to maintain contact with the job in case the server fails after the job was created but before the EndpointReference (EPR) of the newly created job was sent to the client. The client is then able to create an EPR itself from the self-generated job UUID and the address of the ManagedExecutableJobService (MEJS) and query for the state of the job.

In former versions of WS GRAM, the job UUID generated on the client side was used as the resource key of the created job resource. This has changed: WS GRAM now creates its own job UUID, even if the client provides one in the input of its call to the MJFS, and returns this job key inside the EPR that is sent back to the client. The client can still contact the MJFS with the self-generated job key, in which case the MJFS simply uses the mapping between the two keys. But the client cannot contact the MEJS with the self-generated job key as part of an EPR in order to query for job state.
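A client-constructed EPR of the kind described above might look like the following sketch. The address and namespace URIs follow the multijob example later in this guide; the UUID is a placeholder, the host name is hypothetical, and the use of the gram:ResourceID reference property for job resources (as opposed to factory resources) is an assumption to verify against the service documentation.

```xml
<wsa:EndpointReference
    xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"
    xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job">
  <wsa:Address>
    https://host.example.org:8443/wsrf/services/ManagedExecutableJobService
  </wsa:Address>
  <wsa:ReferenceProperties>
    <!-- the client's self-generated job UUID (placeholder value) -->
    <gram:ResourceID>uuid:00000000-0000-0000-0000-000000000000</gram:ResourceID>
  </wsa:ReferenceProperties>
</wsa:EndpointReference>
```

Note that with the GT 4.0.5 behavior described above, such a key is only honored by the MJFS, not by the MEJS.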

3.1.1. Open Questions

  • The mappings added to the idempotenceIdMap do not appear to be removed at any time. Should we add this in the remove() method of the MEJR?

4. Usage scenarios

4.1. Generating a valid proxy

In order to generate a valid proxy file, use the grid-proxy-init tool available under $GLOBUS_LOCATION/bin:

% bin/grid-proxy-init
Your identity: /O=Grid/OU=GlobusTest/OU=simpleCA.mymachine/OU=mymachine/CN=John Doe
Enter GRID pass phrase for this identity:
Creating proxy ................................. Done
Your proxy is valid until: Tue Oct 26 01:33:42 2004

4.2. Submitting a simple job

Use the globusrun-ws program to submit a simple job without writing a job description document. With the -c argument, a job description is generated by treating the first argument as the executable and the remaining ones as its arguments. For example:

   % globusrun-ws -submit -c /bin/touch touched_it
   Submitting job...Done.
   Job ID: uuid:4a92c06c-b371-11d9-9601-0002a5ad41e5
   Termination time: 04/23/2005 20:58 GMT
   Current job state: Active
   Current job state: CleanUp
   Current job state: Done
   Destroying job...Done.

Confirm that the job worked by verifying the file was touched:

   % ls -l ~/touched_it 
   -rw-r--r--  1 smartin globdev 0 Apr 22 15:59 /home/smartin/touched_it

   % date
   Fri Apr 22 15:59:20 CDT 2005

Note: you did not tell globusrun-ws where to run your job, so the default of localhost was used.

4.3. Submitting a job with the contact string

Use globusrun-ws to submit the same touch job, but this time specify the contact string.

   % globusrun-ws -submit -F https://lucky0.mcs.anl.gov:8443/wsrf/services/ManagedJobFactoryService -c /bin/touch touched_it
   Submitting job...Done.
   Job ID: uuid:3050ad64-b375-11d9-be11-0002a5ad41e5
   Termination time: 04/23/2005 21:26 GMT
   Current job state: Active
   Current job state: CleanUp
   Current job state: Done
   Destroying job...Done.

Try submitting the same job to a remote host. Type globusrun-ws -help to learn the details of the contact string.

4.4. Submitting a job with the job description

A job to be submitted can be specified by the user in a job description XML file.

Here is an example of a simple job description:

<job>
    <executable>/bin/echo</executable>
    <argument>this is an example_string </argument>
    <argument>Globus was here</argument>
    <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
    <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
</job>

Tell globusrun-ws to read the job description from a file, using the -f argument:

% bin/globusrun-ws -submit -f test_super_simple.xml
Submitting job...Done.
Job ID: uuid:c51fe35a-4fa3-11d9-9cfc-000874404099
Termination time: 12/17/2004 20:47 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.

Note the usage of the substitution variable ${GLOBUS_USER_HOME} which resolves to the user home directory.

Here is an example with more job description parameters:

<?xml version="1.0" encoding="UTF-8"?>
<job>
    <executable>/bin/echo</executable>
    <directory>/tmp</directory>
    <argument>12</argument>
    <argument>abc</argument>
    <argument>34</argument>
    <argument>this is an example_string </argument>
    <argument>Globus was here</argument>
    <environment>
        <name>PI</name>
        <value>3.141</value>
    </environment>
    <stdin>/dev/null</stdin>
    <stdout>stdout</stdout>
    <stderr>stderr</stderr>
    <count>2</count>
</job>

Note that in this example, the <directory> element specifies /tmp as the current directory for the execution of the command on the execution machine, and the standard output is specified as the relative path stdout. The output is therefore written to /tmp/stdout:

% cat /tmp/stdout
12 abc 34 this is an example_string  Globus was here

4.5. Delegating credentials

There are three different uses of delegated credentials: 1) by the MEJS to create a remote user proxy, 2) by the MEJS to contact RFT, and 3) by RFT to contact the GridFTP servers. The EPRs of these credentials are specified in three job description elements: jobCredentialEndpoint, stagingCredentialEndpoint, and transferCredentialEndpoint, respectively. Please see the job description schema and RFT transfer request schema documentation for more details about these elements.

The globusrun-ws client can either delegate these credentials automatically for a particular job, or it can reuse pre-delegated credentials (see next paragraph) through the use of command-line arguments for specifying the credentials' EPR files. Please see the WS GRAM command-line tools documentation for details on these command-line arguments.

It is possible to use delegation command-line clients to obtain and refresh delegated credentials in order to use them when submitting jobs to WS GRAM. This, for instance, enables the submission of many jobs using a shared set of delegated credentials. This can significantly decrease the number of remote calls for a set of jobs, thus improving performance.
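As a sketch, a job description can carry the EPRs of pre-delegated credentials directly. The element names come from the job description schema referenced above; the EPR bodies are elided here, since each element contains a full WS-Addressing endpoint reference of the kind obtained from the delegation clients.

```xml
<job>
    <executable>/bin/date</executable>
    <!-- EPR of the delegated credential the MEJS uses to create the
         remote user proxy (EPR content elided) -->
    <jobCredentialEndpoint>...</jobCredentialEndpoint>
    <!-- EPR of the delegated credential the MEJS uses to contact RFT
         (EPR content elided) -->
    <stagingCredentialEndpoint>...</stagingCredentialEndpoint>
</job>
```

Many job descriptions can reference the same delegated credential this way, which is what makes the shared-credential pattern described above cut down on remote delegation calls.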

4.6. Finding which schedulers are interfaced by the WS GRAM installation

Unfortunately there is no option yet to print the list of local resource managers supported by a given WS GRAM service installation, but there is a way to check whether WS GRAM supports a particular local resource manager. The following command shows how a client could check whether Condor is available at the remote site:

wsrf-query \
    -s https://<hostname>:<port>/wsrf/services/ManagedJobFactoryService \
    -key {http://www.globus.org/namespaces/2004/10/gram/job}ResourceID Condor \
    "//*[local-name()='version']"
   

Replace host and port settings with the values you need. If Condor is available on the server-side, the output should look something like the following:

	<ns1:version xmlns:ns1="http://mds.globus.org/metadata/2005/02">4.0.3</ns1:version>
   

In this example the output indicates that a GT container is listening on the server side, that Condor is available, and that the GT version is 4.0.3. If no GT container is running at the specified host and port, or if the specified local resource manager is not available on the server side, the output will be an error message.

On the server-side the GRAM name of local resource managers for which GRAM support has been installed can be obtained by looking at the GRAM configuration on the GRAM server-side machine, as explained here.

The GRAM name of the local resource manager can be used with the factory type option of the job submission command-line tool to specify which factory resource to use when submitting a job.
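For example, with globusrun-ws the factory type option is -Ft. The sketch below only prints the command rather than running it, since execution requires a live WS GRAM container; the host name is a placeholder, and the flag should be verified with globusrun-ws -help.

```shell
# Submit to the Condor factory resource found with the wsrf-query above.
# -Ft selects the factory type (flag assumed; verify with globusrun-ws -help);
# the host is a placeholder. Printed rather than run here.
CMD="globusrun-ws -submit -Ft Condor \
    -F https://host.example.org:8443/wsrf/services/ManagedJobFactoryService \
    -c /bin/date"
echo "$CMD"
```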

4.7. Specifying file staging in the job description

In order to do file staging, one must add specific elements to the job description and delegate credentials appropriately (see Delegating credentials). The file transfer directives follow the RFT syntax, which allows only third-party transfers. Each file transfer must therefore specify a source URL and a destination URL. URLs are specified as GridFTP URLs (for remote files) or as file URLs (for files local to the service; these are converted internally to full GridFTP URLs by the service).

For instance, in the case of staging a file in, the source URL would be a GridFTP URL (for instance gsiftp://job.submitting.host:2811/tmp/mySourceFile) resolving to a source document accessible on the file system of the job submission machine (for instance /tmp/mySourceFile). At run time the Reliable File Transfer service used by the MEJS on the remote machine would reliably fetch the remote file using the GridFTP protocol and write it to the specified local file (for instance file:///${GLOBUS_USER_HOME}/my_transfered_file, which resolves to ~/my_transfered_file). Here is what the stage-in directive would look like:

    <fileStageIn>
        <transfer>
            <sourceUrl>gsiftp://job.submitting.host:2811/tmp/mySourceFile</sourceUrl>
            <destinationUrl>file:///${GLOBUS_USER_HOME}/my_transfered_file</destinationUrl>
        </transfer>
    </fileStageIn>

Note: additional RFT-defined quality of service requirements can be specified for each transfer. See the RFT documentation for more information.

Here is an example job description with file stage-in and stage-out:

<job>
    <executable>my_echo</executable>
    <directory>${GLOBUS_USER_HOME}</directory>
    <argument>Hello</argument>
    <argument>World!</argument>
    <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
    <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
    <fileStageIn>
        <transfer>
            <sourceUrl>gsiftp://job.submitting.host:2811/bin/echo</sourceUrl>
            <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
        </transfer>
    </fileStageIn>
    <fileStageOut>
        <transfer>
            <sourceUrl>file:///${GLOBUS_USER_HOME}/stdout</sourceUrl>
            <destinationUrl>gsiftp://job.submitting.host:2811/tmp/stdout</destinationUrl>
        </transfer>
    </fileStageOut>
    <fileCleanUp>
        <deletion>
            <file>file:///${GLOBUS_USER_HOME}/my_echo</file>
        </deletion>
    </fileCleanUp>
</job>

Note that the job description XML does not need to include a reference to the schema that describes its syntax. It is also possible to omit the namespace in the GRAM job description XML elements. The submission of this job to the GRAM services causes the following sequence of actions:

  1. The /bin/echo executable is transferred from the submission machine to the GRAM host file system. The destination location is the HOME directory of the user on behalf of whom the job is executed by the GRAM services (see <fileStageIn>).
  2. The transferred executable is used to print a test string (see the <executable>, <directory> and <argument> elements) on the standard output, which is redirected to a local file (see <stdout>).
  3. The standard output file is transferred to the submission machine (see <fileStageOut>).
  4. The file that was initially transferred during the stage-in phase is removed from the file system of the GRAM installation (see <fileCleanUp>).

4.8. Specifying and handling custom job description extensions

Note: This feature has been added in GT 4.0.5. For versions older than 4.0.5 an update package is available to upgrade one's installation. See the downloads page for the latest links.

Basic support is provided for specifying custom extensions to the job description. There are plans to improve the usability of this feature, but at this time it involves a bit of work.

Specifying the actual custom elements in the job description is trivial. Simply add any elements that you need between the beginning and ending extensions tags at the bottom of the job description as in the following basic example:

<job>
    <executable>/home/user1/myapp</executable>
    <extensions>
        <myData>
            <var1>hello</var1>
            <var2>world</var2>
        </myData>
    </extensions>
</job>

To handle this data, you will have to alter the appropriate Perl scheduler script (e.g. fork.pm for the Fork scheduler) to parse the data returned by the $description->extensions() sub.

More information about job description extension support can be found in the Admin guide.

4.9. Specifying and submitting a multijob

The job description XML schema allows for the specification of a multijob, i.e. a job that is itself composed of several executable jobs, which we will refer to as subjobs (note: subjobs cannot themselves be multijobs, so the structure is not recursive). This is useful, for instance, in order to bundle a group of jobs together and submit them as a whole to a remote GRAM installation.

Note that no relationship can be specified between the subjobs of a multijob. The subjobs are submitted to job factory services in their order of appearance in the multijob description.

Within a multijob description, each subjob description must be accompanied by an endpoint for the factory to submit the subjob to. This enables several jobs to be submitted at once to different hosts. The factory to which the multijob is submitted acts as an intermediate tier between the client and the eventual executable job factories.

Here is an example of a multijob description:

<?xml version="1.0" encoding="UTF-8"?>
<multiJob xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job" 
     xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <factoryEndpoint>
        <wsa:Address>
            https://localhost:8443/wsrf/services/ManagedJobFactoryService
        </wsa:Address>
        <wsa:ReferenceProperties>
            <gram:ResourceID>Multi</gram:ResourceID>
        </wsa:ReferenceProperties>
    </factoryEndpoint>
    <directory>${GLOBUS_LOCATION}</directory>
    <count>1</count>

    <job>
        <factoryEndpoint>
            <wsa:Address>https://localhost:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
            <wsa:ReferenceProperties>
                <gram:ResourceID>Fork</gram:ResourceID>
            </wsa:ReferenceProperties>
        </factoryEndpoint>
        <executable>/bin/date</executable>
        <stdout>${GLOBUS_USER_HOME}/stdout.p1</stdout>
        <stderr>${GLOBUS_USER_HOME}/stderr.p1</stderr>
        <count>2</count>
    </job>

    <job>
        <factoryEndpoint>
            <wsa:Address>https://localhost:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
            <wsa:ReferenceProperties>
                <gram:ResourceID>Fork</gram:ResourceID>
            </wsa:ReferenceProperties>
        </factoryEndpoint>
        <executable>/bin/echo</executable>
        <argument>Hello World!</argument>        
        <stdout>${GLOBUS_USER_HOME}/stdout.p2</stdout>
        <stderr>${GLOBUS_USER_HOME}/stderr.p2</stderr>
        <count>1</count>
    </job>

</multiJob>

Notes:

  • The <ResourceID> element within the <factoryEndpoint> WS-Addressing endpoint structures must be qualified with the appropriate GRAM namespace.
  • Apart from the factoryEndpoint element, all elements at the enclosing multijob level act as defaults for the subjob parameters, in this example <directory> and <count>.
  • The default <count> value is overridden in the subjob descriptions.

In order to submit a multijob description, use a job submission command-line tool and specify the Managed Job Factory resource to be Multi. For instance, submitting the multijob description above using globusrun-ws, we obtain:

% bin/globusrun-ws -submit -f test_multi.xml
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:bd9cd634-4fc0-11d9-9ee1-000874404099
Termination time: 12/18/2004 00:15 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.

A multijob resource is created by the factory and exposes a set of WSRF resource properties different from the resource properties of an executable job. The state machine of a multijob is also different, since the multijob represents the overall execution of all the executable jobs it is composed of.

4.10. Specifying SoftEnv keys in the job description

Note: This feature is only available beginning with version 4.0.5 of the toolkit.

For a short introduction to SoftEnv please have a look at the SoftEnv section in the admin guide.

If SoftEnv is enabled on the server side, nothing needs to be added to the job description: before the job is submitted to the scheduler, the environment specified in the .soft file in the user's remote home directory is set up automatically. To use a software environment different from the one specified in the remote .soft file, the user must provide SoftEnv parameters in the extensions element of the job description. The schema of the extension element for software selection in the job description is as follows:

<element name="softenv" type="xsd:string"/>

For example, to add the SoftEnv commands "@teragrid-basic", "+intel-compilers", "+atlas", and "+tgcp" to the job process's environment, the user would specify the following "extensions" element in the job description:

<extensions>
  <softenv>@teragrid-basic</softenv>
  <softenv>+intel-compilers</softenv>
  <softenv>+atlas</softenv>
  <softenv>+tgcp</softenv>
</extensions>

There is currently no way for a user to query the remote service itself to find out whether SoftEnv support is enabled. The only way to check this so far is to submit a job with /bin/env as the executable and inspect the results.
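Such a probe job could look like the following sketch. The stdout path is an example, and the SoftEnv key is one of the TeraGrid example keys from above; if SoftEnv is enabled, the captured environment should reflect the requested key.

```xml
<job>
    <executable>/bin/env</executable>
    <!-- capture the job's environment for inspection (example path) -->
    <stdout>${GLOBUS_USER_HOME}/env.out</stdout>
    <extensions>
        <!-- example key from the TeraGrid list above -->
        <softenv>+atlas</softenv>
    </extensions>
</job>
```

After the job completes, compare env.out against a run without the extensions element: a difference in the configured paths indicates that SoftEnv support is active.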

Scenarios:

  1. SoftEnv is disabled on the server-side

    1. The user provides no SoftEnv extensions

      In this case no SoftEnv environment is configured before job submission, even if the user has a .soft file in his remote home directory.

    2. The user provides valid SoftEnv extensions

      If SoftEnv is not installed on the server, then no environment will be configured.

      If SoftEnv is installed, the environment the user specifies in the extensions elements overrides any SoftEnv configuration in a .soft or a .nosoft file in his remote home directory. The environment will be configured as specified in the extension elements before job submission.

    3. The user provides invalid SoftEnv extensions

      If SoftEnv is not installed on the server, then no environment will be configured.

      If SoftEnv is installed, the environment the user specifies in the extensions elements overrides any SoftEnv configuration in a .soft or a .nosoft file in his remote home directory. Only the valid keys in the SoftEnv extensions elements will be configured; if there are no valid keys, no environment will be configured. SoftEnv warnings are logged to the stdout of the job.

    In general, jobs do not fail if they have SoftEnv extensions in their description and SoftEnv is disabled, or even not installed, on the server side. But they will fail if they rely on environments being set up before job submission.

  2. SoftEnv is enabled on the server-side

    1. The user provides no SoftEnv extensions

      If the user has a .soft file (and no .nosoft file) in his remote home directory, then the environment defined in his .soft file will be configured before job submission. If there is a .nosoft file in his remote home directory, no environment will be prepared.

    2. The user provides valid SoftEnv extensions

      The specified environment overrides any SoftEnv configuration the user specifies in a .soft or a .nosoft file in his remote home directory. The environment will be configured as specified in the extension elements before job submission.

    3. The user provides invalid SoftEnv extensions

      The specified environment overrides any SoftEnv configuration the user specifies in a .soft or a .nosoft file in his remote home directory. Only the valid keys in the SoftEnv extension elements will be configured; if there are no valid keys, no environment will be configured. SoftEnv warnings are logged to the stdout of the job.

Note: In the current implementation it is not possible to directly call executables whose paths are defined in SoftEnv without specifying the complete path to the executable.

Example: Suppose a database query must be executed using the mysql command. If mysql is not in the default path, then the direct use of mysql as the executable in the job description document will fail, even if the use of SoftEnv is configured. The mysql command must instead be wrapped in a script that is in the default path. Thus a job submission with the following job description document will fail:

<job>
  ...
  <executable>mysql</executable>
  ...
</job>

But when the command is embedded inside a shell script which is specified as the executable in the job description document, it will work:

#!/bin/sh
  ...
  mysql ...
  ...

Note: The use of invalid SoftEnv keys in the extension part of the job description document does not generate errors.

4.11. Specifying substitution variables in a job description

Job description variables are special strings in a job description that are replaced by the GRAM service with values that the client side does not know a priori. Job description variables can be used in any path-like string or URL specified in the job description. An example of a variable is ${GLOBUS_USER_HOME}, which represents the path to the HOME directory on the file system where the job is executed. The set of variables is fixed in the GRAM service implementation. This differs from previous implementations of RSL substitutions in GT2 and GT3, where a user could define new variables for use inside a job description document. This was done to preserve the simplicity of the job description XML schema (relative to the GT3.2 RSL schema), which does not require a specialized XML parser to serialize a job description document.

Details of the RSL variables are in the job description documentation and in the substitution variable section of the admin guide.

4.11.1. Changes in WS GRAM beginning with GT version 4.0.5

Beginning with version 4.0.5, additional variables can be defined on the server side for use in the job description. So far the user cannot find out from WS GRAM whether additional variables are defined on the server side, or what their names and values are; this information must be published by the service provider.

4.12. Specifying a self-generated resource key during job submission

WS GRAM enables a client to add a self-generated resource key to the input type when submitting a new job request to the ManagedJobFactoryService (MJFS). The client should make sure to provide a universally unique identifier (UUID) as the job resource key.

Providing its own UUID enables a client to resubmit a job in case the server did not respond to a prior job submission request, e.g. due to network failures. If the client submits a job with an already existing resource key a second time, the job will not be started again, because it is already running. This avoids unnecessary and undesired resource usage and enables reliable job submission.

Beginning with version 4.0.5 of the toolkit, WS GRAM creates its own job UUID even if the client provides one in the input of its call to the MJFS, and returns this job UUID inside the endpoint reference (EPR) sent back to the client. The client can still contact the ManagedJobFactoryService (MJFS) with the self-generated job resource key in order to resubmit a job that may not actually have been lost. But the client can no longer contact the ManagedExecutableJobService (MEJS) with that self-generated job key as part of an EPR in order to query for job state. If it is unclear whether a job request was started by the server, the client has to submit the job with the same job UUID again in order to get an EPR from the MJFS. With this EPR the client can then query for job state or destroy the job.

5. Command-line tools

Please see the GT 4.0 WS GRAM Command-line Reference.

6. Graphical user interfaces

There is no support for this type of interface for WS GRAM.

7. Troubleshooting

When I submit a streaming or staging job, I get the following error: ERROR service.TransferWork Terminal transfer error: [Caused by: Authentication failed [Caused by: Operation unauthorized (Mechanism level: Authorization failed. Expected "/CN=host/localhost.localdomain" target but received "/O=Grid/OU=GlobusTest/OU=simpleCA-my.machine.com/CN=host/my.machine.com")

  • Check $GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml for the use of "localhost" or "127.0.0.1" instead of the public hostname (in the example above, "my.machine.com"). Change these uses of the loopback hostname or IP to the public hostname as necessary.

Fork jobs work fine, but submitting PBS jobs with globusrun-ws hangs at "Current job state: Unsubmitted"

  • Make sure the log_path in $GLOBUS_LOCATION/etc/globus-pbs.conf points to locally accessible scheduler logs that are readable by the user running the container. The Scheduler Event Generator (SEG) will not work without local scheduler logs to monitor. This can also apply to other resource managers, but is most commonly seen with PBS.
  • If the SEG configuration looks sane, try running the SEG tests. They are located in $GLOBUS_LOCATION/test/globus_scheduler_event_generator_*_test/. If Fork jobs work, you only need to run the PBS test. Run each test by going to the associated directory and running ./TESTS.pl. If any tests fail, report this to the [email protected] mailing list.
  • If the SEG tests succeed, the next step is to figure out the ID assigned by PBS to the queued job. Enable GRAM debug logging by uncommenting the appropriate line in the $GLOBUS_LOCATION/container-log4j.properties configuration file. Restart the container, run a PBS job, and search the container log for a line that contains "Received local job ID" to obtain the local job ID.
  • Once you have the local job ID, you can check the latest PBS logs pointed to by the value of "log_path" in $GLOBUS_LOCATION/etc/globus-pbs.conf to make sure the job's status is being logged. If the status is not being logged, check the documentation for your flavor of PBS to see if there is any further configuration that needs to be done to enable job status logging. For example, PBS Pro requires a sufficient -e <bitmask> option added to the pbs_server command line to enable enough logging to satisfy the SEG.
  • If the correct status is being logged, try running the SEG manually to see if it is reading the log file properly. The general form of the SEG command line is as follows: $GLOBUS_LOCATION/libexec/globus-scheduler-event-generator -s pbs -t <timestamp> The timestamp is in seconds since the epoch and dictates how far back in the log history the SEG should scan for job status events. The command should hang after dumping some status data to stdout. If no data appears, change the timestamp to an earlier time. If nothing ever appears, report this to the [email protected] mailing list.
  • If running the SEG manually succeeds, try running another job and make sure the job process actually finishes and PBS has logged the correct status before giving up and cancelling globusrun-ws. If things are still not working, report your problem and exactly what you have tried to remedy the situation to the [email protected] mailing list.

The job manager detected an invalid script response

  • Check for a restrictive umask. When the service writes the native scheduler job description to a file, an overly restrictive umask will cause the permissions on the file to be such that the submission script run through sudo as the user cannot read the file (bug #2655).

When restarting the container, I get the following error: Error getting delegation resource

  • Most likely this is simply a case of the delegated credential expiring. Either refresh it for the affected job or destroy the job resource.

The user's home directory has not been determined correctly

  • This occurs when the administrator changed the location of the user's home directory and did not restart the GT4 container afterwards. Beginning with version 4.0.3 of the GT, WS GRAM determines a user's home directory only once in the lifetime of a container (when the user submits the first job). Subsequently submitted jobs will use the cached home directory during job execution.

8. Usage statistics collection by the Globus Alliance

The following usage statistics are sent by default in a UDP packet (in addition to the GRAM component code, packet version, timestamp, and source IP address) at the end of each job (i.e. when the Done or Failed state is entered).

  • job creation timestamp (helps determine the rate at which jobs are submitted)
  • scheduler type (Fork, PBS, LSF, Condor, etc...)
  • jobCredentialEndpoint present in RSL flag (to determine if server-side user proxies are being used)
  • fileStageIn present in RSL flag (to determine if the staging in of files is used)
  • fileStageOut present in RSL flag (to determine if the staging out of files is used)
  • fileCleanUp present in RSL flag (to determine if the cleaning up of files is used)
  • CleanUp-Hold requested flag (to determine if streaming is being used)
  • job type (Single, Multiple, MPI, or Condor)
  • gt2 error code if job failed (to determine common scheduler script errors users experience)
  • fault class name if job failed (to determine general classes of common faults users experience)

If you wish to disable this feature, please see the Java WS Core System Administrator's Guide section on Usage Statistics Configuration for instructions.

Also, please see our policy statement on the collection of usage statistics.