Name
managed-job-globusrun — (DEPRECATED) Java-based job submission client for GRAM
Synopsis
managed-job-globusrun
Tool Description
Warning: this tool has been deprecated in this version and is documented here only because it may still be useful for testing purposes. Use globusrun-ws instead.
managed-job-globusrun is a Java-based job submission tool for WS GRAM, i.e. a program for submitting jobs to a local or remote host and managing those jobs via the GRAM services. GRAM services provide secure job submission to many types of job schedulers for users who have the right to access a job hosting resource in a Grid environment.
All GRAM job submission options are supported transparently through the embedded request document input: job startup is performed by submitting a client-side provided job description (the analogue of the RSL document in GT2) to the GRAM services.
In addition to starting jobs, it is possible to delegate credentials needed for certain optional GRAM features, query the state of a previously started job and parse a job description file without making any submission. Online and batch submission modes are supported with reattachment (recovery) for jobs whether they were started with this client or another GRAM client application.
Note: a valid proxy is required for all supported operations except job description file parsing (-p). To generate a valid proxy file, use the grid-proxy-init tool available under $GLOBUS_LOCATION/bin.
Command Syntax
Arguments
managed-job-globusrun [options] [<factory>] <job description>
managed-job-globusrun -p -file <job description filename>
managed-job-globusrun (-state | -release | -kill) <job handle>
managed-job-globusrun -help | -usage | -version
with
<job description> = -file <job description filename> | <command line>
<factory>         = -factory <contact> [-type <type>]
<contact>         = [<protocol>://]<host>[:[port]][/<service>]
[options]         = [-q] [-n] [-b] [-duration] [-terminate-at] [-auth <auth>] [-xmlsec <sec>] [-personal] [-submission-id <ID>]
Options
Table 14. Options for managed-job-globusrun
Help options | |
-help | Displays help information about the command. |
-usage | Displays usage of the command. |
-v, -version | Displays version of the command. |
Job Factory Contact options | |
-factory <contact> | Specifies the URL of the Job Factory Service to contact when submitting or listing jobs. Following the contact grammar above, a factory contact string can be specified as: <host>, <host>:, <host>:<port>, <host>/<service>, <host>:/<service>, or <host>:<port>/<service>.
It is also possible to specify the protocol by prepending <protocol>:// to each of the previous possibilities, bringing the total number of supported syntaxes to 12. For factory contacts which omit the protocol, port or service field, default values are used.
Omitting altogether the -factory option is equivalent to specifying the local host as the contact string (with the implied default protocol, port and service). |
-type <factory type> | Specifies the type of factory resource to use. This is the name of the local resource manager. The default value is Fork. |
Job Specification options | |
<command line> | Creates a simple job description that consists only of a command line of the form: 'executable (argument)*' Quotes must be used if there are one or more arguments. |
-file <job description filename> | Reads job description from the local file <job description filename>. The job description must be a single job request. |
-p | This option only parses the job description, and then prints either a success message or a parser failure. No job will be submitted to any factory service. The job description must be a single job request. |
Batch Operations options | |
-b, -batch | Do not wait for the started job to complete (and do not destroy the started job service on exit). The handle of the job service will be printed on the standard output. This option is incompatible with multi-request jobs. Implies -quiet.
-state <handle> | Print out the state of the specified job. For a list of valid states, see the GRAM documentation; the current valid states are Pending, Active, Done, Suspended, and Failed. The handle may need to be quoted. |
-r, -release <handle> | Release the specified job from hold. The handle may need to be quoted. |
-k, -kill <handle> | Kill the specified job. The handle may need to be quoted. Note: The <handle> argument is printed out when executing in batch mode or when using the -list option. |
Job Resource Lifetime options | |
-duration <duration> | Specify the duration of the job resource. The job resource will destroy itself automatically after the specified duration starting from service creation.
Incompatible with -terminate-at. Useful with -batch. |
-terminate-at <date> | Specify the termination date/time of the job resource. Same as -duration but with an absolute date/time value.
The date expression may need to be quoted, as in: -terminate-at '08/15/2005 11:30' Incompatible with -duration. Useful with -batch. |
Security options | |
-auth <auth> | Set authorization type. Usually, secure communication includes mutual authentication. In addition to the service authorizing the client for the requested operation(s), an authorization decision is made by the client to determine whether the remote service is the one intended. Depending on the configured authorization type of the GRAM services (which by default is 'host'), the user must select a corresponding client-side authorization type <auth>. <auth> can be:
|
-xmlsec <sec> | Set message protection level. <sec> can be:
|
-personal | Shortcut for -auth self. |
-proxy <proxy file> | Use <proxy file> instead of the default proxy credential file. |
-deleg <deleg> | Set delegation type. <deleg> can be:
|
Miscellaneous options | |
-q, -quiet | Switch quiet mode on, i.e. do not print diagnostic messages when job state changes, in non-batch mode. Disabled by default. |
-n, -no-interrupt | Disable interrupt handling. By default, interrupt signals (typically generated by Ctrl + C) cause the program to terminate the currently submitted job. This flag disables that behavior. |
-timeout <integer> | Set timeout for HTTP socket, in milliseconds. Applies to job submission only. The default value is 120000. |
-submission-id <ID> | Set the submission ID of a previous job submission for which no server response was received. The ID can be used after an attempted job submission in order to recover the handle to the job. |
GT2 globusrun options NOT functional (yet) | |
-l, -list | NOT IMPLEMENTED ON SERVER SIDE YET. List previously started and not destroyed job services for this user. The output of this command consists of the handles and job description of the submitted jobs. Requires the -factory <URL> argument. |
-dryrun | NOT IMPLEMENTED ON SERVER SIDE YET. Augment the job description in order to mark this job as a dry run, if the job description does not already say so. This causes the job manager to stop short of starting the job, but still detect other job description errors (such as bad directory, bad executable, etc). An error message will be displayed if the dry run fails. Otherwise, a message will be displayed indicating that the dryrun was successful. |
-authenticate-only | NOT IMPLEMENTED ON SERVER SIDE YET. |
New Functionality
Substitution variables
In GT 3.9.2, job description substitution variables were removed from GRAM. Starting with GT 3.9.5, substitution variables are available again, while preserving the simplicity of the job description XML schema (relative to the GT 3.2 job description schema). Substitution variables can be used in any path-like string or URL specified in the job description. They are special strings that the GRAM services replace with actual values that the client side does not know a priori. An example of a substitution variable is ${GLOBUS_USER_HOME}, which represents the path to the home directory, on the file system visible to the GRAM services, of the user on behalf of whom the job is executed.
Details are in the job description documentation.
Submission ID
A submission ID may be used in the GRAM protocol to provide reliability in the face of message faults or other transient errors, ensuring that at most one instance of a job is executed, i.e. preventing accidental duplication of jobs under rare circumstances involving client retry on failure. The managed-job-globusrun tool always uses this feature, requiring either a submission ID to be passed in as input or a new unique ID to be created by the tool itself. If a new ID is created, it should be captured by any user who wishes to exploit this reliability interface. The ID in use, whether created or passed as input, is written to the first line of standard output unless quiet mode is in effect.
If you are unsure whether a job was submitted successfully, resubmit using the same ID as was used for the previous attempt.
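As a sketch of this retry pattern, the following shell helper (hypothetical, not part of the toolkit) extracts the submission ID from captured managed-job-globusrun output, so that a submission which received no server response can be retried with -submission-id. It assumes the "Submission ID: uuid:..." output format shown in the examples in this document:

```shell
# Hypothetical helper: extract the submission ID from captured
# managed-job-globusrun output (first matching line only).
extract_submission_id() {
    sed -n 's/^Submission ID: \(uuid:[^ ]*\).*/\1/p' | head -n 1
}

# Demonstration on the output format shown in this document:
sample='Submission ID: uuid:9C715240-26C7-11D9-850A-ABE2020F9ED6
CREATED MANAGED JOB SERVICE WITH HANDLE:
http://127.0.0.1:8080/wsrf/services/ManagedExecutableJobService?9C715240-26C7-11D9-850A-ABE2020F9ED6'
printf '%s\n' "$sample" | extract_submission_id
```

In real usage one would capture the tool's output (e.g. with tee) and, if no response was received, resubmit with: managed-job-globusrun -submission-id "$id" ...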
Job hold and release
It is possible to specify in a job description that the job be put on hold when it reaches a chosen state (see the GRAM Approach documentation for more information about the executable job state machine, and the job description XML schema documentation for information about how to specify a held state). This is useful for instance when a GRAM client wishes to directly access output files written by the job (as opposed to waiting for the stage-out step to transfer files from the job host). The client would request that the file cleanup process be held until released, giving the client an opportunity to fetch all remaining/buffered data after the job completes but before the output files are deleted.
Note that the hold feature of the GRAM service interface is not exploited by the current Java version of the client tool, but will be in the C client in order to implement client-side streaming of remote stdout/err.
The current client tool does, however:
- automatically release a job remotely in interactive mode if the job is being held at any given state
- offer an option (-release) for the user to release a job previously submitted in batch mode.
MultiJobs
The new job description XML schema allows for the specification of a MultiJob, i.e. a job that is itself composed of several executable jobs (those jobs cannot themselves be multijobs, so the structure is not recursive). This is useful for bundling a group of jobs together and submitting them as a whole to a remote GRAM installation.
Note that there is no specification of relationships between the executable jobs, which we will refer to as "subjobs". The subjobs are submitted to job factory services in their order of appearance in the multijob description.
Job and process rendezvous
This version of GRAM offers a mechanism to perform synchronization between job processes in a multiprocess job and between subjobs in a multijob. The job application can register binary information, for instance process information or subjob information, and get notified when all the other processes or subjobs have registered their own information. This is useful, for instance, for parallel jobs that need to rendezvous at a "barrier" before proceeding with computations, when no native application API is available to help perform the rendezvous.
Limitations
With the porting of existing GRAM functionality from OGSI to WSRF, this new version of the job submission tool suffers from a few limitations compared to previous versions of the tool. These limitations will be addressed in the next version of the tool, which will be implemented in C and will thus perform better.
No more file staging using GASS
The GASS server is no longer used by GRAM, so the options -server and -write have been removed. Instead, file staging is done in a reliable fashion via RFT and GridFTP servers; see the file staging documentation for GT 4.0 GRAM.
No standard output redirection yet
Unlike the GT 3.2 managed-job-globusrun used with the option -output, this version of the tool does not offer any streamed redirection of the standard streams, because the GASS server is no longer used by GRAM. Instead, a future version of the tool will allow for streaming of any server-side file (including the standard streams of the job execution) using GridFTP "tailing" of remote files.
Tool behavior for some features
Tool-triggered automatic job resource destruction
Execution errors and user interrupt events are handled by automatically destroying the requested job service(s), unless the -batch option is on the command-line. The -batch option prevents the tool from listening to job state changes and from waiting for the job to finish. If -batch is selected, the command will return as soon as the remote job has been submitted.
The behavior of the tool with respect to job service destruction will vary in response to several kinds of events:
- The command exits normally after the job(s) finish(es), and destroys the job service(s) it requested. In batch mode, the requested job is never destroyed.
- The command is terminated in response to a user interrupt, such as typing Ctrl + C, or a system-wide event, such as user logoff or system shutdown. If the -no-interrupt option is on the command-line, and the command-line has been successfully parsed when the interrupt occurs, the tool does not destroy any job service(s) it requested. Otherwise the tool destroys the requested job service(s).
- In case of any error of execution, the command will exit and destroy the job(s) it successfully requested.
If the Java virtual machine of the tool aborts, that is, stops running without shutting down cleanly, for instance because it received a SIGKILL signal on Unix, then no guarantee can be made about whether or not the job service(s) will be destroyed.
Note: the shutdown behavior explained above cannot be guaranteed if the JVM option -Xrs is entered. The recommended way to disable service destruction is to specify the -batch option on the command-line.
Credential delegation
Single job submission
managed-job-globusrun inserts references to newly delegated credentials in the job description before submitting it. To do so, it obtains endpoint references (EPRs) to resources representing delegated credentials by passing a proxy credential (user-supplied or default) to the Globus delegation services. The resulting EPRs are then inserted in the job description before submission. The EPRs may be added in the following places: as the value of jobCredentialEndpoint and stagingCredentialEndpoint, in order to secure calls to the GRAM and RFT factories; and inside each individual RFT directive, i.e. inside the fileStageIn, fileStageOut and fileCleanUp elements. See the job description documentation for details about these attributes. The Managed Executable Job uses the endpoints in the job description to fetch the credentials from the Delegation services and uses them as needed on behalf of the job.
MultiJob submission
managed-job-globusrun delegates full credentials to the delegation service for the multijob, then processes each single job as stated in the single job submission case.
If several subjobs are to use the same delegation service, then only one credential will be delegated to that delegation service, i.e. the same credential will be used for several jobs.
How to do common job submission tasks
Submitting a job in interactive mode
A very simple command line can be used to submit a job. For instance, the following command line submits a job to the GRAM services hosted on the same machine (assuming, of course, that a Globus container is running):
% bin/managed-job-globusrun "/bin/echo Testing 1...2...3"
The output should look like:
Submission ID: uuid:661AA7F0-2573-11D9-99B2-D4755757F903
WAITING FOR JOB TO FINISH
========== State Notification ==========
Job State: Active
========================================
========== State Notification ==========
Job State: CleanUp
========================================
========== State Notification ==========
Job State: Done
========================================
Exit Code: 0
DESTROYING SERVICE
SERVICE DESTROYED
Note: the job state notifications are printed in the order of arrival, but they may arrive at the client-side in any order.
In this example the job description (not shown above) specifies the standard output stream path of the job as ${GLOBUS_USER_HOME}/stdout. The GRAM services replace the substitution variable ${GLOBUS_USER_HOME} with the path to the home directory of the submitting user as seen by the machine where the invoked GRAM services are hosted. You can thus verify the output of the job with the following command:
% cat ~/stdout
which will display the string:
12 abc 34 pdscaex_instr_GrADS_grads23_28919.cfg pgwynnel was here
Submitting a job in batch mode, checking its status and destroying the resource
To submit a job without having the client wait for job completion, specify the option -batch (or -b) on the command-line:
% bin/managed-job-globusrun -batch "/bin/echo Testing 1...2...3"
Warning: Will not wait for job completion, and will not destroy job service.
Submission ID: uuid:9C715240-26C7-11D9-850A-ABE2020F9ED6
CREATED MANAGED JOB SERVICE WITH HANDLE:
http://127.0.0.1:8080/wsrf/services/ManagedExecutableJobService?9C715240-26C7-11D9-850A-ABE2020F9ED6
To check the status of the job, use the -state option:
% bin/managed-job-globusrun -state 'http://127.0.0.1:8080/wsrf/services/ManagedExecutableJobService?9C715240-26C7-11D9-850A-ABE2020F9ED6'
Job State: Done
To destroy the job resource created on the server side, use the -kill option:
% bin/managed-job-globusrun -kill 'http://127.0.0.1:8080/wsrf/services/ManagedExecutableJobService?9C715240-26C7-11D9-850A-ABE2020F9ED6'
DESTROYING SERVICE
SERVICE DESTROYED
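For batch jobs, state checks like the one above can be wrapped in a simple polling loop. The helper below, poll_job_state, is a hypothetical sketch (not part of the toolkit): it repeatedly runs a command that prints "Job State: <state>" (as -state does) until the job reaches a terminal state:

```shell
# Hypothetical polling helper for batch jobs: run the given state command
# until the reported state is Done or Failed, then print that state.
poll_job_state() {
    while :; do
        state=$("$@" | sed -n 's/^Job State: //p')
        case "$state" in
            Done|Failed) printf '%s\n' "$state"; return 0 ;;
        esac
        sleep 10
    done
}

# Real usage would look like (handle as printed in batch mode):
#   poll_job_state bin/managed-job-globusrun -state "$handle"
```

Note that Done here is the job's terminal state, not a guarantee of a zero exit code; inspect the job's output as usual.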
Finding which schedulers are interfaced by the WS GRAM installation
Unfortunately there is no option yet to print the list of local resource managers supported by a given WS GRAM service installation, but there is a way to check whether WS GRAM supports a given local resource manager. The following command gives an example of how a client could check whether Condor is available at the remote site:
wsrf-query \
    -s https://<hostname>:<port>/wsrf/services/ManagedJobFactoryService \
    -key {http://www.globus.org/namespaces/2004/10/gram/job}ResourceID Condor \
    "//*[local-name()='version']"
Replace host and port settings with the values you need. If Condor is available on the server-side, the output should look something like the following:
<ns1:version xmlns:ns1="http://mds.globus.org/metadata/2005/02">4.0.3</ns1:version>
In this example the output indicates that a Globus Toolkit container is listening on the server side, that Condor is available, and that the GT version is 4.0.3. If no GT container is running at the specified host and port, or if the specified local resource manager is not available on the server side, the output will be an error message.
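Several resource managers can be probed this way in a loop. The helper below, mjfs_query_args, is a hypothetical sketch that assembles the wsrf-query arguments for a given host:port and resource manager name, using the service path and namespace from the example above:

```shell
# Hypothetical helper: print the wsrf-query arguments (one per line) for
# checking whether a given local resource manager is available.
mjfs_query_args() {
    # $1 = host:port, $2 = local resource manager name (e.g. Fork, Condor, LSF)
    printf '%s\n' \
        -s "https://$1/wsrf/services/ManagedJobFactoryService" \
        -key "{http://www.globus.org/namespaces/2004/10/gram/job}ResourceID" "$2" \
        "//*[local-name()='version']"
}

# Sketch of real usage (requires a valid proxy and network access):
#   for rm in Fork Condor LSF PBS; do
#       wsrf-query $(mjfs_query_args myhost.example.org:8443 "$rm") \
#           && echo "$rm is available"
#   done
```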
On the server side, the GRAM names of the local resource managers for which GRAM support has been installed can be obtained by looking at the GRAM configuration, as explained in the GRAM configuration documentation.
The GRAM name of the local resource manager can be used with the -type
option to specify which factory resource to use when submitting a job.
For instance:
% bin/managed-job-globusrun -type Fork /bin/true
will submit a /bin/true job to the Fork local resource manager (i.e. the command line /bin/true will simply be executed as a newly spawned process).
% bin/managed-job-globusrun -type LSF /bin/true
will submit a /bin/true job to the LSF scheduler (if installed).
% bin/managed-job-globusrun -type Multi -file simple_multi_job.xml
where simple_multi_job.xml contains the description of a multijob, will submit the multijob to the Multi ManagedJobFactory resource.
Specifying file staging in the job description
In order to do file staging, one must add specific elements to the job description. The file transfer directives follow the RFT syntax, which enables third-party transfers. Each file transfer must therefore specify a source URL and a destination URL. URLs are specified as GridFTP URLs (for remote files) or as file URLs (for local files).
For instance, in the case of staging a file in, the source URL would be a GridFTP URL (for instance gsiftp://job.submitting.host:2888/tmp/mySourceFile) resolving to a source document accessible on the file system of the job submission machine (for instance /tmp/mySourceFile). At run time, the Reliable File Transfer service used by the GRAM service on the remote machine would fetch the remote file using the GridFTP protocol and write it reliably to the specified local file (for instance file:///${GLOBUS_USER_HOME}/my_transfered_file, which resolves to ~/my_transfered_file). Here is what the stage-in directive would look like:
<fileStageIn>
    <transfer>
        <sourceUrl>gsiftp://job.submitting.host:2888/tmp/mySourceFile</sourceUrl>
        <destinationUrl>file:///${GLOBUS_USER_HOME}/my_transfered_file</destinationUrl>
    </transfer>
</fileStageIn>
Note: additional RFT-defined quality of service requirements can be specified for each transfer. See the RFT documentation for more information.
Here is an example job description with file stage-in and stage-out:
<job>
    <executable>my_echo</executable>
    <directory>${GLOBUS_USER_HOME}</directory>
    <argument>Hello</argument>
    <argument>World!</argument>
    <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
    <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
    <fileStageIn>
        <transfer>
            <sourceUrl>gsiftp://job.submitting.host:2888/bin/echo</sourceUrl>
            <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
        </transfer>
    </fileStageIn>
    <fileStageOut>
        <transfer>
            <sourceUrl>file://${GLOBUS_USER_HOME}/stdout</sourceUrl>
            <destinationUrl>gsiftp://job.submitting.host:2888/tmp/stdout</destinationUrl>
        </transfer>
    </fileStageOut>
    <fileCleanUp>
        <deletion>
            <file>file://${GLOBUS_USER_HOME}/my_echo</file>
        </deletion>
    </fileCleanUp>
</job>
The submission of this job to the GRAM services causes the following sequence of actions:
- The /bin/echo executable is transferred from the submission machine to the GRAM host file system. The destination location is the HOME directory of the user on behalf of whom the job is executed by the GRAM services (see <fileStageIn>).
- The transferred executable is used to print a test string (see the <executable>, <directory> and <argument> elements) on the standard output, which is redirected to a local file (see <stdout>).
- The standard output file is transferred to the submission machine (see <fileStageOut>).
- The file that was initially transferred during the stage-in phase is removed from the file system of the GRAM installation (see <fileCleanUp>).
Specifying and submitting a MultiJob
Within the multijob description, each subjob description must be accompanied by an endpoint for the factory to submit the subjob to. This enables the submission of several jobs at once to different hosts. The factory to which the multijob is submitted acts as an intermediary tier between the client and the eventual executable job factories. See the job description schema documentation for more information about multijob specification.
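As a rough sketch, a multijob description with per-subjob factory endpoints might look like the following. This is illustrative only: the hostnames, ports, resource manager names and the exact WS-Addressing namespace are assumptions, so consult the job description schema documentation for the authoritative element names and namespaces:

```
<multiJob xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
          xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <job>
        <!-- endpoint of the executable job factory for this subjob (illustrative) -->
        <factoryEndpoint>
            <wsa:Address>https://host1.example.org:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
            <wsa:ReferenceProperties>
                <gram:ResourceID>Fork</gram:ResourceID>
            </wsa:ReferenceProperties>
        </factoryEndpoint>
        <executable>/bin/date</executable>
    </job>
    <job>
        <factoryEndpoint>
            <wsa:Address>https://host2.example.org:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
            <wsa:ReferenceProperties>
                <gram:ResourceID>PBS</gram:ResourceID>
            </wsa:ReferenceProperties>
        </factoryEndpoint>
        <executable>/bin/hostname</executable>
    </job>
</multiJob>
```

The subjobs are submitted to the named factories in their order of appearance, as described above.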
A multijob must be submitted to a Multi job factory resource:
% bin/managed-job-globusrun -type Multi -file myMultiJob.xml
A multijob resource is created by the factory and exposes a set of WSRF resource properties different from the resource properties of an executable job. The state machine of a multijob is also different, since the multijob represents the overall execution of all the executable jobs it is composed of.
Troubleshooting
Common issues
Expired credentials
Symptom: the client output shows an error related to expired credentials, as in:
Error: error submitting job request: ; nested exception is: javax.xml.rpc.soap.SOAPFaultException: Expired credentials (O=Grid,OU=GlobusTest,OU=simpleCA.foo.bar.com,OU=bar.com,CN=John Doe,CN=1255793213).
Solution: use the $GLOBUS_LOCATION/bin/grid-proxy-init tool to create a new proxy file:
% bin/grid-proxy-init
Your identity: /O=Grid/OU=GlobusTest/OU=simpleCA-foo.bar.com/OU=bar.com/CN=John Doe
Enter GRID pass phrase for this identity:
Creating proxy ................................. Done
Your proxy is valid until: Tue Oct 26 01:33:42 2004
Socket timeout error
Symptom: the client output shows a timeout error when waiting for the response from the GRAM service(s):
Error: error submitting job request: ; nested exception is: java.net.SocketTimeoutException: Read timed out
Solution: re-submit the job with a larger HTTP socket timeout than the default, using the -timeout option of managed-job-globusrun, as in:
% bin/managed-job-globusrun -timeout 240000 -f myJob.xml
Connection refused to postmaster
Symptom: the server log and client output show exception stack traces with the following message:
Unable to create RFT Resource; nested exception is: org.apache.commons.dbcp.DbcpException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
This error indicates a lack of configuration for RFT. Solution: see the RFT configuration documentation.
Lack of authorization for the user's Distinguished Name
Symptom: the server log and client output show exception stack traces with the following message:
Error: error submitting job request: org.globus.wsrf.impl.security.authorization.exceptions.AuthorizationException: (pdp08) "/O=Grid/OU=GlobusTest/OU=simpleCA-foo.bar.com/OU=foo.bar.com/CN=John Doe" is not authorized to use operation: {http://properties.impl.wsrf.globus.org}getMultipleResourceProperties on this service
This error indicates a lack of authorization for the Distinguished Name (DN) reported in the error message. This means that, according to the gridmap configuration for the toolkit, this user has not been authorized to call the operation reported in the error message.
Solution: add an entry for the user's DN to the gridmap file. See the GRAM configuration documentation.
File(s) Not Found warnings
Symptom: the server log displays messages at WARN severity such as:
[Thread-3] WARN factory.ManagedJobFactoryResource [getRestartTimestamp:187] java.io.FileNotFoundException: /software/globus/gt4/rc4.0.0/var/globus-jsm-fork.stamp (No such file or directory)
[Thread-3] WARN factory.ManagedJobFactoryResource [getRestartTimestamp:187] java.io.FileNotFoundException: /software/globus/gt4/rc4.0.0/var/globus-jsm-multi.stamp (No such file or directory)
[Thread-2] WARN utils.XmlPersistenceHelper [load:185] [CORE] File /nfs/v5/alain/.globus/persisted/128.9.72.67/ManagedExecutableJobResourceStateType/897BC6E0-26CA-11D9-8D59-FF280F77E689.xml for resource {http://www.globus.org/namespaces/2004/10/gram/job}ResourceID=897BC6E0-26CA-11D9-8D59-FF280F77E689 was not found
Solution: the log messages above are harmless and are not indicative of any problem in the behavior of the GRAM service. They can be ignored.
Known problems
Client Hanging Forever
Symptom: in interactive (i.e. non-batch) mode, the managed-job-globusrun client seems to be stuck waiting for additional job state notifications.
Solution: this is a known problem that can occur intermittently. A possible workaround is to remove the timestamp files in $GLOBUS_LOCATION/var:
% rm var/globus-jsm-*.stamp
Restart the container.
If you decide to report the issue, please provide the job description and submission command-line as well as a full server-side GRAM log so we can determine the cause of the problem:
- Edit $GLOBUS_LOCATION/log4j.properties to set the exec logging categories to DEBUG.
- Restart the container and execute the same job submission command line.
- Submit the full GRAM server log to the support list.
NotRegisteredException ERROR log message
Symptom: the following message appears in the server log:
[Thread-7] ERROR jobmanager.JobManager [unsubscribeForNotifications:1762] unable to stop monitoring job for state changes
org.globus.exec.monitoring.NotRegisteredException
        at org.globus.exec.monitoring.JobStateMonitor.unregisterJobID(JobStateMonitor.java:375)
        at org.globus.exec.service.job.jobmanager.JobManager.unsubscribeForNotifications(JobManager.java:1758)
        at org.globus.exec.service.job.jobmanager.JobManager.processState(JobManager.java:1274)
        at org.globus.exec.service.job.jobmanager.RunQueue.run(RunQueue.java:75)
Solution: this is typically harmless and can be ignored.