Internal Components
Scheduler Event Generator
The Scheduler Event Generator (SEG) is a program that uses scheduler-specific monitoring modules to generate job state change events. At the SEG level, these events correspond to state changes in any job managed by the scheduler, including jobs that were not initiated by the Managed Job Service. The SEG propagates these events to the Job State Monitor.
Depending on scheduler-specific requirements, the SEG may need to run with privileges in order to obtain scheduler event notifications. For this reason, one SEG runs per scheduler resource. For example, a host that provides access to both PBS and fork jobs runs two SEGs, potentially at different privilege levels.
When executed, the SEG can begin issuing events from some time in the past. In general, the SEG does not require any persistent state between invocations. One SEG instance exists for each scheduled resource instance (one for all homogeneous PBS queues, one for all fork jobs, and so on).
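The SEG is implemented in an executable called globus-scheduler-event-generator, located in the Globus Toolkit's libexec directory. It is invoked with the following command line:
globus-scheduler-event-generator -s SCHEDULER [-t TIMESTAMP]
The -s option selects the scheduler-specific monitoring module (see the SEG API documentation); the optional -t option sets the time in the past from which the SEG begins issuing events.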
Fork Starter
The fork job starter provides a way to manage fork jobs so that job state changes are delivered to the WS-GRAM service in a timely way. The fork job starter is the parent of any number of processes started in response to Managed Job Service (MJS) job requests. As their parent, it receives the SIGCHLD signal when a child process terminates, and it uses these signals to propagate job state changes to the WS-GRAM service.
The fork job starter communicates with the process that invokes it through its standard input and standard output. It reports job state changes by writing them to a log file, which the fork Scheduler Event Generator module parses.
Currently each job will cause a new starter to be created. The starter will remain until the job terminates.
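The parent/child relationship described above can be sketched in Python as the non-blocking reaping loop that a SIGCHLD handler would run. This is a minimal illustration, not the fork starter's implementation: the state values 4 (FAILED) and 8 (DONE) are assumed to match the GRAM protocol constants, and reporting termination by a signal as FAILED with the signal number is an assumed convention.

```python
import os
import signal

# Assumed GRAM protocol job-state constants; check them against the
# GRAM protocol headers before relying on them.
STATE_FAILED = 4
STATE_DONE = 8


def reap_children():
    """Reap every already-terminated child without blocking.

    Returns a list of (pid, state, exit_code) tuples. A normal exit is
    reported as DONE with its exit status; termination by a signal is
    reported as FAILED with the signal number (an assumed convention).
    """
    finished = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no child processes remain
        if pid == 0:
            break  # children exist, but none has terminated yet
        if os.WIFEXITED(status):
            finished.append((pid, STATE_DONE, os.WEXITSTATUS(status)))
        elif os.WIFSIGNALED(status):
            finished.append((pid, STATE_FAILED, os.WTERMSIG(status)))
    return finished


def install_sigchld_handler(on_exit):
    """Call on_exit(pid, state, exit_code) for each child reaped on SIGCHLD."""
    def handler(signum, frame):
        for record in reap_children():
            on_exit(*record)
    signal.signal(signal.SIGCHLD, handler)
```

Draining waitpid in a loop matters because several children may exit before one SIGCHLD is handled, and standard POSIX signals of the same type are not queued.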
Execution
The fork job starter takes one argument on its command-line, the path to the log file to record job state changes.
As mentioned above, it communicates with its parent process through its standard input and standard output. All communications are done using the Fork Starter protocol described below.
When either the standard input or the standard output of the fork starter is closed, the fork starter stops creating new job processes. It continues to run until all processes it started have terminated.
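The shutdown rule above can be sketched as a Python loop. The callbacks start_job and wait_for_children are hypothetical stand-ins for forking a job and for waiting on the starter's remaining children; only the control flow (stop accepting requests once either stream is closed, then wait for the jobs already started) reflects the text.

```python
def serve(requests, replies, start_job, wait_for_children):
    """Handle request lines until standard input or standard output closes.

    requests and replies are text streams (normally sys.stdin and
    sys.stdout). start_job(line) and wait_for_children() are hypothetical
    callbacks: the first starts the job described by one request line and
    returns the reply line, the second blocks until every process started
    so far has terminated.
    """
    for line in requests:  # the loop ends at EOF on standard input
        reply = start_job(line)
        try:
            replies.write(reply)
            replies.flush()
        except (BrokenPipeError, ValueError):
            # BrokenPipeError: the reader went away; ValueError: the stream
            # object was closed. Either way, start no further jobs.
            break
    # No new jobs are created past this point, but the starter keeps
    # running until the jobs it already started have finished.
    wait_for_children()
```

In a real starter this would be called as serve(sys.stdin, sys.stdout, start_job, wait_for_children).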
Log Format
For simplicity, the fork job starter's log format is based on the SEG protocol messages related to job state changes. Log messages are of the format:
001;TIMESTAMP;JOBID;STATE;EXIT_CODE
Message Type Specific Content:
- JOBID
- local scheduler-specific job id
- STATE
- new job state (integer as per the GRAM protocol constants)
- EXIT_CODE
- job exit code if STATE is done or failed.
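As a sketch of the record layout, the Python functions below write and read one log line. Beyond what is specified here, they assume that TIMESTAMP is an integer count of seconds, that job ids contain no semicolons, and that EXIT_CODE is written as 0 for states other than done and failed. The function names are illustrative.

```python
def format_log_record(timestamp, job_id, state, exit_code=0):
    """Build one log line: 001;TIMESTAMP;JOBID;STATE;EXIT_CODE."""
    return "001;%d;%s;%d;%d\n" % (timestamp, job_id, state, exit_code)


def parse_log_record(line):
    """Split a log line back into (timestamp, job_id, state, exit_code)."""
    code, timestamp, job_id, state, exit_code = line.rstrip("\n").split(";")
    if code != "001":
        raise ValueError("not a job state change record: %r" % line)
    return int(timestamp), job_id, int(state), int(exit_code)
```

For example, format_log_record(1100000000, "4321", 8, 0) yields the line 001;1100000000;4321;8;0, where 8 is assumed to be the GRAM DONE state.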
Fork Starter Protocol
Each protocol message occupies a single line terminated by a linefeed character. Messages are encoded in ASCII. Each line is divided into fields delimited by the semicolon character, and each field may contain subfields separated by the comma character. The backslash character forms two-character escape sequences that protect characters which would otherwise be significant in this protocol:
- \\
- literal backslash
- \;
- literal semicolon
- \,
- literal comma
- \n
- literal linefeed
- \=
- literal equals
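The escaping rules can be expressed as three small Python helpers. split_unescaped deliberately leaves escape sequences in place so that a field can be split into subfields before it is decoded; decoding first would turn an escaped comma into a real separator. The function names are illustrative and not part of any Globus API.

```python
# Encoding and decoding tables for the two-character escape sequences.
_ESCAPES = {"\\": "\\\\", ";": "\\;", ",": "\\,", "\n": "\\n", "=": "\\="}
_UNESCAPES = {"\\": "\\", ";": ";", ",": ",", "n": "\n", "=": "="}


def escape(text):
    """Protect characters that are significant in the protocol."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def split_unescaped(text, delimiter):
    """Split on delimiters that are not part of a backslash escape.

    Escape sequences are kept intact so the pieces can be split again
    (fields into subfields) before being decoded with unescape().
    """
    pieces, start, i = [], 0, 0
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if text[i] == delimiter:
            pieces.append(text[start:i])
            start = i + 1
        i += 1
    pieces.append(text[start:])
    return pieces


def unescape(text):
    """Decode escape sequences, rejecting any the protocol does not define."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\":
            if text[i + 1:i + 2] not in _UNESCAPES:
                raise ValueError("bad escape sequence at offset %d" % i)
            out.append(_UNESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```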
Each protocol message begins with a three-digit code that indicates its message type. This code is always the first field in the message.
- 100 Start Job Request [input to the fork starter]
The 100 message format is:
100;Tag;Attribute-list
Tag is an arbitrary string that the fork starter includes in its response to this message. It allows responses to be matched to requests in case an implementation of the fork starter processes multiple job starts in parallel.
Attribute-list is a set of job attribute fields. These fields are a subset of the RSL used to create jobs via the GRAM Managed Job Service. Each attribute in the list consists of a string (the attribute name), the equals character, and one or more attribute values separated by commas.
The fork starter understands the following attributes:
- directory
- Path string of the directory to execute the job in.
- environment
- Set of subfields containing the job environment. Each subfield consists of a VAR=VALUE pair.
- count
- Integer count of job instances to start.
- executable
- Path string of the executable to start. The executable's argv[0] will be set to this value.
- arguments
- Set of subfields containing the executable's arguments (argv[1]...argv[n]).
- stdin
- Path string to the standard input file.
- stdout
- Set of path strings to the standard output files for each process.
- stderr
- Set of path strings to the standard error files for each process.
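A minimal Python sketch of decoding a 100 message into a tag and a dictionary of attribute values follows. It keeps every value as a list of strings (converting count to an integer is left to the caller), and it treats only the first unescaped equals sign as the end of the attribute name, so environment entries of the form VAR=VALUE survive whether or not the sender escapes their inner equals sign. The helper names are hypothetical.

```python
# Decoding table for the two-character escape sequences.
_UNESCAPES = {"\\": "\\", ";": ";", ",": ",", "n": "\n", "=": "="}


def _split(text, delimiter):
    """Split on delimiters that are not part of a backslash escape."""
    pieces, start, i = [], 0, 0
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if text[i] == delimiter:
            pieces.append(text[start:i])
            start = i + 1
        i += 1
    pieces.append(text[start:])
    return pieces


def _unescape(text):
    """Decode escape sequences, rejecting any the protocol does not define."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\":
            if text[i + 1:i + 2] not in _UNESCAPES:
                raise ValueError("bad escape sequence at offset %d" % i)
            out.append(_UNESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_start_request(line):
    """Parse '100;Tag;name=value[,value];...' into (tag, {name: [values]})."""
    fields = _split(line.rstrip("\n"), ";")
    if fields[0] != "100" or len(fields) < 2:
        raise ValueError("not a Start Job Request: %r" % line)
    tag = _unescape(fields[1])
    attributes = {}
    for field in fields[2:]:
        parts = _split(field, "=")
        if len(parts) < 2:
            raise ValueError("attribute without '=': %r" % field)
        # Only the first unescaped '=' ends the name; any later ones belong
        # to the value (an environment entry is itself VAR=VALUE).
        value_text = "=".join(parts[1:])
        attributes[_unescape(parts[0])] = [
            _unescape(v) for v in _split(value_text, ",")]
    return tag, attributes


def environment_of(attributes):
    """Turn the environment subfields into a {VAR: VALUE} dictionary."""
    return dict(entry.split("=", 1)
                for entry in attributes.get("environment", []))
```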
RSL attributes such as jobtype or library_path must be handled by the code that invokes the fork starter, by updating the related job attributes. For example, for an MPI job the caller would replace the executable with mpirun and add "-np count executable-path" to the front of the argument list.
All files (executable, stdin, etc.) must be resolved to local paths before being passed to the fork starter; any staging or cache lookups must be completed before the fork starter is invoked.
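The jobtype example above might look like the following in Python, operating on attributes stored as a dictionary mapping each name to a list of string values. This is a sketch under assumptions: the mpirun path is a hypothetical default (it must be a local path, per the rule above), count is reset to 1 on the assumption that mpirun itself launches the requested instances, and library_path is not handled.

```python
def rewrite_for_mpi(attributes, mpirun="/usr/bin/mpirun"):
    """Return a copy of the attributes the fork starter can understand.

    jobtype is always removed, since the fork starter does not accept it.
    For jobtype=mpi the job is wrapped as: mpirun -np COUNT EXECUTABLE ARGS.
    The default mpirun path is a hypothetical example.
    """
    attrs = {name: list(values) for name, values in attributes.items()}
    if attrs.pop("jobtype", ["single"])[0] != "mpi":
        return attrs
    count = attrs.get("count", ["1"])[0]
    executable = attrs["executable"][0]
    attrs["arguments"] = ["-np", count, executable] + attrs.get("arguments", [])
    attrs["executable"] = [mpirun]
    attrs["count"] = ["1"]  # assumption: mpirun starts the COUNT instances
    return attrs
```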
The 101 message, which the fork starter sends when the job processes have been started, has the format:
101;tag;pid[,pid]
The tag field contains the same tag string as was passed to the Fork Job Starter in a Job Start (100) message.
The pid field contains a list of subfields which correspond to the processes forked by the starter.
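On the starter side, a 101 reply can be produced by escaping the tag and joining the pids with commas, as in this Python sketch. format_start_reply is an illustrative name, and rejecting an empty pid list is an assumption drawn from the pid[,pid] syntax, which shows at least one pid.

```python
# Encoding table for the protocol's two-character escape sequences.
_ESCAPES = {"\\": "\\\\", ";": "\\;", ",": "\\,", "\n": "\\n", "=": "\\="}


def _escape(text):
    """Protect characters that are significant in the protocol."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def format_start_reply(tag, pids):
    """Build '101;tag;pid[,pid]' for the processes forked for one request."""
    if not pids:
        raise ValueError("a 101 reply lists at least one pid")
    return "101;%s;%s\n" % (_escape(tag), ",".join(str(pid) for pid in pids))
```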
The 102 message, which the fork starter sends when the job could not be started, has the format:
102;tag;error-code;error-message
The tag field contains the same tag string as was passed to the Fork Job Starter in a Job Start (100) message.
The error-code contains the GT2 GRAM integer error code indicating why the job start failed.
The error-message contains an error string which may be useful in diagnosing the problem with starting the job.
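A process that invokes the fork starter has to tell the two replies apart and decode their escaped fields. The Python sketch below does that; the function names are hypothetical, and the helpers repeat the escape handling so the block stands alone.

```python
# Decoding table for the two-character escape sequences.
_UNESCAPES = {"\\": "\\", ";": ";", ",": ",", "n": "\n", "=": "="}


def _split(text, delimiter):
    """Split on delimiters that are not part of a backslash escape."""
    pieces, start, i = [], 0, 0
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if text[i] == delimiter:
            pieces.append(text[start:i])
            start = i + 1
        i += 1
    pieces.append(text[start:])
    return pieces


def _unescape(text):
    """Decode escape sequences, rejecting any the protocol does not define."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\":
            if text[i + 1:i + 2] not in _UNESCAPES:
                raise ValueError("bad escape sequence at offset %d" % i)
            out.append(_UNESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_reply(line):
    """Decode a fork starter reply line.

    Returns (tag, pids, None) for a 101 message and
    (tag, None, (error_code, error_message)) for a 102 message.
    """
    fields = _split(line.rstrip("\n"), ";")
    if fields[0] == "101" and len(fields) == 3:
        pids = [int(pid) for pid in _split(fields[2], ",")]
        return _unescape(fields[1]), pids, None
    if fields[0] == "102" and len(fields) == 4:
        error = (int(fields[2]), _unescape(fields[3]))
        return _unescape(fields[1]), None, error
    raise ValueError("unexpected reply: %r" % line)
```

A caller that issues several 100 requests at once could keep its outstanding requests in a dictionary keyed by tag and remove the entry named by each reply.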
Development Plan
Initially, for GT 4.0, the Globus::Gram::JobManager::fork module will be modified to use the fork starter in place of native Perl fork and exec calls. This implies that one fork starter will be present on the system for each job started, until that job terminates. When a Java interface for starting jobs is written, the job starter will instead be started once per user (via sudo) per hosting environment instance.