Internal Components

Scheduler Event Generator

The Scheduler Event Generator (SEG) is a program which uses scheduler-specific monitoring modules to generate job state change events. At the SEG level, the state change events correspond to changes in any jobs which are managed by the scheduler, even if they do not correspond to jobs initiated by the Managed Job Service. These state change events are propagated to the Job State Monitor.

Depending on scheduler-specific requirements, the SEG may need to run with elevated privileges in order to obtain scheduler event notifications. For this reason, one SEG runs per scheduler resource. For example, a host which provides access to both PBS and fork jobs runs two SEGs, potentially at different privilege levels.

When executed, the SEG is able to start issuing events from some time in the past. The SEG will, in general, not require any persistent state between invocations. One SEG instance exists for any particular scheduled resource instance (one for all homogeneous PBS queues, one for all fork jobs, etc).

The SEG is implemented in an executable called the globus-scheduler-event-generator, located in the Globus Toolkit's libexec directory. It is invoked with the following command line:

    globus-scheduler-event-generator -s  [-t ]

Fork Starter

The fork job starter's purpose is to provide a way to manage fork jobs such that job state changes will be delivered in a timely way to the WS-GRAM service. The fork job starter will be the parent of any number of processes which were started in response to MJS job requests. As the parent, it will receive the SIGCHLD signal when its child processes terminate, and use these to propagate job state changes to the WS-GRAM service.

Communication between the fork job starter and the process which invokes it is done through the starter's standard input and standard output. The starter reports job state changes through a log file which the fork Scheduler Event Generator module parses.

Currently each job will cause a new starter to be created. The starter will remain until the job terminates.

Execution

The fork job starter takes one argument on its command line: the path to the log file in which it records job state changes.

As mentioned above, it communicates with its parent process through its standard input and standard output. All communications are done using the Fork Starter protocol described below.

When either the standard input or the standard output of the fork starter is closed, the fork starter will no longer create any new job processes. It will continue to run until all processes it started have terminated.

Log Format

For simplicity, the fork job starter's log format is based on the SEG protocol messages related to job state changes. Log messages are of the format:

        001;TIMESTAMP;JOBID;STATE;EXIT_CODE

Message Type Specific Content:

JOBID
local scheduler-specific job id
STATE
new job state (integer as per the GRAM protocol constants)
EXIT_CODE
job exit code if STATE is done or failed.
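
As an illustration of the log format above, the following minimal Python sketch splits one log line into its fields. The names ForkLogRecord and parse_log_line are hypothetical and not part of the toolkit. The sketch assumes that fields contain no escaped semicolons, keeps TIMESTAMP as text because its format is not specified here, and treats an empty EXIT_CODE field as "no exit code", which is also an assumption.

```python
from typing import NamedTuple, Optional


class ForkLogRecord(NamedTuple):
    """One job state change record (hypothetical container type)."""
    timestamp: str            # kept as text; its format is not specified here
    job_id: str               # local scheduler-specific job id
    state: int                # GRAM protocol job state constant
    exit_code: Optional[int]  # only meaningful when the state is done or failed


def parse_log_line(line: str) -> ForkLogRecord:
    """Parse a line of the form 001;TIMESTAMP;JOBID;STATE;EXIT_CODE."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 5 or fields[0] != "001":
        raise ValueError(f"not a 001 job state change record: {line!r}")
    _, timestamp, job_id, state, exit_code = fields
    # Assumption: an empty EXIT_CODE field means no exit code was recorded.
    return ForkLogRecord(timestamp, job_id, int(state),
                         int(exit_code) if exit_code else None)
```

For example, parse_log_line("001;1100000000;12345;8;0") yields a record with job_id "12345", state 8, and exit_code 0. These sample values are made up for illustration and are not taken from a real log.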

Fork Starter Protocol

Each protocol message is contained on a single line terminated with the linefeed character. Messages are encoded in ASCII. Lines are separated into a number of fields delimited by the semicolon character. Each field may contain subfields separated by the comma character. The backslash character is used to form two-character escape sequences to protect characters which would otherwise be significant in this protocol:

\\
literal backslash
\;
literal semicolon
\,
literal comma
\n
literal linefeed
\=
literal equals
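
The escaping rules above can be sketched in Python as follows. The function names escape, unescape, and split_unescaped are hypothetical helpers for illustration, not part of the fork starter. The sketch assumes that any backslash sequence not listed above is an error.

```python
from typing import List

# Character -> two-character escape sequence, as listed above.
_ESCAPES = {"\\": "\\\\", ";": "\\;", ",": "\\,", "\n": "\\n", "=": "\\="}
# Second character of an escape sequence -> the literal character it stands for.
_UNESCAPES = {"\\": "\\", ";": ";", ",": ",", "n": "\n", "=": "="}


def escape(text: str) -> str:
    """Protect characters that are significant in the protocol."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def unescape(text: str) -> str:
    """Reverse escape(); unknown or truncated sequences raise ValueError."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)
            if nxt not in _UNESCAPES:
                raise ValueError(f"invalid escape sequence in {text!r}")
            ch = _UNESCAPES[nxt]
        out.append(ch)
    return "".join(out)


def split_unescaped(text: str, delimiter: str) -> List[str]:
    """Split on a delimiter, ignoring delimiters preceded by a backslash.

    The parts are returned still escaped, so they can be split again on
    the comma (subfields) before unescape() is applied.
    """
    parts, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            current.append(ch)
            current.append(next(chars, ""))  # keep the escaped character
        elif ch == delimiter:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts
```

Splitting before unescaping matters: a field must be divided on semicolons and commas while the escapes are still in place, otherwise a literal semicolon in a value would be mistaken for a field delimiter.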

Each protocol message begins with a 3 digit code which indicates the type of message it is. This is always the first field in a message.

100 Start Job Request [input to the fork starter]

The 100 message format is:

    100;Tag;Attribute-list

Tag is an arbitrary string which will be included in the fork starter's response to this message. It allows responses to be matched to requests in case an implementation of the fork starter allows multiple job starts to be in progress in parallel.

Attribute-list is a set of job attribute fields. These fields are a subset of the RSL used to create jobs via the GRAM Managed Job Service. Each attribute in the list consists of a string (the attribute name), the equals character, and one or more attribute values separated by commas.

The fork starter understands the following attributes:

directory
Path string of the directory to execute the job in.
environment
Set of subfields containing the job environment. Each subfield consists of a VAR=VALUE pair.
count
Integer count of job instances to start.
executable
Path string of the executable to start. The executable's argv[0] will be set to this value.
arguments
Set of subfields containing the executable's arguments (argv[1]...argv[n]).
stdin
Path string to the standard input file.
stdout
Set of path strings to the standard output files for each process.
stderr
Set of path strings to the standard error files for each process.

RSL attributes such as jobtype or library_path must be handled by the code which invokes the fork starter, by updating the related job attributes. For example, an MPI job can be handled by replacing the executable with mpirun and adding "-np count executable-path" to the argument list.

All files (executable, stdin, etc) must be resolved to local paths before being passed to the fork starter (staging or cache lookups must be completed before invoking the fork starter).
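
A minimal Python sketch of how an invoking process might assemble a 100 message is shown below. The function build_start_request is hypothetical. The sketch assumes that each attribute occupies its own semicolon-delimited field in the form name=value[,value], which is one reading of the description above, and it omits the environment attribute because the text does not say whether the equals sign inside a VAR=VALUE subfield is escaped.

```python
from typing import Dict, List

# Character -> two-character escape sequence used by the protocol.
_ESCAPES = {"\\": "\\\\", ";": "\\;", ",": "\\,", "\n": "\\n", "=": "\\="}


def escape(text: str) -> str:
    """Protect characters that are significant in the protocol."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def build_start_request(tag: str, attributes: Dict[str, List[str]]) -> str:
    """Build one linefeed-terminated 100 message.

    Assumption: every attribute is a separate field, written as the
    attribute name, an equals character, and comma-separated values.
    """
    fields = ["100", escape(tag)]
    for name, values in attributes.items():
        fields.append(escape(name) + "=" + ",".join(escape(v) for v in values))
    return ";".join(fields) + "\n"


request = build_start_request(
    "job-1",  # arbitrary tag chosen by the caller
    {
        "directory": ["/tmp"],
        "executable": ["/bin/echo"],
        "arguments": ["hello", "a;b"],  # the semicolon is escaped on output
        "count": ["1"],
    },
)
```

Here request holds the single line 100;job-1;directory=/tmp;executable=/bin/echo;arguments=hello,a\;b;count=1 followed by a linefeed. All paths are already local, as required above.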

101 Successful Job Start [output from the fork starter]

The 101 message format is:

    101;tag;pid[,pid]

The tag field contains the same tag string as was passed to the Fork Job Starter in a Job Start (100) message.

The pid field contains a list of subfields which correspond to the processes forked by the starter.

102 Unsuccessful start [output from the fork starter]

The 102 message format is:

    102;tag;error-code;error-message

The tag field contains the same tag string as was passed to the Fork Job Starter in a Job Start (100) message.

The error-code contains the GT2 GRAM integer error code indicating why the job start failed.

The error-message contains an error string which may be useful in diagnosing the problem with starting the job.
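
The two response messages can be distinguished by their leading code. The following Python sketch parses a 101 or 102 line read from the fork starter's standard output. The type and function names are hypothetical, and the sample error code used in the usage note is invented for illustration rather than taken from the GT2 GRAM error table.

```python
from typing import List, NamedTuple, Union

# Second character of an escape sequence -> the literal character.
_UNESCAPES = {"\\": "\\", ";": ";", ",": ",", "n": "\n", "=": "="}


class JobStarted(NamedTuple):
    tag: str
    pids: List[int]


class JobStartFailed(NamedTuple):
    tag: str
    error_code: int
    error_message: str


def _split(text: str, delimiter: str) -> List[str]:
    """Split on a delimiter, skipping backslash-escaped characters."""
    parts, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            current.append(ch + next(chars, ""))
        elif ch == delimiter:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def _unescape(text: str) -> str:
    """Replace escape sequences; an unknown sequence raises KeyError."""
    out, chars = [], iter(text)
    for ch in chars:
        if ch == "\\":
            ch = _UNESCAPES[next(chars, "")]
        out.append(ch)
    return "".join(out)


def parse_response(line: str) -> Union[JobStarted, JobStartFailed]:
    """Parse a 101 (success) or 102 (failure) message."""
    fields = _split(line.rstrip("\n"), ";")
    if fields[0] == "101" and len(fields) == 3:
        pids = [int(p) for p in _split(fields[2], ",")]
        return JobStarted(_unescape(fields[1]), pids)
    if fields[0] == "102" and len(fields) == 4:
        return JobStartFailed(_unescape(fields[1]), int(fields[2]),
                              _unescape(fields[3]))
    raise ValueError(f"unexpected fork starter response: {line!r}")
```

With this sketch, the line 101;job-1;4242,4243 parses to a JobStarted with pids [4242, 4243], and a line such as 102;job-1;5;no such file parses to a JobStartFailed carrying the tag, the integer code, and the message. The caller can use the tag to match either result to the 100 request that produced it.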

Development Plan

Initially, for GT 4.0, the Globus::Gram::JobManager::fork module will be modified to use the fork starter in place of native Perl fork and exec calls. This implies that one fork starter will be present on the system for each job started (until the job terminates). When a Java interface for starting jobs is written, the job starter will be started once per user (via sudo) per hosting environment instance.