WS-GRAM Scheduler Interface Tutorial (SEG Module)
The Scheduler Event Generator (or SEG) is an application which uses scheduler-specific monitors to report job state changes to the Managed Job Service. The SEG modules themselves implement a simple interface (described here) to generate events from a scheduler. This document describes how to write a SEG module for a scheduler, using the LSF scheduler implementation as an example.
Scheduler Event Generator Module
The SEG module for a scheduler may use whatever method is most appropriate for receiving events from the scheduler. At the time of writing, all SEG modules rely on scheduler log files to determine when to issue events. The main reasons for this are:
- Restartability
- If the Web Service Container fails and restarts, the SEG module must be able to resume generating events from the last state the container had persisted.
- Performance
- Earlier job monitoring mechanisms used by GRAM relied on actively polling the scheduler. This caused the head node of the scheduler to be overloaded and perform poorly. By watching log files, the computation required to obtain job state is reduced dramatically.
Even with these considerations, a SEG module for a particular scheduler may receive events directly from the scheduler if the scheduler provides a suitable API.
Module descriptor
The SEG module for a scheduler is implemented as a C shared library which contains a globus_module_descriptor_t. The module descriptor symbol must be named globus_scheduler_event_module_ptr. In the LSF SEG module, we have
    #include "globus_common.h"
    #include "globus_scheduler_event_generator.h"
    #include "version.h"
    ...
    globus_module_descriptor_t globus_scheduler_event_module_ptr =
    {
        "globus_scheduler_event_generator_lsf",
        globus_l_lsf_module_activate,
        globus_l_lsf_module_deactivate,
        NULL,
        NULL,
        &local_version,
        NULL
    };
The module's activate (globus_l_lsf_module_activate) and deactivate (globus_l_lsf_module_deactivate) functions are the only functions which the globus-scheduler-event-generator program will call in the scheduler-specific module.
Activation and Deactivation
Within the module activation function, we initialize the Globus modules that we will use, initialize our state structure, and register an event with the Globus event driver so that we can begin searching the log file for job events. The call to globus_scheduler_event_generator_get_timestamp returns in its parameter the timestamp from which the scheduler module should begin issuing events. If this value is 0, then the monitor should begin issuing events from the current time and ignore any previous scheduler state changes.
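The timestamp rule above can be sketched in a few lines of plain C. This is only an illustration: the helper names resolve_start_time and event_is_wanted are hypothetical and not part of the SEG API, and treating the starting timestamp as inclusive is an assumption of this sketch.

```c
#include <time.h>

/* Hypothetical helper: turn the timestamp reported by the SEG into the
 * time from which events should be issued. A value of 0 means "start
 * from the current time and ignore earlier state changes". */
static time_t resolve_start_time(time_t seg_timestamp, time_t now)
{
    return (seg_timestamp == 0) ? now : seg_timestamp;
}

/* Hypothetical helper: decide whether an event read from the log should
 * be issued. The inclusive comparison is an assumption of this sketch. */
static int event_is_wanted(time_t event_time, time_t start_time)
{
    return event_time >= start_time;
}
```

In a real module, the first argument would come from globus_scheduler_event_generator_get_timestamp and the second from time(NULL).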
At module deactivation, the LSF SEG module waits until all log searching events have finished and then deactivates the modules it activated.
LSF State
Let's look a bit at the logfile state structure. This structure contains the information we need to reliably parse the LSF log files. Most of the logic in the LSF SEG module is related to keeping track of which log file is currently being read and where in that file we can parse events from (somewhat complicated by the fact that the LSF log files are renamed from time to time). When we parse an event from the logfile, we have to make sure that the logfile hasn't been written to during our read and that the event occurs after the starting timestamp we are interested in.
We also keep track of the path to the LSF log directory, the current log file we are parsing, our location within the logfile, and a buffer of partial events which we've read. All of this information is threaded through the callbacks.
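A state structure holding the items listed above might look roughly like the following. This is a sketch, not the structure used by the LSF module: every type, field, and constant name here is hypothetical, and the fixed buffer sizes are arbitrary choices made for illustration.

```c
#include <string.h>
#include <time.h>

#define SKETCH_PATH_MAX 1024   /* arbitrary limit chosen for this sketch */
#define SKETCH_BUF_MAX  4096   /* arbitrary limit chosen for this sketch */

/* Hypothetical state structure; field names are illustrative only. */
typedef struct
{
    char   log_dir[SKETCH_PATH_MAX];   /* path to the LSF log directory */
    char   log_file[SKETCH_PATH_MAX];  /* log file currently being parsed */
    long   offset;                     /* our location within that file */
    time_t start_timestamp;            /* issue no events older than this */
    char   buffer[SKETCH_BUF_MAX];     /* partial events read so far */
    size_t buffer_len;                 /* number of valid bytes in buffer */
} sketch_logfile_state_t;

/* Initialize the state. Returns 0 on success, or -1 if log_dir does not
 * fit in the fixed-size field. */
static int sketch_state_init(
    sketch_logfile_state_t *state,
    const char *log_dir,
    time_t start_timestamp)
{
    if (strlen(log_dir) >= sizeof(state->log_dir))
    {
        return -1;
    }
    memset(state, 0, sizeof(*state));
    strcpy(state->log_dir, log_dir);
    state->start_timestamp = start_timestamp;
    return 0;
}
```

A pointer to one such structure would be passed as the user argument of each callback, which is what threading the state through the callbacks amounts to.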
Read Callback
The read callback is called periodically to check whether any new events are available from the log file. To keep things consistent, we stat the LSF index file, which changes whenever logs are rotated, before we issue events for any log messages we read. After we have a buffer of valid data, we parse it and call functions in the SEG API to generate events from the data we read.
If our read reached the end of the active log file, the file has no new events for now, so we delay the next read callback for a few seconds. Otherwise, we schedule a new read callback to occur as soon as possible. We do not loop around the reads, so that the SEG events we register with the API can occur.
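The two decisions made in the read callback (whether the index file changed under us, and how soon to read again) can be sketched as below. The function names are hypothetical, the 2-second delay is an assumed stand-in for "a few seconds", and treating a short read as end-of-file is a simplification. A real module would use the delay to register its next callback with the Globus event driver.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Assumed delay, in seconds, before re-checking an idle log file. */
#define SKETCH_IDLE_DELAY_SECONDS 2

/* Decide how long to wait before the next read callback. A short read
 * is taken to mean we reached the end of the active log file. */
static int sketch_next_read_delay(size_t bytes_read, size_t bytes_asked)
{
    return (bytes_read < bytes_asked) ? SKETCH_IDLE_DELAY_SECONDS : 0;
}

/* Compare two stat() snapshots of the index file, one taken before the
 * read and one after. Returns 1 if the file changed in between (the logs
 * were rotated, so the data just read should not be trusted), else 0. */
static int sketch_index_changed(
    const struct stat *before,
    const struct stat *after)
{
    return before->st_mtime != after->st_mtime
        || before->st_size != after->st_size
        || before->st_ino != after->st_ino;
}
```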
Getting Logfile Path
For LSF, the rotated log files are named lsb.events.# with the number ranging from 1 to some system-configuration dependent number. The smallest numbered log file is the most recent rotated one. The current log file is lsb.events. Normally, only the current log file is used, but if that file is rotated while we are reading from it, or if the restart timestamp is before the beginning of the current file, then we must search the log files for one which begins after the event timestamp we're interested in.
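The naming scheme described above can be captured in a small path-building helper. This is a sketch under assumptions: the function name sketch_log_path is hypothetical, index 0 is used here as a convention for the current log file, and the directory is assumed to be joined to the file name with a single slash.

```c
#include <stdio.h>

/* Build the path of an LSF event log file inside log_dir.
 * index 0 selects the current file, "lsb.events"; index N > 0 selects
 * the rotated file "lsb.events.N" (1 is the most recently rotated).
 * Returns 0 on success, -1 on a negative index or if the path does not
 * fit in the output buffer. */
static int sketch_log_path(
    char *out,
    size_t out_len,
    const char *log_dir,
    int index)
{
    int n;

    if (index < 0)
    {
        return -1;
    }
    if (index == 0)
    {
        n = snprintf(out, out_len, "%s/lsb.events", log_dir);
    }
    else
    {
        n = snprintf(out, out_len, "%s/lsb.events.%d", log_dir, index);
    }
    return (n < 0 || (size_t) n >= out_len) ? -1 : 0;
}
```

A search over rotated files could call this helper with increasing index values and inspect the first event of each file it opens.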
Other schedulers have different methods of handling these log files (PBS has one log file per day; our Fork implementation does not yet implement any log rotation).
Parsing and Generating events
The event parsing function is entirely scheduler dependent. The code scans the lsb event lines from the log file and generates appropriate calls to the Globus Scheduler Event Generator API to send messages to the Job State Monitor. The most often used API calls are those which map to GRAM job state changes:
- globus_scheduler_event_pending
- globus_scheduler_event_active
- globus_scheduler_event_done
- globus_scheduler_event_failed
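To illustrate how parsed log lines map onto the four calls listed above, here is a self-contained sketch. It does not parse the real lsb.events format: it assumes a simplified, hypothetical line layout of "TYPE unix-time job-id exit-code" with made-up type names NEW, START, and FINISH. Treating a nonzero exit code as a failure is also an assumption of this sketch. All sk_ names are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef enum { SK_NONE, SK_PENDING, SK_ACTIVE, SK_DONE, SK_FAILED } sk_state_t;

/* One parsed event from the simplified, hypothetical line format. */
typedef struct
{
    sk_state_t state;
    time_t     timestamp;
    char       job_id[32];
    int        exit_code;
} sk_event_t;

/* Parse "TYPE unix-time job-id exit-code". Returns 0 on success, -1 if
 * the line does not have four fields. Unknown types yield SK_NONE. */
static int sk_parse_line(const char *line, sk_event_t *ev)
{
    char type[32];
    long ts;
    int  code;

    if (sscanf(line, "%31s %ld %31s %d", type, &ts, ev->job_id, &code) != 4)
    {
        return -1;
    }
    ev->timestamp = (time_t) ts;
    ev->exit_code = code;
    if (strcmp(type, "NEW") == 0)
        ev->state = SK_PENDING;
    else if (strcmp(type, "START") == 0)
        ev->state = SK_ACTIVE;
    else if (strcmp(type, "FINISH") == 0)
        ev->state = (code == 0) ? SK_DONE : SK_FAILED;
    else
        ev->state = SK_NONE;
    return 0;
}

/* Name of the SEG API call a real module would make for this state,
 * or NULL when no event should be issued. */
static const char *sk_api_call_for(sk_state_t state)
{
    switch (state)
    {
    case SK_PENDING: return "globus_scheduler_event_pending";
    case SK_ACTIVE:  return "globus_scheduler_event_active";
    case SK_DONE:    return "globus_scheduler_event_done";
    case SK_FAILED:  return "globus_scheduler_event_failed";
    default:         return NULL;
    }
}
```

In an actual module, the switch would invoke the corresponding SEG API function with the event's timestamp and job identifier rather than return its name.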
Related Documentation