GT 4.0 WS Rendezvous

Introduction

Design and Interactions

Rendezvous port type

Flow of control

Tasks (e.g. processes) sit at level 0, the leaf level, of a hierarchy of execution units (e.g. jobs or multijobs). Each level of that hierarchy is mapped to a corresponding level in a hierarchy of rendezvous resources, and tasks rendezvous with each other through this rendezvous hierarchy.

For each level 1 execution unit (e.g. job) or rendezvous:

  1. the controlling task (e.g. process 0) subscribes for notifications for when the rendezvous is complete (optimization: this step is not needed if there is only one task)
  2. each task (e.g. process) registers its data to the rendezvous resource of level 1 (e.g. GRAM job)
  3. when/if the controlling task (e.g. process 0) gets a notification of rendezvous completion:

    if there exists a higher-level rendezvous (e.g. multijob) then the controlling task (e.g. process 0) makes remote calls to it in order to:

    1. subscribe for notifications for when that rendezvous is complete
    2. register the data for the entire set of tasks
  4. the flow can iterate if there are more rendezvous levels
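
The steps above can be modelled with a small in-memory sketch. It is not the WS Rendezvous API: the `Rendezvous` class, `run_job` and `level0` are hypothetical names, notifications are plain callbacks instead of remote calls, and data sets are assumed to be aggregated in registration order.

```python
from typing import Callable, List, Optional, Tuple


class Rendezvous:
    """In-memory stand-in for a rendezvous resource (hypothetical, not the GT API)."""

    def __init__(self, expected: int) -> None:
        self.expected = expected                      # registrations needed to complete
        self.registered: List[bytes] = []             # data sets received so far
        self.subscribers: List[Callable[[bytes], None]] = []

    def subscribe(self, callback: Callable[[bytes], None]) -> None:
        self.subscribers.append(callback)

    def register(self, data: bytes) -> None:
        self.registered.append(data)
        if len(self.registered) == self.expected:
            # Completion: aggregate as "dataCount SPACE (lower-level data)*".
            count = str(len(self.registered)).encode("ascii")
            aggregate = count + b" " + b"".join(self.registered)
            for callback in self.subscribers:
                callback(aggregate)


def level0(payload: bytes) -> bytes:
    """Wrap one task's payload as "byteCount SPACE bytes"."""
    return str(len(payload)).encode("ascii") + b" " + payload


def run_job(payloads: List[bytes],
            parent: Optional[Rendezvous] = None) -> List[Tuple[str, bytes]]:
    """Run steps 1-3 for one job and return what process 0 is notified of."""
    seen: List[Tuple[str, bytes]] = []
    job = Rendezvous(expected=len(payloads))

    def on_job_complete(aggregate: bytes) -> None:
        seen.append(("job", aggregate))
        if parent is not None:
            # Step 3: subscribe to the higher level, then register the job's data.
            parent.subscribe(lambda agg: seen.append(("multijob", agg)))
            parent.register(aggregate)

    job.subscribe(on_job_complete)          # step 1 (controlling task)
    for payload in payloads:                # step 2 (every task)
        job.register(level0(payload))
    return seen
```

In this model a multijob with two subjobs is a `Rendezvous(expected=2)` passed as `parent` to two `run_job` calls; each controlling task is notified once its own job completes and again once the multijob completes.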

Data format

The input format of the binary data to register is recursive:
  • level 0 data :== byteCount SPACE bytes, where byteCount is an ASCII-encoded byte array
  • level 1 data :== dataCount SPACE (level 0 data)*
  • level 2 data :== dataCount SPACE (level 1 data)*
  • ...
The "level 0 data" is the data of a task at level 0 of the rendezvous hierarchy (e.g. the data for a subjob process).

The generic form is:

  • level 0 data :== byteCount SPACE bytes
  • level n data :== dataCount SPACE (level n-1 data)* with n >= 1
The level n data with n >= 1 is the aggregation of all the registered data sets of level n-1. For instance, in a GRAM job, level 1 is the level of the job, and its data is the aggregation, in the format defined above, of the data sets of all the processes started by the job.

The format of the data shipped within a notification of rendezvous completion is the "level n data" with n >= 1 (i.e. the aggregated data at level n).
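
As an illustration, the recursive format can be produced with two small helpers. This is a sketch under assumptions: the function names are hypothetical, the counts are taken to be decimal integers in ASCII, SPACE is a single 0x20 byte, and consecutive data sets are concatenated with no separator, as the grammar suggests.

```python
from typing import Sequence


def encode_level0(payload: bytes) -> bytes:
    """level 0 data :== byteCount SPACE bytes"""
    # Assumption: byteCount is the payload length as ASCII decimal digits.
    return str(len(payload)).encode("ascii") + b" " + payload


def encode_level_n(children: Sequence[bytes]) -> bytes:
    """level n data :== dataCount SPACE (level n-1 data)*

    `children` must already be encoded as level n-1 data.
    """
    # Assumption: dataCount is the number of children as ASCII decimal digits.
    return str(len(children)).encode("ascii") + b" " + b"".join(children)
```

For example, a job with two processes would encode its level 1 data as `encode_level_n([encode_level0(p0), encode_level0(p1)])`, and a multijob would wrap the resulting subjob values with `encode_level_n` once more.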

Example:

  • process data :== byteCount SPACE bytes
  • subjob data :== processCount SPACE (process data)*
  • multijob data :== subjobCount SPACE (subjob data)*
  • ...
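
A consumer of a completion notification has to walk the same structure in reverse. The following recursive parser is a minimal sketch, assuming decimal ASCII counts, a single space as separator, and that the caller knows the level of the data it received; `parse` is a hypothetical name.

```python
from typing import Any, Tuple


def parse(data: bytes, level: int, pos: int = 0) -> Tuple[Any, int]:
    """Parse "level `level` data" starting at `pos`.

    Returns (value, next_pos): value is the raw bytes at level 0 and a
    list of parsed lower-level values at level >= 1.
    """
    space = data.index(b" ", pos)            # end of the leading count
    count = int(data[pos:space].decode("ascii"))
    pos = space + 1
    if level == 0:
        end = pos + count                    # count is a byte count here
        if end > len(data):
            raise ValueError("truncated level 0 data")
        return data[pos:end], end
    items = []
    for _ in range(count):                   # count is a number of children here
        item, pos = parse(data, level - 1, pos)
        items.append(item)
    return items, pos
```

Because each level 0 entry carries its own byte count, the payload may itself contain spaces or digits without confusing the parser.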

Implementing Rendezvous clients

Implementing a Rendezvous client (for instance, a program that is executed as a computational job via GRAM and needs to use GRAM's built-in Rendezvous capabilities) means coding remote calls from the client to the Rendezvous service/resource pairs. These remote calls are made through local calls to stubs generated by the tooling. Two kinds of calls must be performed:
  1. subscribe to the rendezvous for notifications
  2. register data with the rendezvous

In C

In Java

Integration with GRAM

Writing job applications that use the Rendezvous features

GRAM creates a rendezvous resource for every managed job. In fact, the current implementation merges the rendezvous resource and the GRAM job resource. The job application can therefore use the contact information for the GRAM job Web service in order to perform remote rendezvous registration. There is no separate service-resource pair to talk to.

There are several environment variables that the job process can access in order to obtain information about the job structure and the GRAM job service-resource remote references:

  • $GLOBUS_GRAM_JOB_HANDLE: this is the handle to the service-resource pair of the GRAM job that started the process accessing the environment variable.
  • $GLOBUS_GRAM_MULTIJOB_HANDLE: this is the handle to the GRAM multijob service-resource pair, if the GRAM job is being executed as part of a multijob.
  • $GLOBUS_GRAM_SUBJOB_RANK: this is the rank of the GRAM job as a subjob within the multijob. This corresponds to the order of the subjob in the multijob description submitted to GRAM. The value is an integer between 0 and n-1, where n is the number of subjobs in the multijob (with n greater than or equal to 1).
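
A job process might collect these variables as sketched below. `JobContext` and `read_job_context` are hypothetical names, and the sketch assumes that the two multijob-related variables are simply absent when the job is not part of a multijob, which the text above does not state explicitly.

```python
import os
from typing import Mapping, NamedTuple, Optional


class JobContext(NamedTuple):
    job_handle: Optional[str]
    multijob_handle: Optional[str]   # None when not part of a multijob (assumption)
    subjob_rank: Optional[int]       # None when not part of a multijob (assumption)


def read_job_context(env: Mapping[str, str] = os.environ) -> JobContext:
    """Read the GRAM-provided variables from the process environment."""
    rank = env.get("GLOBUS_GRAM_SUBJOB_RANK")
    return JobContext(
        job_handle=env.get("GLOBUS_GRAM_JOB_HANDLE"),
        multijob_handle=env.get("GLOBUS_GRAM_MULTIJOB_HANDLE"),
        subjob_rank=int(rank) if rank is not None else None,
    )
```

Passing the environment as a parameter keeps the function easy to exercise outside a real GRAM job, for instance with a plain dictionary.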

Note: a "handle" is a scalar version of an endpoint reference, containing only the URL of the service and the resource ID of the resource. The format of a handle is as follows:

handle :== serviceURL ? resource ID
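
Splitting a handle back into its two parts can be sketched as follows. `parse_handle` is a hypothetical helper, and the sketch assumes that the separator is a bare "?" with no surrounding spaces and that the service URL itself contains no "?".

```python
from typing import Tuple


def parse_handle(handle: str) -> Tuple[str, str]:
    """Split "serviceURL?resourceID" into (service_url, resource_id)."""
    # Assumption: the first "?" separates the service URL from the resource ID.
    service_url, sep, resource_id = handle.partition("?")
    if not sep or not service_url or not resource_id:
        raise ValueError("not a valid handle: %r" % handle)
    return service_url, resource_id
```

For a made-up handle such as `https://host.example:8443/services/JobService?abc123`, this yields the URL up to the question mark and `abc123` as the resource ID.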

Application to MPI jobs

Acquiring the contact data for the processes in a job is distinct from the actual interprocess communication, which is always done through MPI. How the contact data is acquired depends on whether native or non-native MPI is used. There are two cases:

  • non-native MPI: process 0 uses the GRAM Rendezvous feature in order to do interprocess data exchange, i.e. to acquire the processes' contact data.
  • native MPI: process 0 gets the contact data of all its siblings through native MPI. There is no need for a subjob-level rendezvous here. Only a multijob rendezvous is used (if there is a multijob).
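
The two cases can be summarised as a small decision function. This is only a reading of the bullets above, with hypothetical names; in particular it assumes that a non-native MPI job inside a multijob also uses the multijob rendezvous, as the general flow of control suggests.

```python
from typing import List


def rendezvous_levels(native_mpi: bool, in_multijob: bool) -> List[str]:
    """Return the rendezvous levels that process 0 takes part in."""
    levels = []
    if not native_mpi:
        # Non-native MPI: contact data is exchanged through the subjob rendezvous.
        levels.append("subjob")
    if in_multijob:
        # A multijob rendezvous is used whenever a multijob exists.
        levels.append("multijob")
    return levels
```

With native MPI and no multijob the list is empty, meaning that no GRAM rendezvous is involved at all.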