Fault Tolerant Real-time Event Service

Introduction

The Fault Tolerant Real-time Event Service adds fault tolerance to the TAO Real-time Event Service. Essentially, it allows you to start several event services on different machines. These event services form an object group that clients can treat as a single logical event service. Clients communicate only with the primary (leader) of the object group; the remaining event channels in the group are called backups. If the primary dies, one of the backups assumes the responsibility of primary, and this process is transparent to the clients.

The key to providing fault tolerance for event channels is to replicate the state of the primary to its backups. There are two kinds of state in an event channel: transient and persistent. The transient state consists of the events that are yet to be delivered to the consumers. Those events are hard to replicate because the time scale is too small: they might be delivered late or out of scope if we tried to replicate them. Subscription information, however, changes at a time scale suitable for replication, and is in fact more essential for the delivery of events since it establishes the connectivity from suppliers to consumers. Therefore, we only replicate the subscriptions.

Once the primary receives a subscription request from a client, it replicates the request to the backup event channels. In order to provide time bounds on replication, we introduce the concept of transaction depth. A transaction depth of n means that a subscription method invocation blocks until the first n replicas complete the subscription operation, as illustrated by the assured-replicate arc in the figure. The other replicas receive the state change via a so-called soft-replicate, which means that the replication is not assured to complete before the subscription operation returns. So, if the soft-replicate fails due to loss of the primary, only the assured depth of replication is guaranteed. Clients can configure the transaction depth to trade off reliability against responsiveness. Furthermore, an operation must be rolled back in some replicas if the transaction depth cannot be met. In addition, we can use either two-way or AMI calls for assured replication and one-way operations for soft replication.
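
The following is a minimal, self-contained sketch of the transaction depth idea. It is not taken from the FTRTEC sources. The Replica type, the subscribe() function, and the "reachable" flag are hypothetical names introduced for illustration. It also assumes that replicas are visited in a fixed order and that the subscription being added is new to every replica. Real replication uses CORBA invocations; here a plain function call stands in for them.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one event channel replica.
struct Replica {
    bool reachable = true;                 // false simulates a failed replica
    std::set<std::string> subscriptions;   // the replicated (persistent) state

    // Apply a subscription; fails if the replica cannot be reached.
    bool apply(const std::string& sub) {
        if (!reachable) return false;
        subscriptions.insert(sub);
        return true;
    }

    // Undo a previously applied subscription.
    void rollback(const std::string& sub) { subscriptions.erase(sub); }
};

// Returns true only if the first `depth` replicas all completed the
// subscription (assured replication). The remaining replicas are updated
// on a best-effort basis (soft replication).
bool subscribe(std::vector<Replica>& group, const std::string& sub,
               std::size_t depth) {
    if (depth > group.size()) return false;

    // Assured part: the caller is blocked until each of these succeeds.
    for (std::size_t i = 0; i < depth; ++i) {
        if (!group[i].apply(sub)) {
            // The depth cannot be met: roll back the replicas already updated.
            for (std::size_t j = 0; j < i; ++j) group[j].rollback(sub);
            return false;
        }
    }

    // Soft part: failures here are ignored and do not affect the result.
    for (std::size_t i = depth; i < group.size(); ++i) {
        (void)group[i].apply(sub);
    }
    return true;
}
```

With three replicas and a depth of 2, a failure of the second replica makes subscribe() return false and leaves no replica holding the new subscription, while a failure of the third replica is tolerated.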

Important Note: At the current stage, the Fault Tolerant Event Service can only be built with MPC. The conventional makefiles are not yet supported. In other words, you should use $ACE_ROOT/bin/mwc.pl to generate makefiles for ACE and TAO before you can build it. See the MPC documentation for instructions on using mwc.pl.
 

Programs

There are several programs in the $TAO_ROOT/orbsvcs/FTRT_Event_Service directory:

ftrt_eventservice : Located in the $TAO_ROOT/orbsvcs/FTRT_Event_Service/Event_Service directory. It implements the fault tolerant event channel. It can be started directly or by the ftrtec_factory_service.

ftrtec_factory_service : Located in the $TAO_ROOT/orbsvcs/FTRT_Event_Service/Factory_Service directory. It is a program used to spawn ftrt_eventservice processes. The processes it creates can be controlled through "test.cfg", whose contents should begin with the repository id of EventChannel followed by the executable path of ftrt_eventservice.

ftrtec_gateway_service : Located in the $TAO_ROOT/orbsvcs/FTRT_Event_Service/Gateway_Service directory. It is an intermediary between the ftrt_eventservice and clients that do not support FT CORBA.

consumer : A shell script to start the consumer test program. The actual consumer program is in $TAO_ROOT/orbsvcs/tests/FtRtEvent.

supplier : A shell script to start the supplier test program. The actual supplier program is in $TAO_ROOT/orbsvcs/tests/FtRtEvent.

ftec : A shell script to start ftrt_eventservice.
 

Quick start:

  Run the applications as follows:
 

  1. Start Naming_Service


   $ $TAO_ROOT/orbsvcs/Naming_Service/Naming_Service -m 1
     or you can use the shell script NameService in this directory to start it.

  2. Start the ftrt_eventservice. Use the "-p" option to start it as a primary and
     use the "-j" option to start it as a backup.

  $ cd $TAO_ROOT/orbsvcs/FTRT_Event_Service
  $ ./ftec -p
  $ ./ftec -j
  $ ./ftec -j
 

  3. Start the consumer and supplier.

  $ ./consumer
  $ ./supplier
 

How do we add a new FTRTEC to the system?

  Just use

  ./ftec -j

  The newly created process will contact the naming service and then join
  the existing object group.
 

Are there any adjustable options for FTRTEC?

  Here is the list of options for the ftec script:

  -sciop           Use SCIOP for CORBA communication.
  -sctp            Use SCTP for fault detection.
  -hb n            Specify the heartbeat interval in seconds for the SCTP
                   connection. This option also activates the -sctp option.
  -ami             Use AMI calls for replication messages. (The default is
                   two-way CORBA calls for replication.)
  -p               Activate as a primary replica.
  -j               Activate as a backup replica.
 

 Below are some options that are used for the consumer and supplier
    test scripts.

  -sciop          Use SCIOP for CORBA communication. This requires that the Naming
                  Service and ftec are also started using SCIOP transport protocol.

  -d n            Specify the transaction depth. The transaction depth indicates the
                  number of replicas that must complete the subscription request before
                  the request can return.

  -t f.f          For supplier only. Specify the time interval between sending
                  events, in seconds. This value should be a floating-point number.

 If the naming service is not running on the same machine as the above programs, you can set the environment variable NameServiceIOR before starting ftec, consumer or supplier.
 

How do I start the FTRTEC using ftrtec_factory_service?

The ftrtec_factory_service is a small program that can instantiate an ftrt_eventservice on demand. It exports the FT::GenericFactory interface to its clients. There are two ways to get the IOR of the factory object: 1) specify the name under which the factory registers with the naming service, and then look up the IOR in the naming service by that name; 2) have the factory write its IOR to a file when it starts. Here are the options:

    ftrtec_factory_service :

      -i id_string          The id field of the name that is used to register to the naming service
      -k kind_string        The kind field of the name that is used to register to the naming service
      -o output_filename    The output file name for the factory IOR.

Once you have the IOR of the factory, you can use create_object() to instantiate the ftrt_eventservice.
Here are the parameters of create_object() that control how the ftrt_eventservice is created.

   type_id : this value should be "IDL:FtRtecEventChannelAdmin/EventChannel:1.0"
   the_criteria : a sequence of Property, each of which in turn consists of a "nam" and a "value". Below is a list of the possible nam strings and values.
 
 
   nam                               value

   FTEC_MEMBERSHIP                   PRIMARY, BACKUP or NONE
   FTEC_DETECTOR_TRANSPORT_PROTOCL   TCP or SCTP
   FTEC_HEART_BEAT                   The heartbeat interval in seconds. (Note: you have to
                                     specify it as a string, i.e.
                                     the_criteria[0].value <<= "5");
   FTEC_REPLICATION_STRATEGY         AMI (If not specified, the ftrt_eventservice uses the
                                     default two-way call for replication.)
   NameServiceIOR                    The corbaloc representation of the naming service.
 

 Any nam string that starts with "-" is passed as a command line option when starting ftrt_eventservice. For example, if you specify the nam as "-ORBEndpoint" and the value as "sciop://", then the ftrt_eventservice is started using SCIOP.
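
 The rule above can be sketched as follows. This is only an illustration of the mapping, not the factory's actual code: the Criterion alias and the criteria_to_args() function are hypothetical, and plain strings are used in place of the CORBA Property type so that the example is self-contained.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplification of one criteria entry: (nam, value) as strings.
using Criterion = std::pair<std::string, std::string>;

// Collect every entry whose nam starts with "-" as an option/value pair,
// in the order given. Other entries (e.g. FTEC_MEMBERSHIP) are skipped here.
std::vector<std::string> criteria_to_args(const std::vector<Criterion>& criteria) {
    std::vector<std::string> args;
    for (const Criterion& c : criteria) {
        const std::string& nam = c.first;
        if (!nam.empty() && nam[0] == '-') {
            args.push_back(nam);       // the option itself, e.g. "-ORBEndpoint"
            args.push_back(c.second);  // its argument, e.g. "sciop://"
        }
    }
    return args;
}
```

 For the criteria {FTEC_MEMBERSHIP, PRIMARY} and {-ORBEndpoint, sciop://}, this sketch yields the two arguments "-ORBEndpoint" and "sciop://".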
 

How do I use the ftrtec_gateway_service program?

The FTRTEC uses some features of FT CORBA that require every client to use an FT ORB. If your client is based on an ORB other than TAO, you cannot get the desired fault tolerance. In this case you can use the ftrtec_gateway_service as an intermediary between the FTRTEC and your client program.
For example, if you have an existing client called my_supplier:

    # Set up the event channel group as previously described.

    $ ftrtec_gateway_service -o gateway.ior    # start the gateway and write its IOR to a file
    $ my_supplier -i file://gateway.ior        # start the supplier using the gateway