GraphLab: Distributed Graph-Parallel API  2.1
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Macros Groups Pages
graphlab::async_consensus Class Reference

This implements a distributed consensus algorithm which waits for global completion of all computation/RPC events on a given object. More...

#include <graphlab/rpc/async_consensus.hpp>

List of all members.

Public Member Functions

 async_consensus (distributed_control &dc, size_t required_threads_in_done=1, const dc_impl::dc_dist_object_base *attach=NULL)
 Constructs an asynchronous consensus object.
void begin_done_critical_section (size_t cpuid)
 A thread enters the critical section by calling this function.
void cancel_critical_section (size_t cpuid)
 Leaves the critical section because termination condition is not fullfilled.
bool end_done_critical_section (size_t cpuid)
 Thread must call this function if termination condition is fullfilled while in the critical section.
void force_done ()
 Forces the consensus to be set.
void cancel ()
 Wakes up all local threads waiting in done()
void cancel_one (size_t cpuid)
 Wakes up a specific thread waiting in done()
bool is_done () const
 Returns true if consensus is achieved.
void reset ()
 Resets the consensus object. Must be called simultaneously by exactly one thread on each machine. This function is not safe to call while consensus is being achieved.

Detailed Description

This implements a distributed consensus algorithm which waits for global completion of all computation/RPC events on a given object.

The use case is as follows:

A collection of threads on a collection of distributed machines, each running the following

while (work to be done) {
Do_stuff
}

However, Do_stuff will include RPC calls which may introduce work to other threads/machines. Figuring out when termination is possible is complex. For instance RPC calls could be in-flight and not yet received. This async_consensus class implements a solution built around the algorithm in Misra, J.: Detecting Termination of Distributed Computations Using Markers, SIGOPS, 1983 extended to handle the mixed parallelism (distributed with threading) case.

The main loop of the user has to be modified to:

done = false;
while (!done) {
Do_stuff
// if locally, I see that there is no work to be done
// we begin the consensus checking
if (no work to be done) {
// begin the critical section and try again
consensus.begin_done_critical_section();
// if still no work to be done
if (no work to be done) {
done = consensus.end_done_critical_section();
}
else {
consensus.cancel_critical_section();
}
}
}

Additionally, incoming RPC calls which create work must ensure there are active threads which are capable of processing the work. An easy solution will be to simply cancel_one(). Other more optimized solutions include keeping a counter of the number of active threads, and only calling cancel() or cancel_one() if all threads are asleep. (Note that the optimized solution does require some care to ensure dead-lock free execution).

Definition at line 89 of file async_consensus.hpp.


Constructor & Destructor Documentation

graphlab::async_consensus::async_consensus ( distributed_control dc,
size_t  required_threads_in_done = 1,
const dc_impl::dc_dist_object_base *  attach = NULL 
)

Constructs an asynchronous consensus object.

The consensus procedure waits till all threads have no work to do and are waiting in consensus, and there is no communication going on which could wake up a thread. The consensus object is therefore associated with a communication context, either a graphlab::dc_dist_object, or the global context (the root distributed_control).

Parameters:
dcThe distributed control object to use for communication
required_threads_in_donelocal consensus is achieved if this many threads are waiting for consensus locally.
attachThe context to associate with. If NULL, we associate with the global context.

Definition at line 27 of file async_consensus.cpp.


Member Function Documentation

void graphlab::async_consensus::begin_done_critical_section ( size_t  cpuid)

A thread enters the critical section by calling this function.

After which it should check its termination condition locally. If termination condition is still fulfilled, end_done_critical_section() should be called. Otherwise cancel_critical_section() should be called.

Parameters:
cpuidThread ID of the thread.

Definition at line 67 of file async_consensus.cpp.

void graphlab::async_consensus::cancel_critical_section ( size_t  cpuid)

Leaves the critical section because termination condition is not fullfilled.

See begin_done_critical_section()

Parameters:
cpuidThread ID of the thread.

Definition at line 74 of file async_consensus.cpp.

bool graphlab::async_consensus::end_done_critical_section ( size_t  cpuid)

Thread must call this function if termination condition is fullfilled while in the critical section.

See begin_done_critical_section()

Parameters:
cpuidThread ID of the thread.
Returns:
True if global consensus is achieved. False otherwise.

Definition at line 80 of file async_consensus.cpp.


The documentation for this class was generated from the following files: