GraphLab: Distributed Graph-Parallel API 2.1
This implements a distributed consensus algorithm which waits for global completion of all computation/RPC events on a given object.
#include <graphlab/rpc/async_consensus.hpp>
Public Member Functions
async_consensus (distributed_control &dc, size_t required_threads_in_done=1, const dc_impl::dc_dist_object_base *attach=NULL)
    Constructs an asynchronous consensus object.
void begin_done_critical_section (size_t cpuid)
    A thread enters the critical section by calling this function.
void cancel_critical_section (size_t cpuid)
    Leaves the critical section because the termination condition is not fulfilled.
bool end_done_critical_section (size_t cpuid)
    A thread must call this function if the termination condition is fulfilled while in the critical section.
void force_done ()
    Forces the consensus to be set.
void cancel ()
    Wakes up all local threads waiting in done().
void cancel_one (size_t cpuid)
    Wakes up a specific thread waiting in done().
bool is_done () const
    Returns true if consensus is achieved.
void reset ()
    Resets the consensus object. Must be called simultaneously by exactly one thread on each machine. This function is not safe to call while consensus is being achieved.
The use case is as follows:
A collection of threads on a collection of distributed machines each runs a work loop built around a Do_stuff step.
However, Do_stuff will include RPC calls that may introduce work to other threads or machines, so determining when termination is possible is complex. For instance, RPC calls could be in flight and not yet received. The async_consensus class implements a solution built around the algorithm in Misra, J.: Detecting Termination of Distributed Computations Using Markers, SIGOPS, 1983, extended to handle the mixed-parallelism (distributed with threading) case.
The main loop of the user has to be modified to:
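A minimal sketch of such a loop, assuming only the three critical-section calls listed above. The names `worker_loop` and `stub_consensus` and the `std::deque` work queue are hypothetical and introduced purely for illustration. The stub performs no communication and reports consensus immediately, so it merely stands in for async_consensus in a single-threaded run. The sketch also assumes that end_done_critical_section() returns true once consensus is reached and false if the thread was woken up to look for more work.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical single-threaded stand-in exposing the three critical-section
// calls. It is NOT graphlab::async_consensus: it does no communication and
// reports consensus immediately.
struct stub_consensus {
  std::size_t begins = 0, cancels = 0, ends = 0;
  void begin_done_critical_section(std::size_t) { ++begins; }
  void cancel_critical_section(std::size_t) { ++cancels; }
  bool end_done_critical_section(std::size_t) { ++ends; return true; }
};

// Worker loop (hypothetical helper): drain local work, then attempt to
// terminate. Returns the number of work items processed.
template <typename Consensus>
std::size_t worker_loop(Consensus& consensus, std::deque<int>& work,
                        std::size_t cpuid) {
  std::size_t processed = 0;
  bool done = false;
  while (!done) {
    // "Do_stuff": process whatever is currently queued locally.
    while (!work.empty()) {
      work.pop_front();
      ++processed;
    }
    // No local work is visible, so enter the critical section and check
    // again, since work may have arrived in the meantime.
    consensus.begin_done_critical_section(cpuid);
    if (work.empty()) {
      // Assumed semantics: true means consensus was reached.
      done = consensus.end_done_critical_section(cpuid);
    } else {
      consensus.cancel_critical_section(cpuid);
    }
  }
  return processed;
}
```

With the stub, a queue of three items is drained, the critical section is entered once, and the loop exits. In a real multi-threaded setting the queue would need its own locking, because RPC handlers may add work between the drain and the re-check.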
Additionally, incoming RPC calls that create work must ensure that there are active threads capable of processing it. A simple solution is to call cancel_one(). More optimized solutions include keeping a counter of the number of active threads and calling cancel() or cancel_one() only if all threads are asleep. (Note that the optimized solution requires some care to ensure deadlock-free execution.)
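The simple approach can be sketched as follows. This is an illustration under stated assumptions, not the library's own code: `work_inbox`, `on_incoming_work`, `try_pop`, and `stub_waker` are hypothetical names, and the stub only records cancel_one() calls instead of waking a real thread. The one design point the sketch tries to show is ordering: the work is published before the wake-up, so a woken thread cannot find the queue empty.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical stand-in that only records cancel_one() calls; it is NOT
// graphlab::async_consensus.
struct stub_waker {
  std::size_t wakeups = 0;
  std::size_t last_cpuid = 0;
  void cancel_one(std::size_t cpuid) { ++wakeups; last_cpuid = cpuid; }
};

// Hypothetical thread-safe inbox shared by RPC handlers and worker threads.
template <typename Consensus>
class work_inbox {
 public:
  explicit work_inbox(Consensus& consensus) : consensus_(consensus) {}

  // Intended to be called from the handler of an incoming RPC that
  // creates work for the thread identified by target_cpuid.
  void on_incoming_work(int item, std::size_t target_cpuid) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(item);  // publish the work first...
    }
    consensus_.cancel_one(target_cpuid);  // ...then wake a thread for it
  }

  // Called by worker threads; returns false when no work is queued.
  bool try_pop(int& out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.empty()) return false;
    out = queue_.front();
    queue_.pop_front();
    return true;
  }

 private:
  Consensus& consensus_;
  std::mutex mutex_;
  std::deque<int> queue_;
};
```

The optimized variant would additionally track how many threads are active and skip the cancel_one() call while at least one of them is awake; that bookkeeping is omitted here because it is where the deadlock risk mentioned above arises.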
Definition at line 89 of file async_consensus.hpp.
graphlab::async_consensus::async_consensus (distributed_control & dc, size_t required_threads_in_done = 1, const dc_impl::dc_dist_object_base * attach = NULL)
Constructs an asynchronous consensus object.
The consensus procedure waits until all threads have no work to do and are waiting in consensus, and no communication is in progress that could wake up a thread. The consensus object is therefore associated with a communication context: either a graphlab::dc_dist_object or the global context (the root distributed_control).
dc: The distributed control object to use for communication.
required_threads_in_done: Local consensus is achieved if this many threads are waiting for consensus locally.
attach: The context to associate with. If NULL, we associate with the global context.
Definition at line 27 of file async_consensus.cpp.
void graphlab::async_consensus::begin_done_critical_section (size_t cpuid)
A thread enters the critical section by calling this function.
Afterwards, the thread should check its termination condition again locally. If the termination condition is still fulfilled, end_done_critical_section() should be called. Otherwise, cancel_critical_section() should be called.
cpuid: Thread ID of the thread.
Definition at line 67 of file async_consensus.cpp.
void graphlab::async_consensus::cancel_critical_section (size_t cpuid)
Leaves the critical section because the termination condition is not fulfilled.
See begin_done_critical_section().
cpuid: Thread ID of the thread.
Definition at line 74 of file async_consensus.cpp.
bool graphlab::async_consensus::end_done_critical_section (size_t cpuid)
A thread must call this function if the termination condition is fulfilled while in the critical section.
See begin_done_critical_section().
cpuid: Thread ID of the thread.
Definition at line 80 of file async_consensus.cpp.