GraphLab: Distributed Graph-Parallel API
2.1
Provides a class with its own distributed communication context, allowing instances of the class to communicate with other remote instances.
#include <graphlab/rpc/dc_dist_object.hpp>
Public Member Functions | |
dc_dist_object (distributed_control &dc_, T *owner) | |
Constructs a distributed object context. | |
size_t | calls_received () const |
The number of function calls received by this object. | |
size_t | calls_sent () const |
The number of function calls sent from this object. | |
size_t | bytes_sent () const |
The number of bytes sent from this object, excluding headers and other control overhead. | |
distributed_control & | dc () |
A reference to the underlying distributed_control object. | |
const distributed_control & | dc () const |
A const reference to the underlying distributed_control object. | |
procid_t | procid () const |
The current process ID. | |
procid_t | numprocs () const |
The number of processes in the distributed program. | |
std::ostream & | cout () const |
A wrapper around cout that outputs only on machine 0. | |
std::ostream & | cerr () const |
A wrapper around cerr that outputs only on machine 0. | |
void | remote_call (procid_t targetmachine, Fn fn,...) |
Performs a non-blocking RPC call to the target machine to run the provided function pointer. | |
void | remote_call (Iterator machine_begin, Iterator machine_end, Fn fn,...) |
Performs a non-blocking RPC call to a collection of machines to run the provided function pointer. | |
RetVal | remote_request (procid_t targetmachine, Fn fn,...) |
Performs a blocking RPC call to the target machine to run the provided function pointer. | |
RetVal | future_remote_request (procid_t targetmachine, Fn fn,...) |
Performs a nonblocking RPC call to the target machine to run the provided function pointer which has an expected return value. | |
template<typename U > | |
void | send_to (procid_t target, U &t, bool control=false) |
Sends an object to a target machine and blocks until the target machine calls recv_from() to receive the object. | |
template<typename U > | |
void | recv_from (procid_t source, U &t, bool control=false) |
Waits to receive an object that a source machine sent via send_to(). | |
template<typename U > | |
void | broadcast (U &data, bool originator, bool control=false) |
This function allows one machine to broadcast an object to all machines. | |
template<typename U > | |
void | gather (std::vector< U > &data, procid_t sendto, bool control=false) |
Collects information contributed by each machine onto one machine. | |
template<typename U > | |
void | all_gather (std::vector< U > &data, bool control=false) |
Sends some information contributed by each machine to all machines. | |
template<typename U , typename PlusEqual > | |
void | all_reduce2 (U &data, PlusEqual plusequal, bool control=false) |
Combines a value contributed by each machine, making the result available to all machines. | |
template<typename U > | |
void | all_reduce (U &data, bool control=false) |
Combines a value contributed by each machine, making the result available to all machines. | |
template<typename U > | |
void | all_to_all (std::vector< U > &data, bool control=false) |
void | barrier () |
A distributed barrier which waits for all machines to call the barrier() function before proceeding. | |
void | full_barrier () |
A distributed barrier which waits for all machines to call the full_barrier() function before proceeding. Also waits for all previously issued remote calls to complete. | |
std::map< std::string, size_t > | gather_statistics () |
Provides a class with its own distributed communication context, allowing instances of the class to communicate with other remote instances.
The philosophy behind dc_dist_object is the concept of "distributed objects". The idea is that the user should be able to write code that simply declares objects (say "vec", "vec2" and "g") as if they were ordinary local variables.
When run in a distributed setting, the "vec" variable can then behave as if it is a single distributed object and automatically coordinate its operations across the network, communicating with the other instances of "vec" on the other machines. Essentially, each object (vec, vec2 and g) constructs its own private communication context, which allows every machine's "vec" variable to communicate only with the other machines' "vec" variables, and similarly for "vec2" and "g". This private communication context is provided by the dc_dist_object class.
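Constructing a distributed object requires little work: the owning class (for instance a distributed_int_vector) creates a dc_dist_object, conventionally named rmi, and passes it the distributed_control and a pointer to itself.
After that, remote_call() and remote_request() can be used to communicate across the network with the matching instance of the distributed_int_vector on another machine.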
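As a rough sketch of that construction pattern, the snippet below declares a distributed_int_vector that owns an "rmi" context. So that it compiles on its own, the `sketch` namespace contains minimal stand-ins for distributed_control and dc_dist_object that mirror only the documented constructor signature; they are assumptions for illustration, and the `owner()` and `context()` accessors are hypothetical. With GraphLab itself one would include graphlab/rpc/dc_dist_object.hpp and use the real classes instead.

```cpp
namespace sketch {

// Stand-in for graphlab::distributed_control (assumption: the real class
// lives in the GraphLab headers and does far more than this).
struct distributed_control {};

// Stand-in mirroring only the documented constructor
// dc_dist_object(distributed_control &dc_, T *owner).
template <typename T>
class dc_dist_object {
 public:
  dc_dist_object(distributed_control& dc_, T* owner)
      : control_(dc_), owner_(owner) {}

  // Hypothetical accessor, used here only to inspect the association.
  T* owner() const { return owner_; }

 private:
  distributed_control& control_;
  T* owner_;
};

// The distributed object: it owns a private context ("rmi") and hands
// that context a pointer to itself at construction time.
class distributed_int_vector {
 public:
  explicit distributed_int_vector(distributed_control& dc) : rmi(dc, this) {}

  // Hypothetical accessor exposing the context for inspection.
  const dc_dist_object<distributed_int_vector>& context() const { return rmi; }

 private:
  dc_dist_object<distributed_int_vector> rmi;
};

}  // namespace sketch
```

Two vectors built from the same distributed_control each get their own context object, which is the property the text above relies on.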
Each dc_dist_object maintains its own private communication context which is not influenced by other communication contexts. In other words, rmi.barrier() and all other operations in each instance of the distributed_int_vector are independent of each other. In particular, rmi.full_barrier() only waits for completion of all RPC calls from within the current communication context.
See the examples in GraphLab RPC for more usage examples.
Definition at line 115 of file dc_dist_object.hpp.
graphlab::dc_dist_object< T >::dc_dist_object (distributed_control &dc_, T *owner) [inline]
Constructs a distributed object context.
The constructor constructs a distributed object context which is associated with the "owner" object.
dc_ | The root distributed_control which provides the communication control plane. |
owner | The object to associate with |
Definition at line 199 of file dc_dist_object.hpp.
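template<typename U > void graphlab::dc_dist_object< T >::all_gather (std::vector< U > &data, bool control=false) [inline]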
Sends some information contributed by each machine to all machines.
The goal is to have each machine broadcast a piece of information to all machines. This is like gather(), but all machines have the complete vector at the end.
To accomplish this, each machine constructs a vector of length numprocs() and stores the data to communicate in the procid()'th entry of the vector. Calling all_gather() with the vector then results in all machines having a complete copy of the vector containing all contributions (entry 0 from machine 0, entry 1 from machine 1, etc.).
Example:
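To illustrate these semantics, here is a single-process model of what all_gather() is described as doing. It is not GraphLab code: `simulate_all_gather` and `all_gather_demo` are hypothetical names, and `views[p]` merely stands in for the "data" vector held by machine p.

```cpp
#include <cstddef>
#include <vector>

// Single-process model of the documented behavior (an assumption-laden
// sketch, not the GraphLab implementation). views[p] plays the role of
// the "data" vector on machine p; its own contribution is views[p][p].
template <typename U>
void simulate_all_gather(std::vector<std::vector<U>>& views) {
  const std::size_t numprocs = views.size();
  // Collect the entry contributed by each machine.
  std::vector<U> complete(numprocs);
  for (std::size_t p = 0; p < numprocs; ++p) {
    complete[p] = views[p][p];
  }
  // Afterwards every machine holds the same complete vector.
  for (std::size_t p = 0; p < numprocs; ++p) {
    views[p] = complete;
  }
}

// Three machines, where machine p contributes the value 10 * (p + 1).
std::vector<std::vector<int>> all_gather_demo() {
  const std::size_t numprocs = 3;
  std::vector<std::vector<int>> views(numprocs,
                                      std::vector<int>(numprocs, 0));
  for (std::size_t p = 0; p < numprocs; ++p) {
    views[p][p] = 10 * static_cast<int>(p + 1);
  }
  simulate_all_gather(views);
  return views;
}
```

In the model, every row returned by `all_gather_demo()` ends up with one entry per machine, in machine order.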
data | A vector of length equal to the number of processes. The information to communicate is in the entry data[procid()] |
control | Optional parameter. Defaults to false. If set to true, this will be marked as control plane communication and will not register in bytes_received() or bytes_sent(). This must be the same on all machines. |
Definition at line 1000 of file dc_dist_object.hpp.
template<typename U > void graphlab::dc_dist_object< T >::all_reduce (U &data, bool control=false) [inline]
Combines a value contributed by each machine, making the result available to all machines.
Each machine calls all_reduce() with an object which is serializable and has operator+= implemented. When all_reduce() returns, the "data" variable will contain a value corresponding to adding up the objects contributed by each machine.
Example:
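The reduction can be pictured with a single-process model such as the one below. This is a sketch of the documented semantics only, not GraphLab code: `simulate_all_reduce` and the `tally` type are hypothetical, and `contributions[p]` stands in for the "data" value on machine p. The order in which contributions are combined is an assumption of the model.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical value type: all that the reduction needs is operator+=.
struct tally {
  std::size_t sent;
  std::size_t received;
  tally& operator+=(const tally& other) {
    sent += other.sent;
    received += other.received;
    return *this;
  }
};

// Single-process model: combine every machine's value with operator+=,
// then hand the combined result back to every machine.
template <typename U>
void simulate_all_reduce(std::vector<U>& contributions) {
  if (contributions.empty()) return;
  U total = contributions[0];
  for (std::size_t p = 1; p < contributions.size(); ++p) {
    total += contributions[p];
  }
  for (std::size_t p = 0; p < contributions.size(); ++p) {
    contributions[p] = total;
  }
}
```

Built-in arithmetic types work with the same model, since they already support operator+=.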
data | A piece of data to perform a reduction over. The type must implement operator+=. |
control | Optional parameter. Defaults to false. If set to true, this will be marked as control plane communication and will not register in bytes_received() or bytes_sent(). This must be the same on all machines. |
Definition at line 1243 of file dc_dist_object.hpp.
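template<typename U , typename PlusEqual > void graphlab::dc_dist_object< T >::all_reduce2 (U &data, PlusEqual plusequal, bool control=false) [inline]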
Combines a value contributed by each machine, making the result available to all machines.
This function is equivalent to all_reduce(), but with an externally defined PlusEqual function.
Each machine calls all_reduce2() with an object which is serializable and a function "plusequal" which combines two instances of the object. When all_reduce2() returns, the "data" variable will contain a value corresponding to adding up the objects contributed by each machine using the plusequal function.
Where U is the type of the object, the plusequal function must be of the form void plusequal(U& left, const U& right) and must implement the equivalent of left += right;
Example:
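As an illustration of a custom combiner, the single-process model below reduces with a "keep the maximum" function that follows the documented prototype void plusequal(U&, const U&). It is a sketch, not GraphLab code: `keep_max` and `simulate_all_reduce2` are hypothetical names, and `contributions[p]` stands in for the "data" value on machine p.

```cpp
#include <cstddef>
#include <vector>

// Follows the documented prototype void plusequal(U&, const U&):
// leaves the larger of the two values in "left".
void keep_max(int& left, const int& right) {
  if (right > left) left = right;
}

// Single-process model: fold every machine's value into one result
// with the supplied combiner, then give that result to every machine.
template <typename U, typename PlusEqual>
void simulate_all_reduce2(std::vector<U>& contributions,
                          PlusEqual plusequal) {
  if (contributions.empty()) return;
  U combined = contributions[0];
  for (std::size_t p = 1; p < contributions.size(); ++p) {
    plusequal(combined, contributions[p]);
  }
  for (std::size_t p = 0; p < contributions.size(); ++p) {
    contributions[p] = combined;
  }
}
```

A combiner like `keep_max` is associative and commutative, so the result does not depend on the order in which the model (or a real implementation) visits the machines.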
data | A piece of data to perform a reduction over. |
plusequal | A plusequal function on the data. Must have the prototype void plusequal(U&, const U&) |
control | Optional parameter. Defaults to false. If set to true, this will be marked as control plane communication and will not register in bytes_received() or bytes_sent(). This must be the same on all machines. |
Definition at line 1132 of file dc_dist_object.hpp.
void graphlab::dc_dist_object< T >::barrier () [inline]
A distributed barrier which waits for all machines to call the barrier() function before proceeding.
A machine calling the barrier() will wait until every machine reaches this barrier before continuing. Only one thread from each machine should call the barrier.
Definition at line 1358 of file dc_dist_object.hpp.
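template<typename U > void graphlab::dc_dist_object< T >::broadcast (U &data, bool originator, bool control=false) [inline]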
This function allows one machine to broadcasts an object to all machines.
The originator calls broadcast() with the object to send in 'data' and originator set to true. All other callers call with originator set to false. The originator's 'data' is returned as is. All other machines will receive the originator's transmission in the "data" parameter.
This call is guaranteed to have barrier-like behavior. That is to say, this call will block until all machines enter the broadcast function.
Example:
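The effect on each machine's 'data' can be pictured with the single-process model below. It is only a sketch of the documented semantics, not GraphLab code: `simulate_broadcast` is a hypothetical name, `values[p]` stands in for 'data' on machine p, and the `originator` index identifies the one machine that would call with originator set to true. The blocking, barrier-like behavior is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-process model: the originator's value overwrites the value
// held by every other machine; the originator's own value is unchanged.
template <typename U>
void simulate_broadcast(std::vector<U>& values, std::size_t originator) {
  assert(originator < values.size());
  const U sent = values[originator];
  for (std::size_t p = 0; p < values.size(); ++p) {
    values[p] = sent;
  }
}
```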
data | If this is the originator, this will contain the object to broadcast. Otherwise, this will be a reference to the object receiving the broadcast. |
originator | Set to true if this is the source of the broadcast. Set to false otherwise. |
control | Optional parameter. Defaults to false. If set to true, this will be marked as control plane communication and will not register in bytes_received() or bytes_sent(). This must be the same on all machines. |
Definition at line 826 of file dc_dist_object.hpp.
void graphlab::dc_dist_object< T >::full_barrier () [inline]
A distributed barrier which waits for all machines to call the full_barrier() function before proceeding. Also waits for all previously issued remote calls to complete.
Similar to barrier(), but provides the additional guarantee that all calls issued prior to this barrier are completed before returning.
This barrier ensures globally across all machines that all calls issued prior to this barrier are completed before returning. This function could return prematurely if other threads are still issuing function calls since we cannot differentiate between calls issued before the barrier and calls issued while the barrier is being evaluated.
Definition at line 1428 of file dc_dist_object.hpp.
RetVal graphlab::dc_dist_object< T >::future_remote_request (procid_t targetmachine, Fn fn, ...)
Performs a nonblocking RPC call to the target machine to run the provided function pointer which has an expected return value.
future_remote_request() calls the function "fn" on a target remote machine. Provided arguments are serialized and sent to the target. Therefore, all arguments are necessarily transmitted by value. If the target function has a return value, it is sent back to the calling machine.
future_remote_request() is like remote_request(), but is non-blocking. Instead, it immediately returns a graphlab::request_future object, which allows you to wait for the return value.
Example:
targetmachine | The ID of the machine to run the function on |
fn | The function to run on the target machine. Must be a pointer to member function in the owning object. |
... | The arguments to send to Fn. Arguments must be serializable and must be castable to the target types. |
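template<typename U > void graphlab::dc_dist_object< T >::gather (std::vector< U > &data, procid_t sendto, bool control=false) [inline]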
Collects information contributed by each machine onto one machine.
The goal is to collect some information from each machine onto a single target machine (sendto).
To accomplish this, each machine constructs a vector of length numprocs() and stores the data to communicate in the procid()'th entry of the vector. Calling gather() with the vector and the target machine then sends the contributed value to the target. When the function returns, machine sendto will have the complete vector, where data[i] is the data contributed by machine i.
Example:
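A single-process model of these semantics is sketched below. It is not GraphLab code: `simulate_gather` is a hypothetical name and `views[p]` stands in for the "data" vector on machine p. The text above only specifies the contents on machine sendto, so the model leaves every other machine's vector untouched; that choice is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-process model: copy each machine's own entry, views[p][p],
// into slot p of the vector held by the target machine "sendto".
template <typename U>
void simulate_gather(std::vector<std::vector<U>>& views,
                     std::size_t sendto) {
  assert(sendto < views.size());
  for (std::size_t p = 0; p < views.size(); ++p) {
    views[sendto][p] = views[p][p];
  }
}
```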
data | A vector of length equal to the number of processes. The information to communicate is in the entry data[procid()] |
sendto | Machine which will hold the complete vector at the end of the operation. All machines must have the same value for this parameter. |
control | Optional parameter. Defaults to false. If set to true, this will be marked as control plane communication and will not register in bytes_received() or bytes_sent(). This must be the same on all machines. |
Definition at line 885 of file dc_dist_object.hpp.
std::map< std::string, size_t > graphlab::dc_dist_object< T >::gather_statistics () [inline]
Gathers RPC statistics. All machines must call this function at the same time. However, only proc 0 will return values.
Definition at line 1488 of file dc_dist_object.hpp.
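template<typename U > void graphlab::dc_dist_object< T >::recv_from (procid_t source, U &t, bool control=false) [inline]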
Waits to receive an object that a source machine sent via send_to().
This function waits to receive a Serializable object "t" from a source machine. The source machine must send the object using send_to(). The source machine will wait for the target machine's recv_from() to complete before returning.
Example:
U | the type of object to receive. This should be inferred by the compiler. |
source | The source machine to receive from. This function will block until data is received. |
t | The object to receive. It must be serializable and the type must match the source machine's call to send_to() |
control | Optional parameter. Defaults to false. If set to true, this will be marked as control plane communication and will not register in bytes_received() or bytes_sent(). This must match the "control" parameter on the source machine's send_to() call. |
Definition at line 773 of file dc_dist_object.hpp.
void graphlab::dc_dist_object< T >::remote_call (procid_t targetmachine, Fn fn, ...)
Performs a non-blocking RPC call to the target machine to run the provided function pointer.
remote_call() calls the function "fn" on a target remote machine. "fn" may be public, private or protected within the owner class; there are no access restrictions. Provided arguments are serialized and sent to the target. Therefore, all arguments are necessarily transmitted by value. If the target function has a return value, the return value is lost.
remote_call() is non-blocking and does not wait for the target machine to complete execution of the function. Different remote_calls may be handled by different threads on the target machine and thus the target function should be made thread-safe. Alternatively, see distributed_control::set_sequentialization_key() to force sequentialization of groups of remote calls.
If blocking operation is desired, remote_request() may be used. Alternatively, a full_barrier() may also be used to wait for completion of all incomplete RPC calls.
Example:
Note the syntax for obtaining a pointer to a member function.
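For readers unfamiliar with that syntax, the following standalone snippet shows how standard C++ forms and invokes a pointer to a member function. It involves no GraphLab types: `int_store`, `invoke_member` and `member_pointer_demo` are hypothetical names, and the local `invoke_member` helper only hints at how a receiving side might apply "fn" to the owning object.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical owner class with a member function worth calling.
class int_store {
 public:
  void append(int value) { values_.push_back(value); }
  std::size_t size() const { return values_.size(); }

 private:
  std::vector<int> values_;
};

// Applies a pointer to member function to an owner object.
template <typename T, typename Arg>
void invoke_member(T& owner, void (T::*fn)(Arg), Arg arg) {
  (owner.*fn)(arg);  // note the .* operator and the parentheses
}

std::size_t member_pointer_demo() {
  int_store store;
  // A pointer to member is written &Class::member, never &object.member.
  void (int_store::*fn)(int) = &int_store::append;
  invoke_member(store, fn, 7);
  invoke_member(store, &int_store::append, 8);
  return store.size();
}
```

The class name qualification is mandatory: `&append` or `&store.append` would not compile.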
targetmachine | The ID of the machine to run the function on |
fn | The function to run on the target machine. Must be a pointer to member function in the owning object. |
... | The arguments to send to Fn. Arguments must be serializable and must be castable to the target types. |
void graphlab::dc_dist_object< T >::remote_call (Iterator machine_begin, Iterator machine_end, Fn fn, ...)
Performs a non-blocking RPC call to a collection of machines to run the provided function pointer.
This function calls the provided function pointer on a collection of machines contained in the iterator range [begin, end). Provided arguments are serialized and sent to the target. Therefore, all arguments are necessarily transmitted by value. If the target function has a return value, the return value is lost.
This function is functionally equivalent to:
However, this function makes some optimizations to ensure that all arguments are serialized only once, instead of once per target machine.
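The difference between a plain loop of single-target calls and the serialize-once optimization can be sketched with a toy model. Nothing below is GraphLab code: `fake_network`, `call_each` and `call_range` are hypothetical, `machine_id` is a stand-in for procid_t, and "serialization" is reduced to converting an int to a string so that the number of serializations can be counted.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using machine_id = unsigned;  // stand-in for procid_t (assumption)

// Toy network that records what each machine receives and counts how
// many times an argument was serialized.
struct fake_network {
  std::map<machine_id, std::vector<std::string>> inbox;
  std::size_t serialize_count = 0;

  std::string serialize(int arg) {
    ++serialize_count;
    return std::to_string(arg);
  }
  void send(machine_id target, const std::string& payload) {
    inbox[target].push_back(payload);
  }
};

// Naive form: one single-target call per machine, serializing each time.
template <typename Iterator>
void call_each(fake_network& net, Iterator begin, Iterator end, int arg) {
  for (; begin != end; ++begin) {
    net.send(*begin, net.serialize(arg));
  }
}

// Range form: serialize once, then reuse the payload for every target.
template <typename Iterator>
void call_range(fake_network& net, Iterator begin, Iterator end, int arg) {
  const std::string payload = net.serialize(arg);
  for (; begin != end; ++begin) {
    net.send(*begin, payload);
  }
}
```

Both forms deliver identical payloads to the same machines; only the serialization work differs, which matters most when arguments are large or the target list is long.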
This function is non-blocking and does not wait for the target machines to complete execution of the function. Different remote_calls may be handled by different threads on the target machines and thus the target function should be made thread-safe. Alternatively, see distributed_control::set_sequentialization_key() to force sequentialization of groups of remote_calls. A full_barrier() may also be issued to wait for completion of all RPC calls issued prior to the full barrier.
Example:
machine_begin | The beginning of an iterator range containing a list of machines to call. Iterator::value_type must be castable to procid_t. |
machine_end | The end of an iterator range containing a list of machines to call. Iterator::value_type must be castable to procid_t. |
fn | The function to run on the target machine. Must be a pointer to member function in the owning object. |
... | The arguments to send to Fn. Arguments must be serializable and must be castable to the target types. |
RetVal graphlab::dc_dist_object< T >::remote_request (procid_t targetmachine, Fn fn, ...)
Performs a blocking RPC call to the target machine to run the provided function pointer.
remote_request() calls the function "fn" on a target remote machine. Provided arguments are serialized and sent to the target. Therefore, all arguments are necessarily transmitted by value. If the target function has a return value, it is sent back to the calling machine.
Unlike remote_call(), remote_request() is blocking and waits for the target machine to complete execution of the function. However, different remote_requests may still be handled by different threads on the target machine.
Example:
targetmachine | The ID of the machine to run the function on |
fn | The function to run on the target machine. Must be a pointer to member function in the owning object. |
... | The arguments to send to Fn. Arguments must be serializable and must be castable to the target types. |
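template<typename U > void graphlab::dc_dist_object< T >::send_to (procid_t target, U &t, bool control=false) [inline]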
Sends an object to a target machine and blocks until the target machine calls recv_from() to receive the object.
This function sends a Serializable object "t" to the target machine, and waits for the target machine to call recv_from() to receive the object before returning.
Example:
U | the type of object to send. This should be inferred by the compiler. |
target | The target machine to send to. Target machine must call recv_from() before this call will return. |
t | The object to send. It must be serializable. The type must match the target machine's call to recv_from() |
control | Optional parameter. Defaults to false. If set to true, this will be marked as control plane communication and will not register in bytes_received() or bytes_sent(). This must match the "control" parameter on the target machine's recv_from() call. |
Definition at line 744 of file dc_dist_object.hpp.