GraphLab: Distributed Graph-Parallel API
2.1
The omni engine encapsulates all the GraphLab engines, allowing the user to select which engine to use at runtime.
#include <graphlab/engine/omni_engine.hpp>
Public Types | |
typedef iengine< VertexProgram > | iengine_type |
The type of the iengine. | |
typedef VertexProgram | vertex_program_type |
The user defined vertex program type which should extend ivertex_program. | |
typedef vertex_program_type::message_type | message_type |
The user defined message type which is defined in ivertex_program::message_type. | |
typedef vertex_program_type::graph_type | graph_type |
The graph type which is defined in ivertex_program::graph_type and will typically be distributed_graph. | |
typedef VertexProgram::gather_type | gather_type |
The user defined type returned by the gather function. | |
typedef graph_type::vertex_id_type | vertex_id_type |
The vertex identifier type defined in graphlab::vertex_id_type. | |
typedef iengine_type::aggregator_type | aggregator_type |
The type of the distributed aggregator used by each engine to implement distributed aggregation. | |
typedef synchronous_engine < VertexProgram > | synchronous_engine_type |
The type of the synchronous engine. | |
typedef async_consistent_engine < VertexProgram > | async_consistent_engine_type |
The type of the asynchronous engine. | |
typedef semi_synchronous_engine < VertexProgram > | semi_synchronous_engine_type |
The type of the semi-synchronous engine. | |
typedef graph_type::vertex_type | vertex_type |
The vertex object type which contains a reference to the vertex data and is defined in the iengine::graph_type (see for example distributed_graph::vertex_type). | |
typedef graph_type::edge_type | edge_type |
The edge object type which contains a reference to the edge data and is defined in the iengine::graph_type (see for example distributed_graph::edge_type). | |
typedef vertex_program_type::icontext_type | icontext_type |
The context type which is passed into vertex programs as a callback to the engine. |
Public Member Functions | |
omni_engine (distributed_control &dc, graph_type &graph, const std::string &default_engine_type, const graphlab_options &options=graphlab_options()) | |
Construct an omni engine for a given graph with the default_engine_type unless the engine options contain an alternative type. | |
~omni_engine () | |
Destroy the internal engine destroying all vertex programs associated with this engine. | |
execution_status::status_enum | start () |
Start the engine execution. | |
size_t | num_updates () const |
Compute the total number of updates (calls to apply) executed since start was last invoked. | |
float | elapsed_seconds () const |
Get the elapsed time in seconds since start was last called. | |
int | iteration () const |
Get the current iteration number. This is not defined for all engines, in which case -1 is returned. | |
void | signal (vertex_id_type vertex, const message_type &message=message_type()) |
Signals a single vertex with an optional message. | |
void | signal_all (const message_type &message=message_type(), const std::string &order="shuffle") |
Signal all vertices with a particular message. | |
void | signal_vset (const vertex_set &vset, const message_type &message=message_type(), const std::string &order="shuffle") |
Signal a set of vertices with a particular message. | |
aggregator_type * | get_aggregator () |
template<typename ReductionType , typename VertexMapType , typename FinalizerType > | |
bool | add_vertex_aggregator (const std::string &key, VertexMapType map_function, FinalizerType finalize_function) |
Creates a vertex aggregator. Returns true on success. Returns false if an aggregator of the same name already exists. | |
template<typename ReductionType , typename EdgeMapType , typename FinalizerType > | |
bool | add_edge_aggregator (const std::string &key, EdgeMapType map_function, FinalizerType finalize_function) |
Creates an edge aggregator. Returns true on success. Returns false if an aggregator of the same name already exists. | |
bool | aggregate_now (const std::string &key) |
Performs an immediate aggregation on a key. | |
template<typename ReductionType , typename VertexMapperType > | |
ReductionType | map_reduce_vertices (VertexMapperType mapfunction) |
Performs a map-reduce operation on each vertex in the graph returning the result. | |
template<typename ReductionType , typename EdgeMapperType > | |
ReductionType | map_reduce_edges (EdgeMapperType mapfunction) |
Performs a map-reduce operation on each edge in the graph returning the result. | |
template<typename VertexMapperType > | |
void | transform_vertices (VertexMapperType mapfunction) |
Performs a transformation operation on each vertex in the graph. | |
template<typename EdgeMapperType > | |
void | transform_edges (EdgeMapperType mapfunction) |
Performs a transformation operation on each edge in the graph. | |
bool | aggregate_periodic (const std::string &key, float seconds) |
Requests that a particular aggregation key be recomputed periodically when the engine is running. |
The omni engine encapsulates all the GraphLab engines allowing the user to select which engine to use at runtime.
The actual engine type is set as a string argument (default_engine_type) to the constructor of the omni_engine.
The specific engine type can be overridden by command line arguments (engine_opts="type=<type>"), for example by calling the program with the command line options:
%> mpiexec -n 16 ./pagerank --engine_opts="type=synchronous"
The currently supported types are:
Definition at line 84 of file omni_engine.hpp.
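The precedence described above (a default engine type passed to the constructor, optionally overridden by a type=<type> entry in the engine options) can be pictured with a small standalone sketch. This is not GraphLab's implementation: the helper resolve_engine_type and the assumption that options arrive as a comma-separated key=value string are hypothetical, introduced only to illustrate the rule.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of GraphLab): returns the engine type to use.
// `engine_opts` is assumed to be a comma-separated list of key=value pairs,
// e.g. "type=synchronous"; a "type" entry overrides `default_engine_type`.
inline std::string resolve_engine_type(const std::string& default_engine_type,
                                       const std::string& engine_opts) {
  std::istringstream in(engine_opts);
  std::string entry;
  std::string chosen = default_engine_type;
  while (std::getline(in, entry, ',')) {
    const std::string::size_type eq = entry.find('=');
    if (eq == std::string::npos) continue;  // skip entries without '='
    if (entry.substr(0, eq) == "type") chosen = entry.substr(eq + 1);
  }
  return chosen;
}
```

With an empty options string the default is returned unchanged; with "type=asynchronous" the override wins.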
typedef VertexProgram::gather_type graphlab::omni_engine< VertexProgram >::gather_type |
The user defined type returned by the gather function.
The gather type is defined in the graphlab::ivertex_program interface and is the value returned by the graphlab::ivertex_program::gather function. The gather type must have an operator+=(const gather_type& other) function and must be Serializable.
Definition at line 119 of file omni_engine.hpp.
typedef vertex_program_type::icontext_type icontext_type [inherited]
The context type which is passed into vertex programs as a callback to the engine.
Most engines use the graphlab::context implementation.
Reimplemented in graphlab::semi_synchronous_engine< VertexProgram >, graphlab::synchronous_engine< VertexProgram >, and graphlab::async_consistent_engine< VertexProgram >.
Definition at line 180 of file iengine.hpp.
omni_engine() [inline]
Construct an omni engine for a given graph with the default_engine_type unless the engine options contain an alternative type.
[in] | dc | a distributed control object that is used to connect this engine with its counterparts on other machines. |
[in,out] | graph | the graph object that this engine will transform. |
[in] | options | the command line options which are used to configure the engine. Note that the engine option "type" can be used to select the engine to use (synchronous or asynchronous). |
[in] | default_engine_type | The user must specify what engine type to use if no command line option is given. |
Definition at line 185 of file omni_engine.hpp.
add_edge_aggregator() [inline, inherited]
Creates an edge aggregator. Returns true on success. Returns false if an aggregator of the same name already exists.
Creates an edge aggregator associated with a particular key. The map_function is called over every edge in the graph, and the return values of the map are summed. The finalize_function is then called on the result of the reduction. The finalize_function is called on all machines. The map_function should only read the graph data, and should not make any modifications.
For instance, if the graph has float vertex data, and float edge data:
An aggregator can be constructed to compute the absolute sum of all the edge data. To do this, we define two functions.
Next, we define the aggregator in the engine by calling add_edge_aggregator(). We must assign it a unique name which will be used to reference this particular aggregate operation. We shall call it "absolute_edge_sum".
When executed, the engine runs absolute_edge_data() on each edge in the graph. absolute_edge_data() reads the edge data and returns its absolute value. All return values are then summed together using the float's += operator. The final result is then passed to the print_finalize function. The template argument <float> is necessary to provide information about the return type of absolute_edge_data.
This aggregator can be run immediately by calling aggregate_now() with the name of the aggregator, e.g. engine.aggregate_now("absolute_edge_sum").
It can also be scheduled to run periodically during engine execution by calling aggregate_periodic(), e.g. engine.aggregate_periodic("absolute_edge_sum", 1.5) to run every 1.5 seconds.
Note that since finalize is called on all machines, multiple copies of the total will be printed. If only one copy is desired, see context.cout(), or get the actual process ID using context.procid().
The add_edge_aggregator() function is also templatized over both function types, and there is no strong enforcement of the exact argument types of the map function and the finalize function. For instance, in the above example, the following print_finalize() variants may also be accepted.
In particular, the last variation may be useful for performance reasons if the reduction type is large.
To obtain consistent behavior in the distributed setting, we designed the aggregator to minimize the amount of asymmetry among the machines. In particular, the finalize operation is guaranteed to be called on all machines. This permits global variables to be modified on finalize, since all machines are ensured to be eventually consistent.
For instance, in the above example, print_finalize could store the result in a global variable:
which will make it accessible to all other running update functions.
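The map, sum, finalize flow described above can be sketched on a single machine in standalone C++. This is only an illustration under stated assumptions: toy_edge, toy_aggregate_edges, store_finalize and global_edge_sum are hypothetical stand-ins, and the icontext_type& first argument that real map and finalize functions receive is omitted for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Stand-in for an edge carrying float data (hypothetical, not GraphLab's edge_type).
struct toy_edge { float data; };

// Global updated by the finalizer, mirroring the "store the result in a
// global variable" pattern described in the text.
static float global_edge_sum = 0.0f;

// Map function: reads the edge data and returns its absolute value.
inline float absolute_edge_data(const toy_edge& edge) { return std::fabs(edge.data); }

// Finalize function: receives the reduced value and stores it globally.
inline void store_finalize(float total) { global_edge_sum = total; }

// Toy aggregation over one machine's edges: map each edge, sum with +=,
// then hand the total to the finalizer.
template <typename ReductionType, typename EdgeMapType, typename FinalizerType>
void toy_aggregate_edges(const std::vector<toy_edge>& edges,
                         EdgeMapType map_function,
                         FinalizerType finalize_function) {
  ReductionType total = ReductionType();
  for (const toy_edge& e : edges) total += map_function(e);
  finalize_function(total);
}
```

As in the documented API, only the reduction type is spelled out at the call site, for example toy_aggregate_edges<float>(edges, absolute_edge_data, store_finalize); the two function types are deduced.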
ReductionType | The output of the map function. Must have operator+= defined, and must be Serializable. |
EdgeMapperType | The type of the map function. Not generally needed. Can be inferred by the compiler. |
FinalizerType | The type of the finalize function. Not generally needed. Can be inferred by the compiler. |
[in] | key | The name of this aggregator. Must be unique. |
[in] | map_function | The Map function to use. Must take an icontext_type& as its first argument, and an edge_type, or a reference to an edge_type as its second argument. Returns a ReductionType which must be summable and Serializable. |
[in] | finalize_function | The Finalize function to use. Must take an icontext_type& as its first argument and a ReductionType, or a reference to a ReductionType as its second argument. |
Definition at line 692 of file iengine.hpp.
add_vertex_aggregator() [inline, inherited]
Creates a vertex aggregator. Returns true on success. Returns false if an aggregator of the same name already exists.
Creates a vertex aggregator associated with a particular key. The map_function is called over every vertex in the graph, and the return values of the map are summed. The finalize_function is then called on the result of the reduction. The finalize_function is called on all machines. The map_function should only read the graph data, and should not make any modifications.
For instance, if the graph has float vertex data, and float edge data:
An aggregator can be constructed to compute the absolute sum of all the vertex data. To do this, we define two functions.
Next, we define the aggregator in the engine by calling add_vertex_aggregator(). We must assign it a unique name which will be used to reference this particular aggregate operation. We shall call it "absolute_vertex_sum".
When executed, the engine runs absolute_vertex_data() on each vertex in the graph. absolute_vertex_data() reads the vertex data and returns its absolute value. All return values are then summed together using the float's += operator. The final result is then passed to the print_finalize function. The template argument <float> is necessary to provide information about the return type of absolute_vertex_data.
This aggregator can be run immediately by calling aggregate_now() with the name of the aggregator, e.g. engine.aggregate_now("absolute_vertex_sum").
It can also be scheduled to run periodically during engine execution by calling aggregate_periodic(), e.g. engine.aggregate_periodic("absolute_vertex_sum", 1.5) to run every 1.5 seconds.
Note that since finalize is called on all machines, multiple copies of the total will be printed. If only one copy is desired, see context.cout(), or get the actual process ID using context.procid().
In practice, the reduction type can be any arbitrary user-defined type as long as a += operator is defined. This permits great flexibility in the type of operations the aggregator can perform.
The add_vertex_aggregator() function is also templatized over both function types, and there is no strong enforcement of the exact argument types of the map function and the finalize function. For instance, in the above example, the following print_finalize() variants may also be accepted.
In particular, the last variation may be useful for performance reasons if the reduction type is large.
To obtain consistent behavior in the distributed setting, we designed the aggregator to minimize the amount of asymmetry among the machines. In particular, the finalize operation is guaranteed to be called on all machines. This permits global variables to be modified on finalize, since all machines are ensured to be eventually consistent.
For instance, in the above example, print_finalize could store the result in a global variable:
which will make it accessible to all other running update functions.
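The remark that the reduction type can be any user-defined type with a += operator, and that a finalizer taking the result by const reference avoids copying a large reduction type, can be sketched as follows. Everything here is a hypothetical stand-in written for illustration (abs_stats, toy_vertex, toy_aggregate_vertices, keep_stats); the Serializable requirement and the icontext_type& argument of the real API are not modelled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reduction type: counts the values seen and accumulates their
// absolute sum. The only requirement modelled here is operator+=.
struct abs_stats {
  std::size_t count;
  float abs_sum;
  abs_stats() : count(0), abs_sum(0.0f) {}
  abs_stats(std::size_t c, float s) : count(c), abs_sum(s) {}
  abs_stats& operator+=(const abs_stats& other) {
    count += other.count;
    abs_sum += other.abs_sum;
    return *this;
  }
};

// Stand-in for a vertex carrying float data (not GraphLab's vertex_type).
struct toy_vertex { float data; };

// Map function: one abs_stats record per vertex.
inline abs_stats vertex_stats(const toy_vertex& v) {
  return abs_stats(1, std::fabs(v.data));
}

static abs_stats last_stats;

// Finalizer variant taking the result by const reference, which avoids a
// copy when the reduction type is large.
inline void keep_stats(const abs_stats& total) { last_stats = total; }

// Toy aggregation: map each vertex, combine with +=, then finalize.
template <typename ReductionType, typename VertexMapType, typename FinalizerType>
void toy_aggregate_vertices(const std::vector<toy_vertex>& vertices,
                            VertexMapType map_function,
                            FinalizerType finalize_function) {
  ReductionType total = ReductionType();
  for (const toy_vertex& v : vertices) total += map_function(v);
  finalize_function(total);
}
```

Because the finalizer type is a template parameter, a variant taking abs_stats by value would be accepted by the same toy_aggregate_vertices call.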
ReductionType | The output of the map function. Must have operator+= defined, and must be Serializable. |
VertexMapperType | The type of the map function. Not generally needed. Can be inferred by the compiler. |
FinalizerType | The type of the finalize function. Not generally needed. Can be inferred by the compiler. |
[in] | key | The name of this aggregator. Must be unique. |
[in] | map_function | The Map function to use. Must take an icontext_type& as its first argument, and a vertex_type, or a reference to a vertex_type as its second argument. Returns a ReductionType which must be summable and Serializable. |
[in] | finalize_function | The Finalize function to use. Must take an icontext_type& as its first argument and a ReductionType, or a reference to a ReductionType as its second argument. |
Definition at line 484 of file iengine.hpp.
aggregate_now() [inline, inherited]
Performs an immediate aggregation on a key.
Performs an immediate aggregation on a key. All machines must call this simultaneously. If the key is not found, false is returned. Otherwise returns true on success.
For instance, calling engine.aggregate_now("absolute_vertex_sum") will run the aggregator with the name "absolute_vertex_sum" immediately.
[in] | key | Key to aggregate now. Must be a key previously created by add_vertex_aggregator() or add_edge_aggregator(). |
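The return-value contract described for the aggregator functions (adding fails if the name already exists, immediate aggregation fails if the key is unknown) can be modelled with a toy registry. This sketch is not GraphLab's aggregator; toy_aggregator_registry and its add method are hypothetical, and the requirement that all machines call simultaneously is not modelled.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy registry (hypothetical; not GraphLab's aggregator class) that models
// only the return values described in the text.
class toy_aggregator_registry {
 public:
  // Returns true on success, false if an aggregator with this key exists.
  bool add(const std::string& key, std::function<void()> run) {
    return table_.insert(std::make_pair(key, run)).second;
  }

  // Runs the aggregation registered under `key` immediately.
  // Returns false if the key is not found, true otherwise.
  bool aggregate_now(const std::string& key) {
    auto it = table_.find(key);
    if (it == table_.end()) return false;
    it->second();
    return true;
  }

 private:
  std::map<std::string, std::function<void()> > table_;
};
```

Checking the boolean result of both calls is a cheap way to catch a misspelled or duplicated key early.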
Definition at line 780 of file iengine.hpp.
aggregate_periodic() [inline, inherited]
Requests that a particular aggregation key be recomputed periodically when the engine is running.
Requests that the aggregator with a given key be aggregated every certain number of seconds when the engine is running. Note that the requested period is only a target: in practice the actual period will be larger than the requested period. seconds must be >= 0.
For instance, calling engine.aggregate_periodic("absolute_vertex_sum", 1.5) will schedule the aggregator with the name "absolute_vertex_sum" to run every 1.5 seconds.
[in] | key | Key to schedule. Must be a key previously created by add_vertex_aggregator() or add_edge_aggregator(). |
[in] | seconds | How frequently to schedule. Must be >= 0. seconds == 0 will ensure that this key is continuously recomputed. |
All machines must call simultaneously.
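One way to picture why the actual period ends up larger than the requested one is a toy scheduler that only inspects the clock at discrete checkpoints. This model is an assumption made purely for illustration; the source does not describe how GraphLab schedules periodic aggregation, and toy_periodic_runs is a hypothetical function.

```cpp
#include <cassert>
#include <vector>

// Toy model (an assumption, not GraphLab's scheduler): the clock is only
// inspected at the given checkpoints, and the aggregation runs at the first
// checkpoint that is at least `seconds` after the previous run.
inline std::vector<double> toy_periodic_runs(const std::vector<double>& checkpoints,
                                             double seconds) {
  assert(seconds >= 0.0);  // mirrors the documented precondition
  std::vector<double> runs;
  if (checkpoints.empty()) return runs;
  double last = checkpoints.front();  // treat the first checkpoint as engine start
  for (double now : checkpoints) {
    if (now > last && now - last >= seconds) {
      runs.push_back(now);
      last = now;
    }
  }
  return runs;
}
```

With checkpoints one second apart and a requested period of 1.5 seconds, this toy model runs the aggregation every 2 seconds; with seconds == 0 it runs at every checkpoint after the start.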
Definition at line 1169 of file iengine.hpp.
elapsed_seconds() [inline, virtual]
Get the elapsed time in seconds since start was last called.
Implements graphlab::iengine< VertexProgram >.
Definition at line 225 of file omni_engine.hpp.
iteration() [inline, virtual]
Get the current iteration number. This is not defined for all engines, in which case -1 is returned.
Reimplemented from graphlab::iengine< VertexProgram >.
Definition at line 226 of file omni_engine.hpp.
map_reduce_edges() [inline, inherited]
Performs a map-reduce operation on each edge in the graph returning the result.
Given a map function, map_reduce_edges() calls the map function on all edges in the graph. The return values are then summed together and the final result returned. The map function should only read data and should not make any modifications. map_reduce_edges() must be called on all machines simultaneously.
For instance, if the graph has float vertex data, and float edge data:
To compute an absolute sum over all the edge data, we would write a function which reads in each edge and returns the absolute value of the data on the edge.
After which, calling engine.map_reduce_edges<float>(absolute_edge_data) will call the absolute_edge_data() function on each edge in the graph. absolute_edge_data() reads the value of the edge and returns its absolute value. These return values are then summed together and returned. All machines see the same result.
The template argument <float> is needed to inform the compiler of the return type of the map function.
Another common use for the map_reduce_edges() function is in signalling. Since the map function is passed a context, it can be used to signal vertices for execution during a later engine.start() call.
For instance, the following code will signal the source vertex of each edge.
Note that in this case, we are not interested in a reduction operation, and thus we return a graphlab::empty object. Calling engine.map_reduce_edges<graphlab::empty>(signal_source) will run signal_source() on all edges, signalling all source vertices.
The map function has the same structure as that in add_edge_aggregator() and may be reused in an aggregator. This function is also very similar to graphlab::distributed_graph::map_reduce_edges(), with the difference that this takes a context and thus can be used to perform signalling. Finally, transform_edges() can be used to perform a similar operation, but may also make modifications to graph data.
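The signalling use of map_reduce_edges() described above can be sketched with standalone stand-ins. The types below (toy_edge, toy_context, toy_empty) and the function toy_map_reduce_edges are hypothetical and only mimic the shape of the documented call: a map function that receives a context and an edge, signals through the context, and returns an empty reduction value.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Stand-ins (hypothetical): an edge with source/target ids, a context that
// records signalled vertex ids, and an empty reduction type.
struct toy_edge { int source; int target; };

struct toy_context {
  std::set<int> signalled;
  void signal(int vertex_id) { signalled.insert(vertex_id); }
};

struct toy_empty {
  toy_empty& operator+=(const toy_empty&) { return *this; }
};

// Map function used only for its side effect: signal the source vertex.
inline toy_empty signal_source(toy_context& context, const toy_edge& edge) {
  context.signal(edge.source);
  return toy_empty();
}

// Toy map-reduce: call the map function on every edge and sum the results.
template <typename ReductionType, typename EdgeMapperType>
ReductionType toy_map_reduce_edges(toy_context& context,
                                   const std::vector<toy_edge>& edges,
                                   EdgeMapperType mapfunction) {
  ReductionType total = ReductionType();
  for (const toy_edge& e : edges) total += mapfunction(context, e);
  return total;
}
```

Since the reduction value carries no information, the caller simply discards the result and relies on the set of signalled vertices.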
ReductionType | The output of the map function. Must have operator+= defined, and must be Serializable. |
EdgeMapperType | The type of the map function. Not generally needed. Can be inferred by the compiler. |
mapfunction | The map function to use. Must take an icontext_type& as its first argument, and an edge_type, or a reference to an edge_type as its second argument. Returns a ReductionType which must be summable and Serializable. |
Definition at line 974 of file iengine.hpp.
map_reduce_vertices() [inline, inherited]
Performs a map-reduce operation on each vertex in the graph returning the result.
Given a map function, map_reduce_vertices() calls the map function on all vertices in the graph. The return values are then summed together and the final result returned. The map function should only read the vertex data and should not make any modifications. map_reduce_vertices() must be called on all machines simultaneously.
For instance, if the graph has float vertex data, and float edge data:
To compute an absolute sum over all the vertex data, we would write a function which reads in each vertex and returns the absolute value of the data on the vertex.
After which, calling engine.map_reduce_vertices<float>(absolute_vertex_data) will call the absolute_vertex_data() function on each vertex in the graph. absolute_vertex_data() reads the value of the vertex and returns its absolute value. These return values are then summed together and returned. All machines see the same result.
The template argument <float> is needed to inform the compiler of the return type of the map function.
Another common use for the map_reduce_vertices() function is in signalling. Since the map function is passed a context, it can be used to perform signalling of vertices for execution during a later engine.start() call.
For instance, the following code will signal all vertices with value >= 1.
Note that in this case, we are not interested in a reduction operation, and thus we return a graphlab::empty object. Calling engine.map_reduce_vertices<graphlab::empty>(signal_vertices) will run signal_vertices() on all vertices, signalling all vertices with value >= 1.
The map function has the same structure as that in add_vertex_aggregator() and may be reused in an aggregator. This function is also very similar to graphlab::distributed_graph::map_reduce_vertices(), with the difference that this takes a context and thus can be used to perform signalling. Finally, transform_vertices() can be used to perform a similar operation, but may also make modifications to graph data.
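The statement that the mapped values are summed and that all machines see the same result can be pictured as a two-level reduction: each machine reduces the vertices it owns, and the partial results are then combined. The sketch below simulates this in one process. The partitioning model, toy_vertex, local_reduce and toy_map_reduce_vertices are assumptions for illustration, not a description of GraphLab internals, and the context argument is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Stand-in for a vertex carrying float data (not GraphLab's vertex_type).
struct toy_vertex { float data; };

// Map function: absolute value of the vertex data.
inline float absolute_vertex_data(const toy_vertex& v) { return std::fabs(v.data); }

// Reduce the vertices owned by one (simulated) machine.
template <typename ReductionType, typename VertexMapperType>
ReductionType local_reduce(const std::vector<toy_vertex>& local_vertices,
                           VertexMapperType mapfunction) {
  ReductionType partial = ReductionType();
  for (const toy_vertex& v : local_vertices) partial += mapfunction(v);
  return partial;
}

// Combine the per-machine partial results into the single total that every
// machine would observe.
template <typename ReductionType, typename VertexMapperType>
ReductionType toy_map_reduce_vertices(
    const std::vector<std::vector<toy_vertex> >& machines,
    VertexMapperType mapfunction) {
  ReductionType total = ReductionType();
  for (const std::vector<toy_vertex>& part : machines)
    total += local_reduce<ReductionType>(part, mapfunction);
  return total;
}
```

Both levels use only operator+= on the reduction type, which is why that operator is the one requirement placed on ReductionType besides serializability.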
ReductionType | The output of the map function. Must have operator+= defined, and must be Serializable. |
VertexMapperType | The type of the map function. Not generally needed. Can be inferred by the compiler. |
mapfunction | The map function to use. Must take an icontext_type& as its first argument, and a vertex_type, or a reference to a vertex_type as its second argument. Returns a ReductionType which must be summable and Serializable. |
Definition at line 876 of file iengine.hpp.
num_updates() [inline, virtual]
Compute the total number of updates (calls to apply) executed since start was last invoked.
Implements graphlab::iengine< VertexProgram >.
Definition at line 224 of file omni_engine.hpp.
signal() [inline, virtual]
Signals a single vertex with an optional message.
This function sends a message to a particular vertex, which will receive that message on start. The signal function must be invoked on all machines simultaneously. For example:
and not:
Since signal is executed synchronously on all machines it should only be used to schedule a small set of vertices. The preferred method to signal a large set of vertices (e.g., all vertices that are a certain type) is to use either the vertex program init function or the aggregation framework. For example to signal all vertices that have a particular value one could write:
[in] | vid | the vertex id to signal |
[in] | message | the message to send to that vertex. The default message is sent if no message is provided. (See ivertex_program::message_type for details about the message_type). |
Implements graphlab::iengine< VertexProgram >.
Definition at line 227 of file omni_engine.hpp.
signal_all() [inline, virtual]
Signal all vertices with a particular message.
This function sends the same message to all vertices which will receive that message on start. The signal_all function must be invoked on all machines simultaneously. For example:
and not:
The signal_all function is the most common way to send messages to the engine. For example in the pagerank application we want all vertices to be active on the first round. Therefore we would write:
[in] | message | the message to send to all vertices. The default message is sent if no message is provided (See ivertex_program::message_type for details about the message_type). |
Implements graphlab::iengine< VertexProgram >.
Definition at line 231 of file omni_engine.hpp.
signal_vset() [inline, virtual]
Signal a set of vertices with a particular message.
This function sends the same message to a set of vertices which will receive that message on start. The signal_vset function must be invoked on all machines simultaneously. For example:
signal_all() is conceptually equivalent to:
[in] | vset | The set of vertices to signal |
[in] | message | the message to send to all vertices. The default message is sent if no message is provided (See ivertex_program::message_type for details about the message_type). |
Implements graphlab::iengine< VertexProgram >.
Definition at line 235 of file omni_engine.hpp.
start() [inline, virtual]
Start the engine execution.
Behavior details depend on the engine implementation. See the implementation documentation for specifics.
Implements graphlab::iengine< VertexProgram >.
Definition at line 222 of file omni_engine.hpp.
transform_edges() [inline, inherited]
Performs a transformation operation on each edge in the graph.
Given a mapfunction, transform_edges() calls mapfunction on every edge in the graph. The map function may make modifications to the data on the edge. transform_edges() must be called on all machines simultaneously.
For instance, if the graph has integer vertex data, and integer edge data:
To set each edge value to be the number of out-going edges of the target vertex, we may write the following:
Calling engine.transform_edges(set_edge_value) will run the set_edge_value() function on each edge in the graph, setting its new value.
Since the mapfunction is provided with a context, it can also be used to perform signalling. For instance, the set_edge_value function above may be modified to set the value of the edge and also signal the target vertex.
However, if the purpose of the function is only to signal without making modifications, map_reduce_edges() will be more efficient, as this function additionally performs distributed synchronization of modified data.
map_reduce_edges() provides similar signalling functionality, but should not make modifications to graph data. graphlab::distributed_graph::transform_edges() provides the same graph modification capabilities, but without a context, and thus cannot perform signalling.
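The edge transformation described above (set each edge value to the number of out-going edges of its target vertex) can be sketched on a toy in-memory graph. All names here are hypothetical stand-ins: toy_graph keeps an out-degree table, and the map function receives the graph where the real API would pass an icontext_type& and read the degree through the edge's target vertex.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy graph with integer edge data and a per-vertex out-degree table
// (hypothetical; not GraphLab's distributed_graph).
struct toy_edge { int source; int target; int data; };

struct toy_graph {
  std::vector<toy_edge> edges;
  std::vector<int> out_degree;  // indexed by vertex id
  explicit toy_graph(std::size_t num_vertices) : out_degree(num_vertices, 0) {}
  void add_edge(int source, int target) {
    edges.push_back(toy_edge{source, target, 0});
    ++out_degree[static_cast<std::size_t>(source)];
  }
};

// Map function: overwrite the edge data with the target's out-degree.
inline void set_edge_value(const toy_graph& graph, toy_edge& edge) {
  edge.data = graph.out_degree[static_cast<std::size_t>(edge.target)];
}

// Toy transform: call the map function on every edge; it may modify the edge.
template <typename EdgeMapperType>
void toy_transform_edges(toy_graph& graph, EdgeMapperType mapfunction) {
  for (toy_edge& e : graph.edges) mapfunction(graph, e);
}
```

Unlike the map-reduce sketches, the map function here returns void and takes the edge by non-const reference, which matches the documented difference between the two families of calls.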
EdgeMapperType | The type of the map function. Not generally needed. Can be inferred by the compiler. |
mapfunction | The map function to use. Must take an icontext_type& as its first argument, and an edge_type, or a reference to an edge_type as its second argument. Returns void. |
Definition at line 1132 of file iengine.hpp.
transform_vertices() [inline, inherited]
Performs a transformation operation on each vertex in the graph.
Given a mapfunction, transform_vertices() calls mapfunction on every vertex in the graph. The map function may make modifications to the data on the vertex. transform_vertices() must be called by all machines simultaneously.
For instance, if the graph has integer vertex data, and integer edge data:
To set each vertex value to be the number of out-going edges, we may write the following function:
Calling engine.transform_vertices(set_vertex_value) will run the set_vertex_value() function on each vertex in the graph, setting its new value.
Since the mapfunction is provided with a context, it can also be used to perform signalling. For instance, the set_vertex_value function above may be modified to set the value of the vertex and also signal the vertex if it has more than 5 outgoing edges.
However, if the purpose of the function is only to signal without making modifications, map_reduce_vertices() will be more efficient, as this function additionally performs distributed synchronization of modified data.
map_reduce_vertices() provides similar signalling functionality, but should not make modifications to graph data. graphlab::distributed_graph::transform_vertices() provides the same graph modification capabilities, but without a context, and thus cannot perform signalling.
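The combined "modify and signal" variant described above (store the out-degree in the vertex and signal it when it has more than 5 outgoing edges) might look as follows on toy types. toy_vertex, toy_context and toy_transform_vertices are hypothetical stand-ins; in particular the out-degree is stored as a plain field rather than queried from a graph.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Stand-ins (hypothetical): a vertex with an id, a precomputed out-degree and
// integer data, plus a context that records signalled vertex ids.
struct toy_vertex { int id; int num_out_edges; int data; };

struct toy_context {
  std::set<int> signalled;
  void signal(const toy_vertex& v) { signalled.insert(v.id); }
};

// Map function: set the vertex value to its out-degree, and signal the
// vertex if it has more than 5 outgoing edges.
inline void set_vertex_value(toy_context& context, toy_vertex& vertex) {
  vertex.data = vertex.num_out_edges;
  if (vertex.num_out_edges > 5) context.signal(vertex);
}

// Toy transform: call the map function on every vertex; it may modify it.
template <typename VertexMapperType>
void toy_transform_vertices(toy_context& context,
                            std::vector<toy_vertex>& vertices,
                            VertexMapperType mapfunction) {
  for (toy_vertex& v : vertices) mapfunction(context, v);
}
```

If only the signalling side effect were needed, the read-only map-reduce form would be the better fit, as the text notes, because no modified data has to be synchronized.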
VertexMapperType | The type of the map function. Not generally needed. Can be inferred by the compiler. |
mapfunction | The map function to use. Must take an icontext_type& as its first argument, and a vertex_type, or a reference to a vertex_type as its second argument. Returns void. |
Definition at line 1055 of file iengine.hpp.