The synchronous engine executes all active vertex program synchronously in a sequence of super-step (iterations) in both the shared and distributed memory settings. More...

#include <graphlab/engine/synchronous_engine.hpp>

Public Types
typedef VertexProgram	vertex_program_type
	The user defined vertex program type. Equivalent to the VertexProgram template argument.
typedef VertexProgram::gather_type	gather_type
	The user defined type returned by the gather function.
typedef VertexProgram::message_type	message_type
	The user defined message type used to signal neighboring vertex programs.
typedef VertexProgram::vertex_data_type	vertex_data_type
	The type of data associated with each vertex in the graph.
typedef VertexProgram::edge_data_type	edge_data_type
	The type of data associated with each edge in the graph.
typedef VertexProgram::graph_type	graph_type
	The type of graph supported by this vertex program.
typedef graph_type::vertex_type	vertex_type
	The type used to represent a vertex in the graph. See graphlab::distributed_graph::vertex_type for details.
typedef graph_type::edge_type	edge_type
	The type used to represent an edge in the graph. See graphlab::distributed_graph::edge_type for details.
typedef icontext< graph_type, gather_type, message_type >	icontext_type
	The type of the callback interface passed by the engine to vertex programs. See graphlab::icontext for details.
typedef graph_type::vertex_id_type	vertex_id_type
	The vertex identifier type defined in graphlab::vertex_id_type.

Public Member Functions
	synchronous_engine (distributed_control &dc, graph_type &graph, const graphlab_options &opts=graphlab_options())
	Construct a synchronous engine for a given graph and options.
execution_status::status_enum	start ()
	Start execution of the synchronous engine.
size_t	num_updates () const
	Compute the total number of updates (calls to apply) executed since start was last invoked.
void	signal (vertex_id_type vid, const message_type &message=message_type())
void	signal_all (const message_type &message=message_type(), const std::string &order="shuffle")
void	signal_vset (const vertex_set &vset, const message_type &message=message_type(), const std::string &order="shuffle")
float	elapsed_seconds () const
	Get the elapsed time in seconds since start was last called.
int	iteration () const
	Get the current iteration number since start was last invoked.
size_t	total_memory_usage () const
	Compute the total memory used by the entire distributed system.
aggregator_type *	get_aggregator ()
	Get a pointer to the distributed aggregator object.
virtual void	signal (vertex_id_type vertex, const message_type &message=message_type())=0
	Signals single a vertex with an optional message.
virtual void	signal_all (const message_type &message=message_type(), const std::string &order="shuffle")=0
	Signal all vertices with a particular message.
virtual void	signal_vset (const vertex_set &vset, const message_type &message=message_type(), const std::string &order="shuffle")=0
	Signal a set of vertices with a particular message.
template<typename ReductionType , typename VertexMapType , typename FinalizerType >
bool	add_vertex_aggregator (const std::string &key, VertexMapType map_function, FinalizerType finalize_function)
	Creates a vertex aggregator. Returns true on success. Returns false if an aggregator of the same name already exists.
template<typename ReductionType , typename EdgeMapType , typename FinalizerType >
bool	add_edge_aggregator (const std::string &key, EdgeMapType map_function, FinalizerType finalize_function)
	Creates an edge aggregator. Returns true on success. Returns false if an aggregator of the same name already exists.
bool	aggregate_now (const std::string &key)
	Performs an immediate aggregation on a key.
template<typename ReductionType , typename VertexMapperType >
ReductionType	map_reduce_vertices (VertexMapperType mapfunction)
	Performs a map-reduce operation on each vertex in the graph returning the result.
template<typename ReductionType , typename EdgeMapperType >
ReductionType	map_reduce_edges (EdgeMapperType mapfunction)
	Performs a map-reduce operation on each edge in the graph returning the result.
template<typename VertexMapperType >
void	transform_vertices (VertexMapperType mapfunction)
	Performs a transformation operation on each vertex in the graph.
template<typename EdgeMapperType >
void	transform_edges (EdgeMapperType mapfunction)
	Performs a transformation operation on each edge in the graph.
bool	aggregate_periodic (const std::string &key, float seconds)
	Requests that a particular aggregation key be recomputed periodically when the engine is running.

Detailed Description

template<typename VertexProgram>
class graphlab::synchronous_engine< VertexProgram >

The synchronous engine executes all active vertex program synchronously in a sequence of super-step (iterations) in both the shared and distributed memory settings.

Template Parameters:

VertexProgram The user defined vertex program which should implement the graphlab::ivertex_program interface.

Execution Semantics

On start() the graphlab::ivertex_program::init function is invoked on all vertex programs in parallel to initialize the vertex program, vertex data, and possibly signal vertices. The engine then proceeds to execute a sequence of super-steps (iterations) each of which is further decomposed into a sequence of minor-steps which are also executed synchronously:

Receive all incoming messages (signals) by invoking the graphlab::ivertex_program::init function on all vertex-programs that have incoming messages. If a vertex-program does not have any incoming messages then it is not active during this super-step.
Execute all gathers for active vertex programs by invoking the user defined graphlab::ivertex_program::gather function on the edge direction returned by the graphlab::ivertex_program::gather_edges function. The gather functions can modify edge data but cannot modify the vertex program or vertex data and therefore can be executed on multiple edges in parallel. The gather type is used to accumulate (sum) the result of the gather function calls.
Execute all apply functions for active vertex-programs by invoking the user defined graphlab::ivertex_program::apply function passing the sum of the gather functions. If graphlab::ivertex_program::gather_edges returns no edges then the default gather value is passed to apply. The apply function can modify the vertex program and vertex data.
Execute all scatters for active vertex programs by invoking the user defined graphlab::ivertex_program::scatter function on the edge direction returned by the graphlab::ivertex_program::scatter_edges function. The scatter functions can modify edge data but cannot modify the vertex program or vertex data and therefore can be executed on multiple edges in parallel.

Construction

The synchronous engine is constructed by passing in a graphlab::distributed_control object which manages coordination between engine threads and a graphlab::distributed_graph object which is the graph on which the engine should be run. The graph should already be populated and cannot change after the engine is constructed. In the distributed setting all program instances (running on each machine) should construct an instance of the engine at the same time.

Computation is initiated by signaling vertices using either graphlab::synchronous_engine::signal or graphlab::synchronous_engine::signal_all. In either case all machines should invoke signal or signal all at the same time. Finally, computation is initiated by calling the graphlab::synchronous_engine::start function.

Example Usage

The following is a simple example demonstrating how to use the engine:

     #include <graphlab.hpp>
    
     struct vertex_data {
       // code
     };
     struct edge_data {
       // code
     };
     typedef graphlab::distributed_graph<vertex_data, edge_data> graph_type;
     typedef float gather_type;
     struct pagerank_vprog :
       public graphlab::ivertex_program<graph_type, gather_type> {
       // code
     };
    
     int main(int argc, char** argv) {
       // Initialize control plain using mpi
       graphlab::mpi_tools::init(argc, argv);
       graphlab::distributed_control dc;
       // Parse command line options
       graphlab::command_line_options clopts("PageRank algorithm.");
       std::string graph_dir;
       clopts.attach_option("graph", &graph_dir, graph_dir,
                            "The graph file.");
       if(!clopts.parse(argc, argv)) {
         std::cout << "Error in parsing arguments." << std::endl;
         return EXIT_FAILURE;
       }
       graph_type graph(dc, clopts);
       graph.load_structure(graph_dir, "tsv");
       graph.finalize();
       std::cout << "#vertices: " << graph.num_vertices()
                 << " #edges:" << graph.num_edges() << std::endl;
       graphlab::synchronous_engine<pagerank_vprog> engine(dc, graph, clopts);
       engine.signal_all();
       engine.start();
       std::cout << "Runtime: " << engine.elapsed_time();
       graphlab::mpi_tools::finalize();
     }

Engine Options

The synchronous engine supports several engine options which can be set as command line arguments using –engine_opts :

max_iterations: (default: infinity) The maximum number of iterations (super-steps) to run.

timeout: (default: infinity) The maximum time in seconds that the engine may run. When the time runs out the current iteration is completed and then the engine terminates.

use_cache: (default: false) This is used to enable caching. When caching is enabled the gather phase is skipped for vertices that already have a cached value. To use caching the vertex program must either clear (icontext::clear_gather_cache) or update (icontext::post_delta) the cache values of neighboring vertices during the scatter phase.

snapshot_interval If set to a positive value, a snapshot is taken every this number of iterations. If set to 0, a snapshot is taken before the first iteration. If set to a negative value, no snapshots are taken. Defaults to -1. A snapshot is a binary dump of the graph.

snapshot_path If snapshot_interval is set to a value >=0, this option must be specified and should contain a target basename for the snapshot. The path including folder and file prefix in which the snapshots should be saved.

See also:: graphlab::omni_engine; graphlab::async_consistent_engine; graphlab::semi_synchronous_engine

Definition at line 207 of file synchronous_engine.hpp.

Member Typedef Documentation

template<typename VertexProgram>

typedef VertexProgram::edge_data_type graphlab::synchronous_engine< VertexProgram >::edge_data_type

The type of data associated with each edge in the graph.

The edge data type must be Serializable.

Definition at line 256 of file synchronous_engine.hpp.

template<typename VertexProgram>

typedef graph_type::edge_type graphlab::synchronous_engine< VertexProgram >::edge_type

The type used to represent an edge in the graph. See graphlab::distributed_graph::edge_type for details.

The edge type contains the function graphlab::distributed_graph::edge_type::data which returns a reference to the edge data. In addition the edge type contains the function graphlab::distributed_graph::edge_type::source and graphlab::distributed_graph::edge_type::target.

Reimplemented from graphlab::iengine< VertexProgram >.

Definition at line 289 of file synchronous_engine.hpp.

template<typename VertexProgram>

typedef VertexProgram::gather_type graphlab::synchronous_engine< VertexProgram >::gather_type

The user defined type returned by the gather function.

The gather type is defined in the graphlab::ivertex_program interface and is the value returned by the graphlab::ivertex_program::gather function. The gather type must have an operator+=(const gather_type& other) function and must be Serializable.

Definition at line 229 of file synchronous_engine.hpp.

template<typename VertexProgram>

typedef VertexProgram::graph_type graphlab::synchronous_engine< VertexProgram >::graph_type

The type of graph supported by this vertex program.

See graphlab::distributed_graph

Reimplemented from graphlab::iengine< VertexProgram >.

Definition at line 263 of file synchronous_engine.hpp.

template<typename VertexProgram>

typedef icontext<graph_type, gather_type, message_type> graphlab::synchronous_engine< VertexProgram >::icontext_type

The type of the callback interface passed by the engine to vertex programs. See graphlab::icontext for details.

The context callback is passed to the vertex program functions and is used to signal other vertices, get the current iteration, and access information about the engine.

Reimplemented from graphlab::iengine< VertexProgram >.

Definition at line 299 of file synchronous_engine.hpp.

template<typename VertexProgram>

typedef VertexProgram::message_type graphlab::synchronous_engine< VertexProgram >::message_type

The user defined message type used to signal neighboring vertex programs.

The message type is defined in the graphlab::ivertex_program interface and used in the call to graphlab::icontext::signal. The message type must have an operator+=(const gather_type& other) function and must be Serializable.

Reimplemented from graphlab::iengine< VertexProgram >.

Definition at line 242 of file synchronous_engine.hpp.

template<typename VertexProgram>

typedef VertexProgram::vertex_data_type graphlab::synchronous_engine< VertexProgram >::vertex_data_type

The type of data associated with each vertex in the graph.

The vertex data type must be Serializable.

Definition at line 249 of file synchronous_engine.hpp.

template<typename VertexProgram>

typedef VertexProgram graphlab::synchronous_engine< VertexProgram >::vertex_program_type

The user defined vertex program type. Equivalent to the VertexProgram template argument.

The user defined vertex program type which should implement the graphlab::ivertex_program interface.

Reimplemented from graphlab::iengine< VertexProgram >.

Definition at line 218 of file synchronous_engine.hpp.

template<typename VertexProgram>

typedef graph_type::vertex_type graphlab::synchronous_engine< VertexProgram >::vertex_type

The type used to represent a vertex in the graph. See graphlab::distributed_graph::vertex_type for details.

The vertex type contains the function graphlab::distributed_graph::vertex_type::data which returns a reference to the vertex data as well as other functions like graphlab::distributed_graph::vertex_type::num_in_edges which returns the number of in edges.

Reimplemented from graphlab::iengine< VertexProgram >.

Definition at line 276 of file synchronous_engine.hpp.

Constructor & Destructor Documentation

template<typename VertexProgram >

graphlab::synchronous_engine< VertexProgram >::synchronous_engine	(	distributed_control &	dc,
		graph_type &	graph,
		const graphlab_options &	opts = `graphlab_options()`
	)

Construct a synchronous engine for a given graph and options.

The synchronous engine should be constructed after the graph has been loaded (e.g., graphlab::distributed_graph::load) and the graphlab options have been set (e.g., graphlab::command_line_options).

In the distributed engine the synchronous engine must be called on all machines at the same time (in the same order) passing the graphlab::distributed_control object. Upon construction the synchronous engine allocates several data-structures to store messages, gather accumulants, and vertex programs and therefore may require considerable memory.

The number of threads to create are read from opts.get_ncpus().

See the main class documentation for details on the available options.

Parameters:

[in]	dc	Distributed controller to associate with
[in,out]	graph	A reference to the graph object that this engine will modify. The graph must be fully constructed and finalized.
[in]	opts	A graphlab::graphlab_options object specifying engine parameters. This is typically constructed using graphlab::command_line_options.

Constructs an synchronous distributed engine. The number of threads to create are read from opts::get_ncpus().

Valid engine options (graphlab_options::get_engine_args()):

max_iterations Sets the maximum number of iterations the engine will run for.
use_cache If set to true, partial gathers are cached. See gather_caching to understand the behavior of the gather caching model and how it may be used to accelerate program performance.

Parameters:

dc	Distributed controller to associate with
graph	The graph to schedule over. The graph must be fully constructed and finalized.
opts	A graphlab_options object containing options and parameters for the engine.

Definition at line 973 of file synchronous_engine.hpp.

Member Function Documentation

template<typename VertexProgram>

template<typename ReductionType , typename EdgeMapType , typename FinalizerType >

bool graphlab::iengine< VertexProgram >::add_edge_aggregator	(	const std::string &	key,
		EdgeMapType	map_function,
		FinalizerType	finalize_function
	)

inlineinherited

Creates an edge aggregator. Returns true on success. Returns false if an aggregator of the same name already exists.

Creates a edge aggregator associated to a particular key. The map_function is called over every edge in the graph, and the return value of the map is summed. The finalize_function is then called on the result of the reduction. The finalize_function is called on all machines. The map_function should only read the graph data, and should not make any modifications.

Basic Usage

For instance, if the graph has float vertex data, and float edge data:

typedef graphlab::distributed_graph<float, float> graph_type;

An aggregator can be constructed to compute the absolute sum of all the edge data. To do this, we define two functions.

       float absolute_edge_data(engine_type::icontext_type& context,
                                graph_type::edge_type edge) {
         return std::fabs(edge.data());
       }
      
       void print_finalize(engine_type::icontext_type& context, float total) {
         std::cout << total << "\n";
       }

Next, we define the aggregator in the engine by calling add_edge_aggregator(). We must assign it a unique name which will be used to reference this particular aggregate operation. We shall call it "absolute_edge_sum".

       engine.add_edge_aggregator<float>("absolute_edge_sum",
                                           absolute_edge_data, 
                                           print_finalize);

When executed, the engine execute absolute_edge_data() on each edge in the graph. absolute_edge_data() reads the edge data, and returns its absolute value. All return values are then summing them together using the float's += operator. The final result is than passed to the print_finalize function. The template argument <float> is necessary to provide information about the return type of absolute_edge_data.

This aggregator can be run immediately by calling aggregate_now() with the name of the aggregator.

engine.aggregate_now("absolute_edge_sum");

Or can be arranged to run periodically together with the engine execution (in this example, every 1.5 seconds).

engine.aggregate_periodic("absolute_edge_sum", 1.5);

Note that since finalize is called on all machines, multiple copies of the total will be printed. If only one copy is desired, see context.cout() or to get the actual process ID using context.procid()

Details

The add_edge_aggregator() function is also templatized over both function types and there is no strong enforcement of the exact argument types of the map function and the reduce function. For instance, in the above example, the following print_finalize() variants may also be accepted.

       void print_finalize(engine_type::icontext_type& context, double total) {
         std::cout << total << "\n";
       }
      
       void print_finalize(engine_type::icontext_type& context, float& total) {
         std::cout << total << "\n";
       }
      
       void print_finalize(engine_type::icontext_type& context, const float& total) {
         std::cout << total << "\n";
       }

In particlar, the last variation may be useful for performance reasons if the reduction type is large.

Distributed Behavior

To obtain consistent distributed behavior in the distributed setting, we designed the aggregator to minimize the amount of asymmetry among the machines. In particular, the finalize operation is guaranteed to be called on all machines. This therefore permits global variables to be modified on finalize since all machines are ensured to be eventually consistent.

For instance, in the above example, print_finalize could store the result in a global variable:

       void print_finalize(engine_type::icontext_type& context, float total) {
         GLOBAL_TOTAL = total;
       }

which will make it accessible to all other running update functions.

Template Parameters:

ReductionType	The output of the map function. Must have operator+= defined, and must be Serializable.
EdgeMapperType	The type of the map function. Not generally needed. Can be inferred by the compiler.
FinalizerType	The type of the finalize function. Not generally needed. Can be inferred by the compiler.

Parameters:

[in]	key	The name of this aggregator. Must be unique.
[in]	map_function	The Map function to use. Must take an icontext_type& as its first argument, and a edge_type, or a reference to a edge_type as its second argument. Returns a ReductionType which must be summable and Serializable .
[in]	finalize_function	The Finalize function to use. Must take an icontext_type& as its first argument and a ReductionType, or a reference to a ReductionType as its second argument.

Definition at line 692 of file iengine.hpp.

template<typename VertexProgram>

template<typename ReductionType , typename VertexMapType , typename FinalizerType >

bool graphlab::iengine< VertexProgram >::add_vertex_aggregator	(	const std::string &	key,
		VertexMapType	map_function,
		FinalizerType	finalize_function
	)

inlineinherited

Creates a vertex aggregator. Returns true on success. Returns false if an aggregator of the same name already exists.

Creates a vertex aggregator associated to a particular key. The map_function is called over every vertex in the graph, and the return value of the map is summed. The finalize_function is then called on the result of the reduction. The finalize_function is called on all machines. The map_function should only read the graph data, and should not make any modifications.

Basic Usage

For instance, if the graph has float vertex data, and float edge data:

typedef graphlab::distributed_graph<float, float> graph_type;

An aggregator can be constructed to compute the absolute sum of all the vertex data. To do this, we define two functions.

       float absolute_vertex_data(engine_type::icontext_type& context,
                                  graph_type::vertex_type vertex) {
         return std::fabs(vertex.data());
       }
      
       void print_finalize(engine_type::icontext_type& context, 
                           float total) {
         std::cout << total << "\n";
       }

Next, we define the aggregator in the engine by calling add_vertex_aggregator(). We must assign it a unique name which will be used to reference this particular aggregate operation. We shall call it "absolute_vertex_sum".

       engine.add_vertex_aggregator<float>("absolute_vertex_sum",
                                           absolute_vertex_data, 
                                           print_finalize);

When executed, the engine execute absolute_vertex_data() on each vertex in the graph. absolute_vertex_data() reads the vertex data, and returns its absolute value. All return values are then summing them together using the float's += operator. The final result is than passed to the print_finalize function. The template argument <float> is necessary to provide information about the return type of absolute_vertex_data.

This aggregator can be run immediately by calling aggregate_now() with the name of the aggregator.

engine.aggregate_now("absolute_vertex_sum");

Or can be arranged to run periodically together with the engine execution (in this example, every 1.5 seconds).

engine.aggregate_periodic("absolute_vertex_sum", 1.5);

Note that since finalize is called on all machines, multiple copies of the total will be printed. If only one copy is desired, see context.cout() or to get the actual process ID using context.procid()

In practice, the reduction type can be any arbitrary user-defined type as long as a += operator is defined. This permits great flexibility in the type of operations the aggregator can perform.

Details

The add_vertex_aggregator() function is also templatized over both function types and there is no strong enforcement of the exact argument types of the map function and the reduce function. For instance, in the above example, the following print_finalize() variants may also be accepted.

       void print_finalize(engine_type::icontext_type& context, double total) {
         std::cout << total << "\n";
       }
      
       void print_finalize(engine_type::icontext_type& context, float& total) {
         std::cout << total << "\n";
       }
      
       void print_finalize(engine_type::icontext_type& context, const float& total) {
         std::cout << total << "\n";
       }

In particlar, the last variation may be useful for performance reasons if the reduction type is large.

Distributed Behavior

To obtain consistent distributed behavior in the distributed setting, we designed the aggregator to minimize the amount of asymmetry among the machines. In particular, the finalize operation is guaranteed to be called on all machines. This therefore permits global variables to be modified on finalize since all machines are ensured to be eventually consistent.

For instance, in the above example, print_finalize could store the result in a global variable:

       void print_finalize(engine_type::icontext_type& context, float total) {
         GLOBAL_TOTAL = total;
       }

which will make it accessible to all other running update functions.

Template Parameters:

ReductionType	The output of the map function. Must have operator+= defined, and must be Serializable.
VertexMapperType	The type of the map function. Not generally needed. Can be inferred by the compiler.
FinalizerType	The type of the finalize function. Not generally needed. Can be inferred by the compiler.

Parameters:

[in]	map_function	The Map function to use. Must take an
[in]	key	The name of this aggregator. Must be unique. icontext_type& as its first argument, and a vertex_type, or a reference to a vertex_type as its second argument. Returns a ReductionType which must be summable and Serializable .
[in]	finalize_function	The Finalize function to use. Must take an icontext_type& as its first argument and a ReductionType, or a reference to a ReductionType as its second argument.

Definition at line 484 of file iengine.hpp.

template<typename VertexProgram>

bool graphlab::iengine< VertexProgram >::aggregate_now ( const std::string & key )

inlineinherited

Performs an immediate aggregation on a key.

Performs an immediate aggregation on a key. All machines must call this simultaneously. If the key is not found, false is returned. Otherwise returns true on success.

For instance, the following code will run the aggregator with the name "absolute_vertex_sum" immediately.

engine.aggregate_now("absolute_vertex_sum");

Parameters:

[in] key Key to aggregate now. Must be a key previously created by add_vertex_aggregator() or add_edge_aggregator().

Returns:: False if key not found, True on success.

Definition at line 780 of file iengine.hpp.

template<typename VertexProgram>

bool graphlab::iengine< VertexProgram >::aggregate_periodic	(	const std::string &	key,
		float	seconds
	)

inlineinherited

Requests that a particular aggregation key be recomputed periodically when the engine is running.

Requests that the aggregator with a given key be aggregated every certain number of seconds when the engine is running. Note that the period is prescriptive: in practice the actual period will be larger than the requested period. Seconds must be >= 0;

For instance, the following code will schedule the aggregator with the name "absolute_vertex_sum" to run every 1.5 seconds.

engine.aggregate_periodic("absolute_vertex_sum", 1.5);

Parameters:

[in]	key	Key to schedule. Must be a key previously created by add_vertex_aggregator() or add_edge_aggregator().
[in]	seconds	How frequently to schedule. Must be >= 0. seconds == 0 will ensure that this key is continously recomputed.

All machines must call simultaneously.

Returns:: Returns true if key is found and seconds >= 0, and false otherwise.

Definition at line 1169 of file iengine.hpp.

template<typename VertexProgram >

float graphlab::synchronous_engine< VertexProgram >::elapsed_seconds ( ) const

virtual

Get the elapsed time in seconds since start was last called.

Returns:: elapsed time in seconds

Implements graphlab::iengine< VertexProgram >.

Definition at line 1208 of file synchronous_engine.hpp.

template<typename VertexProgram >

synchronous_engine< VertexProgram >::aggregator_type * graphlab::synchronous_engine< VertexProgram >::get_aggregator ( )

Get a pointer to the distributed aggregator object.

This is currently used by the graphlab::iengine interface to implement the calls to aggregation.

Returns:: a pointer to the local aggregator.

Definition at line 1076 of file synchronous_engine.hpp.

template<typename VertexProgram >

int graphlab::synchronous_engine< VertexProgram >::iteration ( ) const

virtual

Get the current iteration number since start was last invoked.

Returns:: the current iteration

Reimplemented from graphlab::iengine< VertexProgram >.

Definition at line 1212 of file synchronous_engine.hpp.

template<typename VertexProgram>

template<typename ReductionType , typename EdgeMapperType >

ReductionType graphlab::iengine< VertexProgram >::map_reduce_edges ( EdgeMapperType mapfunction )

inlineinherited

Performs a map-reduce operation on each edge in the graph returning the result.

Given a map function, map_reduce_edges() call the map function on all edges in the graph. The return values are then summed together and the final result returned. The map function should only read data and should not make any modifications. map_reduce_edges() must be called on all machines simultaneously.

Basic Usage

For instance, if the graph has float vertex data, and float edge data:

typedef graphlab::distributed_graph<float, float> graph_type;

To compute an absolute sum over all the edge data, we would write a function which reads in each a edge, and returns the absolute value of the data on the edge.

      float absolute_edge_data(engine_type::icontext_type& context,
                               graph_type::edge_type edge) {
        return std::fabs(edge.data());
      }

After which calling:

float sum = engine.map_reduce_edges<float>(absolute_edge_data);

will call the absolute_edge_data() function on each edge in the graph. absolute_edge_data() reads the value of the edge and returns the absolute result. This return values are then summed together and returned. All machines see the same result.

The template argument <float> is needed to inform the compiler regarding the return type of the mapfunction.

Signalling

Another common use for the map_reduce_edges() function is in signalling. Since the map function is passed a context, it can be used to perform signalling of edges for execution during a later engine.start() call.

For instance, the following code will signal the source vertex of each edge.

      graphlab::empty signal_source(engine_type::icontext_type& context,
                                    graph_type::edge_type edge) {
        context.signal(edge.source());
        return graphlab::empty()
      }

Note that in this case, we are not interested in a reduction operation, and thus we return a graphlab::empty object. Calling:

engine.map_reduce_edges<graphlab::empty>(signal_source);

will run signal_source() on all edges, signalling all source vertices.

Relations

The map function has the same structure as that in add_edge_aggregator() and may be reused in an aggregator. This function is also very similar to graphlab::distributed_graph::map_reduce_edges() with the difference that this takes a context and thus can be used to perform signalling. Finally transform_edges() can be used to perform a similar but may also make modifications to graph data.

Template Parameters:

ReductionType	The output of the map function. Must have operator+= defined, and must be Serializable.
EdgeMapperType	The type of the map function. Not generally needed. Can be inferred by the compiler.

Parameters:

mapfunction The map function to use. Must take an icontext_type& as its first argument, and a edge_type, or a reference to a edge_type as its second argument. Returns a ReductionType which must be summable and Serializable .

Definition at line 974 of file iengine.hpp.

template<typename VertexProgram>

template<typename ReductionType , typename VertexMapperType >

ReductionType graphlab::iengine< VertexProgram >::map_reduce_vertices ( VertexMapperType mapfunction )

inlineinherited

Performs a map-reduce operation on each vertex in the graph returning the result.

Given a map function, map_reduce_vertices() call the map function on all vertices in the graph. The return values are then summed together and the final result returned. The map function should only read the vertex data and should not make any modifications. map_reduce_vertices() must be called on all machines simultaneously.

Basic Usage

For instance, if the graph has float vertex data, and float edge data:

typedef graphlab::distributed_graph<float, float> graph_type;

To compute an absolute sum over all the vertex data, we would write a function which reads in each a vertex, and returns the absolute value of the data on the vertex.

      float absolute_vertex_data(engine_type::icontext_type& context,
                                 graph_type::vertex_type vertex) {
        return std::fabs(vertex.data());
      }

After which calling:

float sum = engine.map_reduce_vertices<float>(absolute_vertex_data);

will call the absolute_vertex_data() function on each vertex in the graph. absolute_vertex_data() reads the value of the vertex and returns the absolute result. This return values are then summed together and returned. All machines see the same result.

The template argument <float> is needed to inform the compiler regarding the return type of the mapfunction.

Signalling

Another common use for the map_reduce_vertices() function is in signalling. Since the map function is passed a context, it can be used to perform signalling of vertices for execution during a later engine.start() call.

For instance, the following code will signal all vertices with value >= 1

      graphlab::empty signal_vertices(engine_type::icontext_type& context,
                                      graph_type::vertex_type vertex) {
        if (vertex.data() >= 1) context.signal(vertex);
        return graphlab::empty()
      }

Note that in this case, we are not interested in a reduction operation, and thus we return a graphlab::empty object. Calling:

engine.map_reduce_vertices<graphlab::empty>(signal_vertices);

will run signal_vertices() on all vertices, signalling all vertices with value <= 1

Relations

The map function has the same structure as that in add_vertex_aggregator() and may be reused in an aggregator. This function is also very similar to graphlab::distributed_graph::map_reduce_vertices() with the difference that this takes a context and thus can be used to perform signalling. Finally transform_vertices() can be used to perform a similar but may also make modifications to graph data.

Template Parameters:

ReductionType	The output of the map function. Must have operator+= defined, and must be Serializable.
VertexMapperType	The type of the map function. Not generally needed. Can be inferred by the compiler.

Parameters:

mapfunction The map function to use. Must take an icontext_type& as its first argument, and a vertex_type, or a reference to a vertex_type as its second argument. Returns a ReductionType which must be summable and Serializable .

Definition at line 876 of file iengine.hpp.

template<typename VertexProgram >

size_t graphlab::synchronous_engine< VertexProgram >::num_updates ( ) const

virtual

Compute the total number of updates (calls to apply) executed since start was last invoked.

Returns:: Total number of updates

Implements graphlab::iengine< VertexProgram >.

Definition at line 1204 of file synchronous_engine.hpp.

template<typename VertexProgram>

virtual void graphlab::iengine< VertexProgram >::signal	(	vertex_id_type	vertex,
		const message_type &	message = `message_type()`
	)

pure virtualinherited

Signals single a vertex with an optional message.

This function sends a message to particular vertex which will receive that message on start. The signal function must be invoked on all machines simultaneously. For example:

graphlab::synchronous_engine<vprog> engine(dc, graph, opts);

engine.signal(0); // signal vertex zero

and not:

graphlab::synchronous_engine<vprog> engine(dc, graph, opts);

if(dc.procid() == 0) engine.signal(0); // signal vertex zero

Since signal is executed synchronously on all machines it should only be used to schedule a small set of vertices. The preferred method to signal a large set of vertices (e.g., all vertices that are a certain type) is to use either the vertex program init function or the aggregation framework. For example to signal all vertices that have a particular value one could write:

       struct bipartite_opt : 
         public graphlab::ivertex_program<graph_type, gather_type> {
         // The user defined init function
         void init(icontext_type& context, vertex_type& vertex) {
           // Signal myself if I am a certain type
           if(vertex.data().on_left) context.signal(vertex);
         }
         // other vastly more interesting code
       };

Parameters:

[in]	vid	the vertex id to signal
[in]	message	the message to send to that vertex. The default message is sent if no message is provided. (See ivertex_program::message_type for details about the message_type).

Implemented in graphlab::semi_synchronous_engine< VertexProgram >, and graphlab::omni_engine< VertexProgram >.

template<typename VertexProgram>

virtual void graphlab::iengine< VertexProgram >::signal_all	(	const message_type &	message = `message_type()`,
		const std::string &	order = `"shuffle"`
	)

pure virtualinherited

Signal all vertices with a particular message.

This function sends the same message to all vertices which will receive that message on start. The signal_all function must be invoked on all machines simultaneously. For example:

graphlab::synchronous_engine<vprog> engine(dc, graph, opts);

engine.signal_all(); // signal all vertices

and not:

graphlab::synchronous_engine<vprog> engine(dc, graph, opts);

if(dc.procid() == 0) engine.signal_all(); // signal vertex zero

The signal_all function is the most common way to send messages to the engine. For example in the pagerank application we want all vertices to be active on the first round. Therefore we would write:

       graphlab::synchronous_engine<pagerank> engine(dc, graph, opts);
       engine.signal_all();
       engine.start();

Parameters:

[in] message the message to send to all vertices. The default message is sent if no message is provided (See ivertex_program::message_type for details about the message_type).

Implemented in graphlab::semi_synchronous_engine< VertexProgram >, and graphlab::omni_engine< VertexProgram >.

template<typename VertexProgram>

virtual void graphlab::iengine< VertexProgram >::signal_vset	(	const vertex_set &	vset,
		const message_type &	message = `message_type()`,
		const std::string &	order = `"shuffle"`
	)

pure virtualinherited

Signal a set of vertices with a particular message.

This function sends the same message to a set of vertices which will receive that message on start. The signal_vset function must be invoked on all machines simultaneously. For example:

graphlab::synchronous_engine<vprog> engine(dc, graph, opts);

engine.signal_vset(vset); // signal a subset of vertices

signal_all() is conceptually equivalent to:

engine.signal_vset(graph.complete_set());

Parameters:

[in]	vset	The set of vertices to signal
[in]	message	the message to send to all vertices. The default message is sent if no message is provided (See ivertex_program::message_type for details about the message_type).

Implemented in graphlab::semi_synchronous_engine< VertexProgram >, and graphlab::omni_engine< VertexProgram >.

template<typename VertexProgram >

execution_status::status_enum graphlab::synchronous_engine< VertexProgram >::start ( )

virtual

Start execution of the synchronous engine.

The start function begins computation and does not return until there are no remaining messages or until max_iterations has been reached.

The start() function modifies the data graph through the vertex programs and so upon return the data graph should contain the result of the computation.

Returns:: The reason for termination

Implements graphlab::iengine< VertexProgram >.

Definition at line 1225 of file synchronous_engine.hpp.

template<typename VertexProgram >

size_t graphlab::synchronous_engine< VertexProgram >::total_memory_usage ( ) const

Compute the total memory used by the entire distributed system.

Returns:: The total memory used in bytes.

Definition at line 1217 of file synchronous_engine.hpp.

template<typename VertexProgram>

template<typename EdgeMapperType >

void graphlab::iengine< VertexProgram >::transform_edges ( EdgeMapperType mapfunction )

inlineinherited

Performs a transformation operation on each edge in the graph.

Given a mapfunction, transform_edges() calls mapfunction on every edge in graph. The map function may make modifications to the data on the edge. transform_edges() must be called on all machines simultaneously.

Basic Usage

For instance, if the graph has integer vertex data, and integer edge data:

typedef graphlab::distributed_graph<size_t, size_t> graph_type;

To set each edge value to be the number of out-going edges of the target vertex, we may write the following:

       void set_edge_value(engine_type::icontext_type& context,
                                graph_type::edge_type edge) {
         edge.data() = edge.target().num_out_edges();
       }

Calling transform_edges():

engine.transform_edges(set_edge_value);

will run the set_edge_value() function on each edge in the graph, setting its new value.

Signalling

Since the mapfunction is provided with a context, the mapfunction can also be used to perform signalling. For instance, the set_edge_value function above may be modified to set the value of the edge, but to also signal the target vertex.

       void set_edge_value(engine_type::icontext_type& context,
                                graph_type::edge_type edge) {
         edge.data() = edge.target().num_out_edges();
         context.signal(edge.target());
       }

However, if the purpose of the function is to only signal without making modifications, map_reduce_edges() will be more efficient as this function will additionally perform distributed synchronization of modified data.

Relations

map_reduce_edges() provide similar signalling functionality, but should not make modifications to graph data. graphlab::distributed_graph::transform_edges() provide the same graph modification capabilities, but without a context and thus cannot perform signalling.

Template Parameters:

EdgeMapperType The type of the map function. Not generally needed. Can be inferred by the compiler.

Parameters:

mapfunction The map function to use. Must take an icontext_type& as its first argument, and a edge_type, or a reference to a edge_type as its second argument. Returns void.

Definition at line 1132 of file iengine.hpp.

template<typename VertexProgram>

template<typename VertexMapperType >

void graphlab::iengine< VertexProgram >::transform_vertices ( VertexMapperType mapfunction )

inlineinherited

Performs a transformation operation on each vertex in the graph.

Given a mapfunction, transform_vertices() calls mapfunction on every vertex in graph. The map function may make modifications to the data on the vertex. transform_vertices() must be called by all machines simultaneously.

Basic Usage

For instance, if the graph has integer vertex data, and integer edge data:

typedef graphlab::distributed_graph<size_t, size_t> graph_type;

To set each vertex value to be the number of out-going edges, we may write the following function:

       void set_vertex_value(engine_type::icontext_type& context,
                                graph_type::vertex_type vertex) {
         vertex.data() = vertex.num_out_edges();
       }

Calling transform_vertices():

engine.transform_vertices(set_vertex_value);

will run the set_vertex_value() function on each vertex in the graph, setting its new value.

Signalling

Since the mapfunction is provided with a context, the mapfunction can also be used to perform signalling. For instance, the set_vertex_value function above may be modified to set the value of the vertex, but to also signal the vertex if it has more than 5 outgoing edges.

       void set_vertex_value(engine_type::icontext_type& context,
                                graph_type::vertex_type vertex) {
         vertex.data() = vertex.num_out_edges();
         if (vertex.num_out_edges() > 5) context.signal(vertex);
       }

However, if the purpose of the function is to only signal without making modifications, map_reduce_vertices() will be more efficient as this function will additionally perform distributed synchronization of modified data.

Relations

map_reduce_vertices() provide similar signalling functionality, but should not make modifications to graph data. graphlab::distributed_graph::transform_vertices() provide the same graph modification capabilities, but without a context and thus cannot perform signalling.

Template Parameters:

VertexMapperType The type of the map function. Not generally needed. Can be inferred by the compiler.

Parameters:

mapfunction The map function to use. Must take an icontext_type& as its first argument, and a vertex_type, or a reference to a vertex_type as its second argument. Returns void.

Definition at line 1055 of file iengine.hpp.

The documentation for this class was generated from the following file:

graphlab/engine/synchronous_engine.hpp

Public Types

Public Member Functions

Detailed Description

template<typename VertexProgram> class graphlab::synchronous_engine< VertexProgram >

Execution Semantics

Construction

Example Usage

Engine Options

Member Typedef Documentation

Constructor & Destructor Documentation

Member Function Documentation

Basic Usage

Details

Distributed Behavior

Basic Usage

Details

Distributed Behavior

Basic Usage

Signalling

Relations

Basic Usage

Signalling

Relations

Basic Usage

Signalling

Relations

Basic Usage

Signalling

Relations

template<typename VertexProgram>
class graphlab::synchronous_engine< VertexProgram >