GraphLab: Distributed Graph-Parallel API  2.1
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Macros Groups Pages
graphlab::ivertex_program< Graph, GatherType, MessageType > Class Template Reference

The ivertex_program class defines the vertex program interface that all vertex programs should extend and implement. The vertex-program is used to encode the user-define computation in a GraphLab program. More...

#include <graphlab/vertex_program/ivertex_program.hpp>

List of all members.

Public Types

typedef Graph::vertex_data_type vertex_data_type
 The user defined vertex data associated with each vertex in the graph (see distributed_graph::vertex_data_type).
typedef Graph::edge_data_type edge_data_type
 The user defined edge data associated with each edge in the graph.
typedef GatherType gather_type
 The user defined gather type is used to accumulate the results of the gather function during the gather phase and must implement the operator += operation.
typedef MessageType message_type
typedef Graph graph_type
 The graph type associative with this vertex program.
typedef graph_type::vertex_id_type vertex_id_type
 The unique integer id used to reference vertices in the graph.
typedef graph_type::vertex_type vertex_type
 The opaque vertex object type used to get vertex information.
typedef graph_type::edge_type edge_type
 The opaque edge_object type used to access edge information.
typedef graphlab::edge_dir_type edge_dir_type
 The type used to define the direction of edges used in gather and scatter.
typedef icontext< graph_type,
gather_type, message_type
icontext_type
 The context type is used by the vertex program to communicate with the engine.

Public Member Functions

virtual ~ivertex_program ()
 Standard virtual destructor for an abstract class.
virtual void init (icontext_type &context, const vertex_type &vertex, const message_type &msg)
 This called by the engine to receive a message to this vertex program. The vertex program can use this to initialize any state before entering the gather phase. The init function is invoked once per execution of the vertex program.
virtual edge_dir_type gather_edges (icontext_type &context, const vertex_type &vertex) const
 Returns the set of edges on which to run the gather function. The default edge direction is in edges.
virtual gather_type gather (icontext_type &context, const vertex_type &vertex, edge_type &edge) const
 The gather function is called on all the ivertex_program::gather_edges in parallel and returns the gather_type which are added to compute the final output of the gather phase.
virtual void apply (icontext_type &context, vertex_type &vertex, const gather_type &total)=0
 The apply function is called once the gather phase has completed and must be implemented by all vertex programs.
virtual edge_dir_type scatter_edges (icontext_type &context, const vertex_type &vertex) const
 Returns the set of edges on which to run the scatter function. The default edge direction is out edges.
virtual void scatter (icontext_type &context, const vertex_type &vertex, edge_type &edge) const
 Scatter is called on all scatter_edges() in parallel after the apply function has completed and is typically responsible for updating edge data, signaling (messaging) adjacent vertices, and updating the gather cache state when caching is enabled.
virtual void pre_local_gather (gather_type &) const
virtual void post_local_gather (gather_type &) const

Detailed Description

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
class graphlab::ivertex_program< Graph, GatherType, MessageType >

The ivertex_program class defines the vertex program interface that all vertex programs should extend and implement. The vertex-program is used to encode the user-define computation in a GraphLab program.

Overview

A vertex program represents the primary user defined computation in GraphLab. A unique instance of the vertex program is run on each vertex in the graph and can interact with neighboring vertex programs through the gather and scatter functions as well as by signaling neighboring vertex-programs. Conceptually the vertex-program is a class which represents the parts of an update-function in the original formulation of the GraphLab abstraction. Moreover many graph-structured programs can be written in the following pattern:

graphlab::update_function(Vertex center, Neighborhood nbrs) {
// nbrs represents the state of neighboring vertices and edges
// Gather Phase:
sum = EMPTY;
for(edge in nbrs.in_edges()) {
// The sum is a general commutative associative operation
if(sum == EMPTY) sum = gather_function(center, edge, edge.neighbor());
else sum += gather_function(center, edge, edge.neighbor());
}
// Apply Phase:
center = apply_function(center, sum);
// Scatter Phase:
for(edge in nbrs.out_edges()) {
edge = scatter_function(center, edge, edge.neighbor());
if(condition is met) trigger_neighbor();
}
}

Vertex programs express computation by implementing what we call the Gather-Apply-Scatter (GAS) model which decomposes the vertex program into a parallel gather phase, followed by an atomic apply phase, and finally a parallel scatter phase. This decomposition allows us to execute a single vertex program on several machines simultaneously and move computation to the data.

We therefore decompose the update function logic into member functions of the vertex-program class that are invoked in the following manner:

For the center vertex vtx:
vprog.init(ctx, vtx, msg);
// Gather Phase:
vprog::gather_type sum = EMPTY;
ParallelFor(adjacent edges in direction vprog.gather_edges(ctx, vtx) )
if(sum == EMPTY) sum = vprog.gather(ctx, vtx, edge);
else sum += vprog.gather(ctx, vtx, edge);
// Apply Phase
vprog.apply(ctx, vtx, sum);
// Scatter Phase
ParallelFor(adjacent edges in direction vprog.scatter_edges(ctx, vtx) )
vprog.scatter(ctx, vtx, edge);
// Vertex program is destroyed
vprog = vertex_program();

All user define vertex programs must extend the ivertex_program interface and implement the ivertex_program::apply function. Most vertex programs will also implement the ivertex_program::gather and ivertex_program::scatter functions.

The state of a vertex program does not persist between invocations of ivertex_program::init. Moreover prior to each call to init the vertex program's previous state is cleared. Therefore any persistent state must be saved into the vertex data.

The vertex program depends on several key types which are template arguments to ivertex_program interface.

  • graph_type: the type of graph used to store the data for this vertex program. This currently always the distributed_graph.
  • gather_type: the type used in the gather phase and must implement the operator+= function.
  • message_type: The type used for signaling and is typically empty. However if a message type is desired it must implement the operator+= to allow message merging across the network. In addition the message type may also implement the priority() function which returns a double assigning a priority to the reception of the message (used by the asynchronous engines). We provide a basic set of simple prioritized messages in graphlab::signals.

All user-defined types including the vertex data, edge data, vertex-program, gather type, and message type must be serializable (see Serializable) and default constructible to enable movement between machines.

Advanced Features

While the basic Gather-Apply-Scatter approach to graph structure computation can express a wide range of algorithms there are some situation where additional features could either simplify the design or provide additional efficiency.

Messaging

Vertex-programs can trigger adjacent vertex programs by sending a signal which can contain a message to neighbor vertices. By default the message type is empty however it is possible for the user to define a message type. For example the following message_type could be used to implement pagerank:

struct pagerank_message : public graphlab::IS_POD_TYPE {
double value;
double priority() const { return std::fabs(value); }
message_type& operator+=(const message_type& other) {
value += other.value;
return *this;
}
};

Unlike other messaging abstractions, GraphLab always merges messages destined to the same vertex. This allows the GraphLab engines to minimize network communication and more evenly balance computation. Messages are combined using the operator+= function.

As mentioned earlier some engines may prioritize the reception of messages. Messages can optionally (it is not required) provide a priority function which is used to prioritize message reception. The engine then attempts to prioritize the reception of higher priority messages first.

The message is received in the ivertex_program::init function. The single message passed into ivertex_program::init represents the sum of all messages destined to that vertex since the vertex-program was last invoked.

The GraphLab messaging framework allows us to write Pregel-like vertex-programs of the form:

typedef graphlab::empty gather_type;
class pregel_pagerank :
public ivertex_program<graph_type, gather_type, pagerank_message>,
// Store a local copy of the message data
double message_value;
// Receive the inbound message (sum of messages)
void init(icontext_type& context, const vertex_type& vertex,
const message_type& msg) {
message_value = message.value;
}
// Skip the gather phase
const vertex_type& vertex) const {
}
// Update the pagerank using the message
void apply(icontext_type& context, vertex_type& vertex,
const gather_type& total) {
vertex.data() += message_value;
}
// Scatter along out edges
const vertex_type& vertex) const {
return OUT_EDGES;
}
// Compute new messages encoding the change in the pagerank of
// adjacent vertices.
void scatter(icontext_type& context, const vertex_type& vertex,
edge_type& edge) const {
pagerank_message msg;
msg.value = message_value * (1 - RESET_PROBABILITY);
context.signal(edge.target(), msg);
}
};

Notice that the gather phase is skipped and instead the gather computation is accomplished using the messages. However unlike Pregel the scatter function which computs and sends the new message is actually run on the machine that is receiving the message.

The message abstraction is surprisingly powerful and can often often express computation that can be written using the Gather operation. However, the message combination is done outside of the consistency model and so can lead to more confusing code.

Gather Caching

In many applications the gather computation can be costly and high-degree vertices will be signaled often even only a small fraction of its neighbors values have changed. In this case running the gather function on all neighbors can be wasteful. To address this important issue the GraphLab engines expose a gather caching mechanism. However to take advantage of the gather caching the vertex-program must notify the engine when a cache is no longer valid and can even correct the cache to ensure that it remains valid.

Definition at line 279 of file ivertex_program.hpp.


Member Typedef Documentation

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
typedef Graph::edge_data_type graphlab::ivertex_program< Graph, GatherType, MessageType >::edge_data_type

The user defined edge data associated with each edge in the graph.

The edge data type must be serializable (see Serializable)

Definition at line 317 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
typedef graphlab::edge_dir_type graphlab::ivertex_program< Graph, GatherType, MessageType >::edge_dir_type

The type used to define the direction of edges used in gather and scatter.

Possible values include:

See graphlab::edge_dir_type for details.

Definition at line 495 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
typedef graph_type::edge_type graphlab::ivertex_program< Graph, GatherType, MessageType >::edge_type

The opaque edge_object type used to access edge information.

The edge type is defined by the graph. See distributed_graph::edge_type for details.

Definition at line 474 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
typedef GatherType graphlab::ivertex_program< Graph, GatherType, MessageType >::gather_type

The user defined gather type is used to accumulate the results of the gather function during the gather phase and must implement the operator += operation.

The gather type plays the following role in the vertex program:

gather_type sum = EMPTY;
for(edges in vprog.gather_edges()) {
if(sum == EMPTY) sum = vprog.gather(...);
else sum += vprog.gather( ... );
}
vprog.apply(..., sum);

In addition to implementing the operator+= operation the gather type must also be serializable (see Serializable).

Definition at line 348 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
typedef Graph graphlab::ivertex_program< Graph, GatherType, MessageType >::graph_type

The graph type associative with this vertex program.

The graph type is specified as a template argument and will usually be distributed_graph.

Definition at line 449 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
typedef icontext<graph_type, gather_type, message_type> graphlab::ivertex_program< Graph, GatherType, MessageType >::icontext_type

The context type is used by the vertex program to communicate with the engine.

The context and provides facilities for signaling adjacent vertices (sending messages), interacting with the GraphLab gather cache (posting deltas), and accessing engine state.

Definition at line 508 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
typedef MessageType graphlab::ivertex_program< Graph, GatherType, MessageType >::message_type

The message type which must be provided by the vertex_program

Definition at line 404 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
typedef Graph::vertex_data_type graphlab::ivertex_program< Graph, GatherType, MessageType >::vertex_data_type

The user defined vertex data associated with each vertex in the graph (see distributed_graph::vertex_data_type).

The vertex data is the data associated with each vertex in the graph. Unlike the vertex-program the vertex data of adjacent vertices is visible to other vertex programs during the gather and scatter phases and persists between executions of the vertex-program.

The vertex data type must be serializable (see Serializable)

Definition at line 296 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
typedef graph_type::vertex_id_type graphlab::ivertex_program< Graph, GatherType, MessageType >::vertex_id_type

The unique integer id used to reference vertices in the graph.

See graphlab::vertex_id_type for details.

Definition at line 456 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
typedef graph_type::vertex_type graphlab::ivertex_program< Graph, GatherType, MessageType >::vertex_type

The opaque vertex object type used to get vertex information.

The vertex type is defined by the graph. See distributed_graph::vertex_type for details.

Definition at line 465 of file ivertex_program.hpp.


Member Function Documentation

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
virtual void graphlab::ivertex_program< Graph, GatherType, MessageType >::apply ( icontext_type context,
vertex_type vertex,
const gather_type total 
)
pure virtual

The apply function is called once the gather phase has completed and must be implemented by all vertex programs.

The apply function is responsible for modifying the vertex data and is run only once per vertex per execution of a vertex program. In addition the apply function can modify the state of the vertex program.

If a vertex has no neighbors than the apply function is called passing the default value for the gather_type.

Parameters:
[in,out]contextThe context is used to interact with the engine
[in,out]vertexThe vertex on which this vertex-program is running.
[in]totalThe result of the gather phase. If a vertex has no neighbors then the total is the default value (i.e., gather_type()) of the gather type.
template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
virtual gather_type graphlab::ivertex_program< Graph, GatherType, MessageType >::gather ( icontext_type context,
const vertex_type vertex,
edge_type edge 
) const
inlinevirtual

The gather function is called on all the ivertex_program::gather_edges in parallel and returns the gather_type which are added to compute the final output of the gather phase.

The gather function is the core computational element of the Gather phase and is responsible for collecting the information about the state of adjacent vertices and edges.

Warning:
The gather function is executed in parallel on multiple machines and therefore cannot modify the vertex-program's state or the vertex data.

A default implementation of the gather function is provided which will fail if invoked.

Parameters:
[in,out]contextThe context is used to interact with the engine
[in]vertexThe vertex on which this vertex-program is running. Note that the vertex is constant and its value should not be modified.
[in,out]edgeThe adjacent edge to be processed. The edge is not constant and therefore the edge data can be modified.
Returns:
the result of the gather computation which will be "summed" to produce the input to the apply operation. The behavior of the "sum" is defined by the gather_type.

Definition at line 617 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
virtual edge_dir_type graphlab::ivertex_program< Graph, GatherType, MessageType >::gather_edges ( icontext_type context,
const vertex_type vertex 
) const
inlinevirtual

Returns the set of edges on which to run the gather function. The default edge direction is in edges.

The gather_edges function is invoked after the init function has completed.

Warning:
The gather_edges function may be invoked multiple times for the same execution of the vertex-program and should return the same value. In addition it cannot modify the vertex-programs state or the vertex data.

Possible return values include:

  • graphlab::NO_EDGES : The gather phase is completely skipped potentially reducing network communication.
  • graphlab::ALL_EDGES : The gather function is run on both inbound and outbound edges to this vertes.
Parameters:
[in,out]contextThe context is used to interact with the engine
[in]vertexThe vertex on which this vertex-program is running. Note that the vertex is constant and its value should not be modified.
Returns:
One of graphlab::NO_EDGES, graphlab::IN_EDGES, graphlab::OUT_EDGES, or graphlab::ALL_EDGES.

Definition at line 578 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
virtual void graphlab::ivertex_program< Graph, GatherType, MessageType >::init ( icontext_type context,
const vertex_type vertex,
const message_type msg 
)
inlinevirtual

This called by the engine to receive a message to this vertex program. The vertex program can use this to initialize any state before entering the gather phase. The init function is invoked once per execution of the vertex program.

If the vertex program does not implement this function then the default implementation (NOP) is used.

Parameters:
[in,out]contextThe context is used to interact with the engine
[in]vertexThe vertex on which this vertex-program is running. Note that the vertex is constant and its value should not be modified within the init function. If there is some message state that is needed then it must be saved to the vertex-program and not the vertex data.
[in]messageThe sum of all the signal calls to this vertex since it was last run.

Definition at line 536 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
virtual void graphlab::ivertex_program< Graph, GatherType, MessageType >::scatter ( icontext_type context,
const vertex_type vertex,
edge_type edge 
) const
inlinevirtual

Scatter is called on all scatter_edges() in parallel after the apply function has completed and is typically responsible for updating edge data, signaling (messaging) adjacent vertices, and updating the gather cache state when caching is enabled.

The scatter function is almost identical to the gather function except that nothing is returned.

Warning:
The scatter function is executed in parallel on multiple machines and therefore cannot modify the vertex-program's state or the vertex data.

A default implementation of the gather function is provided which will fail if invoked.

Parameters:
[in,out]contextThe context is used to interact with the engine
[in]vertexThe vertex on which this vertex-program is running. Note that the vertex is constant and its value should not be modified.
[in,out]edgeThe adjacent edge to be processed. The edge is not constant and therefore the edge data can be modified.

Definition at line 723 of file ivertex_program.hpp.

template<typename Graph, typename GatherType, typename MessageType = graphlab::empty>
virtual edge_dir_type graphlab::ivertex_program< Graph, GatherType, MessageType >::scatter_edges ( icontext_type context,
const vertex_type vertex 
) const
inlinevirtual

Returns the set of edges on which to run the scatter function. The default edge direction is out edges.

The scatter_edges function is invoked after the apply function has completed.

Warning:
The scatter_edges function may be invoked multiple times for the same execution of the vertex-program and should return the same value. In addition it cannot modify the vertex-programs state or the vertex data.

Possible return values include:

  • graphlab::NO_EDGES : The scatter phase is completely skipped potentially reducing network communication.
  • graphlab::IN_EDGE : The scatter function is only run on inbound edges to this vertex.
  • graphlab::ALL_EDGES : The scatter function is run on both inbound and outbound edges to this vertes.
Parameters:
[in,out]contextThe context is used to interact with the engine
[in]vertexThe vertex on which this vertex-program is running. Note that the vertex is constant and its value should not be modified.
Returns:
One of graphlab::NO_EDGES, graphlab::IN_EDGES, graphlab::OUT_EDGES, or graphlab::ALL_EDGES.

Definition at line 689 of file ivertex_program.hpp.


The documentation for this class was generated from the following file: