The data structure at the center of much of GraphLab's computation is the distributed_graph. The distributed graph is a directed graph consisting of vertices and directed edges, with no duplicate edges allowed: there can be at most one edge from vertex A to vertex B, and at most one edge from vertex B to vertex A. An arbitrary user data type can be associated with each vertex and each edge, as long as the data type is Serializable.
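
The no-duplicate-edge rule can be illustrated with a small standalone sketch. The simple_digraph type and its add_edge function below are hypothetical names invented for this illustration; they do not use GraphLab and say nothing about how distributed_graph is implemented internally.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical stand-in for illustration only; not GraphLab's implementation.
typedef std::uint64_t vertex_id;

struct simple_digraph {
  // Each (source, target) pair can be stored at most once.
  std::set<std::pair<vertex_id, vertex_id> > edges;

  // Returns true if the edge was added, false if it already existed.
  bool add_edge(vertex_id source, vertex_id target) {
    return edges.insert(std::make_pair(source, target)).second;
  }
};

// Exercises the rule described above: A->B and B->A are distinct edges,
// but a second A->B is rejected.
inline void check_edge_rules() {
  simple_digraph g;
  assert(g.add_edge(1, 2));   // A -> B is accepted
  assert(g.add_edge(2, 1));   // B -> A is a different edge, also accepted
  assert(!g.add_edge(1, 2));  // a second A -> B is a duplicate
}
```

In this sketch the direction matters because the pair (1, 2) and the pair (2, 1) are different keys in the set.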

Vertex Data

Since we are writing PageRank, we first define a struct describing a web page. This will be the contents of each vertex. The struct holds the name of the web page as well as the resulting PageRank. A constructor that assigns a name is provided for later convenience. Note that we also define a default constructor, as this is required for the type to be used in the graph.

  struct web_page {
    std::string pagename;
    double pagerank;
    web_page():pagerank(0.0) { }
    explicit web_page(std::string name):pagename(name),pagerank(0.0){ }
  };

To make this Serializable, we need to define save and load member functions. The save function simply writes the pagename and pagerank fields into the output archive object. The load function performs the reverse. Care should be taken to ensure that the save and load functions are symmetric: load must read the fields in the same order in which save writes them.

  struct web_page {
    std::string pagename;
    double pagerank;
    web_page():pagerank(0.0) { }
    explicit web_page(std::string name):pagename(name),pagerank(0.0){ }
    void save(graphlab::oarchive& oarc) const {
      oarc << pagename << pagerank;
    }
    void load(graphlab::iarchive& iarc) {
      iarc >> pagename >> pagerank;
    }
  };
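
One way to check that save and load are symmetric is a round trip: save an object, load it back into a fresh one, and compare the fields. The sketch below does this without GraphLab. The toy_oarchive and toy_iarchive types and the round_trip function are hypothetical stand-ins written for this example; they only mimic the << and >> usage shown above and are not GraphLab's archive classes.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-ins for graphlab::oarchive and graphlab::iarchive,
// so that this sketch compiles without GraphLab.
struct toy_oarchive {
  std::ostringstream out;
};
struct toy_iarchive {
  std::istringstream in;
  explicit toy_iarchive(const std::string& bytes) : in(bytes) { }
};

// Strings are written as a length followed by the characters.
toy_oarchive& operator<<(toy_oarchive& a, const std::string& s) {
  std::size_t n = s.size();
  a.out.write(reinterpret_cast<const char*>(&n), sizeof(n));
  a.out.write(s.data(), static_cast<std::streamsize>(n));
  return a;
}
// Doubles are written as their raw bytes.
toy_oarchive& operator<<(toy_oarchive& a, double d) {
  a.out.write(reinterpret_cast<const char*>(&d), sizeof(d));
  return a;
}
toy_iarchive& operator>>(toy_iarchive& a, std::string& s) {
  std::size_t n = 0;
  a.in.read(reinterpret_cast<char*>(&n), sizeof(n));
  s.resize(n);
  if (n > 0) a.in.read(&s[0], static_cast<std::streamsize>(n));
  return a;
}
toy_iarchive& operator>>(toy_iarchive& a, double& d) {
  a.in.read(reinterpret_cast<char*>(&d), sizeof(d));
  return a;
}

// Same fields as the tutorial's web_page, but using the toy archives.
struct web_page {
  std::string pagename;
  double pagerank;
  web_page() : pagerank(0.0) { }
  explicit web_page(std::string name) : pagename(name), pagerank(0.0) { }
  void save(toy_oarchive& oarc) const { oarc << pagename << pagerank; }
  void load(toy_iarchive& iarc) { iarc >> pagename >> pagerank; }
};

// Saves a page, then loads the bytes back into a default-constructed page.
web_page round_trip(const web_page& original) {
  toy_oarchive oarc;
  original.save(oarc);
  toy_iarchive iarc(oarc.out.str());
  web_page copy;
  copy.load(iarc);
  return copy;
}

// A symmetric save/load pair reproduces every field exactly.
inline void check_round_trip() {
  web_page page("example.com");
  page.pagerank = 0.25;
  web_page copy = round_trip(page);
  assert(copy.pagename == page.pagename);
  assert(copy.pagerank == page.pagerank);
}
```

If load read pagerank before pagename, it would interpret the string's length bytes as a double and the round trip would no longer reproduce the original, which is the kind of mistake the symmetry requirement guards against.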

Edge Data

Since we do not need to store any information on the edges of the graph, we simply use the graphlab::empty data type, which ensures that the edge data does not take up any memory.
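
To give a feel for what an empty data type looks like, here is a small sketch that does not use GraphLab. The toy_empty struct and the serialized_size_of_empty function are hypothetical names for this illustration; the sketch does not show graphlab::empty itself or how GraphLab avoids allocating storage for it.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical stand-in for an empty edge type; not graphlab::empty itself.
struct toy_empty {
  // There are no fields, so nothing is written and nothing is read back.
  void save(std::ostream&) const { }
  void load(std::istream&) { }
};

// The type has no non-static data members.
static_assert(std::is_empty<toy_empty>::value, "toy_empty carries no state");

// Number of bytes produced by serializing one toy_empty value.
inline std::size_t serialized_size_of_empty() {
  std::ostringstream out;
  toy_empty().save(out);
  return out.str().size();
}

// Serializing the empty type contributes no bytes at all.
inline void check_empty_edge() {
  assert(serialized_size_of_empty() == 0);
}
```

A type like this carries no per-edge state, so a container specialized for it has nothing it needs to store or transmit for each edge.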

Defining the Graph

The graphlab::distributed_graph data type takes two template arguments:

VertexData: the type of data to be stored on each vertex
EdgeData: the type of data to be stored on each edge

For convenience, we define the type of the graph using a typedef:

  typedef graphlab::distributed_graph<web_page, graphlab::empty> graph_type;

Putting It Together

At this point, our code looks like this:

  #include <string>
  #include <graphlab.hpp>
  struct web_page {
    std::string pagename;
    double pagerank;
    web_page():pagerank(0.0) { }
    explicit web_page(std::string name):pagename(name),pagerank(0.0){ }
    void save(graphlab::oarchive& oarc) const {
      oarc << pagename << pagerank;
    }
    void load(graphlab::iarchive& iarc) {
      iarc >> pagename >> pagerank;
    }
  };
 
  typedef graphlab::distributed_graph<web_page, graphlab::empty> graph_type;
  int main(int argc, char** argv) {
    graphlab::mpi_tools::init(argc, argv);
    graphlab::distributed_control dc;
 
    dc.cout() << "Hello World!\n";
    graphlab::mpi_tools::finalize();
  }

We have now defined the data types the graph requires. In the next section, we will populate the graph with some synthetic data.