GraphLab: Distributed Graph-Parallel API  2.1
GraphLab Tutorial

In this example, we will implement a simple PageRank application from scratch, demonstrating all of the core GraphLab concepts, from loading a graph to performing computation and saving the results.

The implementation philosophy of the GraphLab API is to expose an MPI-like SPMD (Single Program Multiple Data) interface. That is to say, we try to maintain the illusion that all machines are running the same operations in lock-step.

For instance, a GraphLab program in pseudo code may look like:

  main() {
    ...
    Load Graph from file using parsing_function; 
    global variable RESULT = map reduce on graph vertices using map_function;
    transform graph vertices using transform_function;
    ...
    create an asynchronous engine and attach it to the graph;

    engine.start();

    save Graph using saver() object;
  }

In the distributed environment, each of these operations is run in lock-step. However, each individual operation may have significant internal complexity (it may even run asynchronously). Supporting this illusion requires the user to implement a number of external functions / classes. For instance, in the pseudo-code above, the user must implement a map_function, a transform_function, etc.

While GraphLab's RPC implementation permits much more complex interleaving of computation and communication, we discourage this and encourage users to rely on the aggregate "SPMD"-like operations as much as possible. Indeed, none of the toolkit applications we implemented require anything more than these operations. As we come to understand the abstraction needs of the community better, we will continue to expand the scope of these operations.

The tutorial is divided into the following sections: