GraphLab: Distributed Graph-Parallel API  2.1
7: Saving Results

Saving the graph requires implementing a graph writer class with two functions: save_vertex() and save_edge().

class graph_writer {
 public:
  std::string save_vertex(graph_type::vertex_type v) { return ""; }
  std::string save_edge(graph_type::edge_type e) { return ""; }
};

The save_vertex() and save_edge() functions are called on each vertex and each edge in the graph, respectively. Each returns a string, which is then written directly to the output file.

For instance, to save an output file consisting of tab-separated [webpage] [pagerank] lines, we may implement the following:

class graph_writer {
 public:
  std::string save_vertex(graph_type::vertex_type v) {
    std::stringstream strm;
    // remember the \n at the end! This will provide a line break
    // after each page.
    strm << v.data().pagename << "\t" << v.data().pagerank << "\n";
    return strm.str();
  }
  std::string save_edge(graph_type::edge_type e) { return ""; }
};

Since we are not interested in the edges, the save_edge() function simply returns an empty string.
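If the edges are of interest instead, the same pattern applies to save_edge(). The following is a minimal sketch that writes one [source id] [target id] line per edge. It assumes the edge type exposes source() and target(), each returning a vertex with an id() accessor; the template parameter Graph and the class name edge_list_writer are introduced here for illustration only.

```cpp
#include <sstream>
#include <string>

// Hypothetical writer: Graph is assumed to provide vertex_type and edge_type,
// and edge_type is assumed to provide source().id() and target().id().
template <typename Graph>
class edge_list_writer {
 public:
  // Vertices are not written by this writer.
  std::string save_vertex(typename Graph::vertex_type) { return ""; }

  // One tab-separated "source target" line per edge.
  std::string save_edge(typename Graph::edge_type e) {
    std::stringstream strm;
    strm << e.source().id() << "\t" << e.target().id() << "\n";
    return strm.str();
  }
};
```

Such a writer would be used with edge saving enabled and vertex saving disabled when the graph is saved.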

Note:
std::stringstream is convenient but somewhat slow, and it is not the fastest way to build a string. If writing the output becomes a bottleneck, performance gains can be made by using C string operations instead.
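As a rough illustration of the C string approach mentioned in the note, the sketch below formats the same [webpage] [pagerank] line with snprintf. The web_page struct and the format_vertex_line() helper are hypothetical stand-ins for the data returned by v.data(); only the pagename and pagerank fields come from the example above. No timing claim is made here, so measure before relying on it.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the vertex data used in this tutorial.
struct web_page {
  std::string pagename;
  double pagerank;
};

// Builds "[pagename]\t[pagerank]\n" without a stringstream.
std::string format_vertex_line(const web_page& page) {
  char rank_buf[32];  // ample room for a double printed with %g
  int len = std::snprintf(rank_buf, sizeof(rank_buf), "%g", page.pagerank);
  if (len < 0) return "";  // formatting error: write nothing for this vertex
  if (len >= static_cast<int>(sizeof(rank_buf))) {
    len = static_cast<int>(sizeof(rank_buf)) - 1;  // output was truncated
  }
  std::string line;
  line.reserve(page.pagename.size() + static_cast<std::size_t>(len) + 2);
  line += page.pagename;
  line += '\t';
  line.append(rank_buf, static_cast<std::size_t>(len));
  line += '\n';  // line break after each page
  return line;
}
```

Inside save_vertex(), the body would then reduce to something like a call to this helper on v.data(), assuming the vertex data has the same two fields.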

Then, to write the graph, we call:

graph.save("output",
           graph_writer(),
           false,   // set to true if each output file is to be gzipped
           true,    // whether vertices are saved
           false);  // whether edges are saved

This will save a sequence of files named output_1_of_N, output_2_of_N, ..., where N is some integer. Concatenating all the files together produces the combined output. If the gzip option is set, each file will have a .gz suffix, and gunzip must be used to decompress it before reading.
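The concatenation step can be done with any tool; as one possible sketch, the helper below joins the parts in order. It assumes the parts are uncompressed local files numbered 1 through N with exactly the naming pattern above; part_name() and concatenate_parts() are hypothetical names and are not part of GraphLab.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Builds "prefix_i_of_N", following the naming pattern described above.
std::string part_name(const std::string& prefix, int i, int n) {
  std::ostringstream name;
  name << prefix << "_" << i << "_of_" << n;
  return name.str();
}

// Appends parts 1..n, in order, to the file `dest`.
// Returns false if a part cannot be opened or a write fails.
bool concatenate_parts(const std::string& prefix, int n,
                       const std::string& dest) {
  std::ofstream out(dest.c_str(), std::ios::binary);
  if (!out) return false;
  for (int i = 1; i <= n; ++i) {
    std::ifstream in(part_name(prefix, i, n).c_str(), std::ios::binary);
    if (!in) return false;
    // Skip empty parts: streaming an empty buffer would set failbit on out.
    if (in.peek() == std::ifstream::traits_type::eof()) continue;
    out << in.rdbuf();
    if (!out) return false;
  }
  return true;
}
```

For example, concatenate_parts("output", 4, "output_all") would join output_1_of_4 through output_4_of_4 into a single file named output_all.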

If the output path is located on HDFS, for instance:

hdfs:///namenode/data/output

The result will be saved to the HDFS cluster with the given namenode, in the subdirectory /data, with the filenames output_1_of_N, output_2_of_N, ... .

There are several other "built-in" saving formats, which can be accessed through the graph.save_format() function (graphlab::distributed_graph::save_format()).

The next section is a brief conclusion.