GraphLab: Distributed Graph-Parallel API
2.1
Saving the graph requires us to implement a graph writer class consisting of two functions: save_vertex() and save_edge().
The save_vertex() and save_edge() functions are called on each vertex and each edge in the graph, respectively. Each function returns a string, which is then written directly to the output file.
For instance, to save an output file consisting of [webpage] [pagerank] lines, we may implement a writer whose save_vertex() function returns one such line per vertex.
Since we are not interested in the edges, the save_edge() function simply returns an empty string.
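A writer along these lines can be sketched as follows. This is a minimal, self-contained illustration rather than the library's own listing: page_data, mock_vertex and mock_edge are hypothetical stand-ins for the graph's vertex and edge types so that the code compiles without GraphLab, and the tab separator between the two fields is an assumption.

```cpp
#include <sstream>
#include <string>

// Hypothetical vertex payload: a page name and its computed PageRank.
struct page_data {
  std::string pagename;
  double pagerank;
};

// Hypothetical stand-ins for the graph's vertex and edge types, used only
// so that this sketch compiles without GraphLab.
struct mock_vertex {
  page_data payload;
  const page_data& data() const { return payload; }
};

struct mock_edge {};

// Writer with the two functions described above. Each returns the text
// that is written verbatim to the output file.
struct pagerank_writer {
  // Emits one "[webpage] [pagerank]" line per vertex.
  // The tab separator is an assumption of this sketch.
  std::string save_vertex(const mock_vertex& v) const {
    std::ostringstream strm;
    strm << v.data().pagename << "\t" << v.data().pagerank << "\n";
    return strm.str();
  }

  // Edges are not of interest here, so nothing is written for them.
  std::string save_edge(const mock_edge&) const { return ""; }
};
```

In a real program the two functions would take the vertex and edge types of the graph being saved in place of the mock types.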
Then, to write the graph, we call graph.save(), passing the output prefix and an instance of the writer.
This will save a sequence of files named output_1_of_N, output_2_of_N, ... where N is some integer. Concatenating all the files together will produce the combined output. If the gzip option is set, each of the files will have a .gz suffix and gunzip must be used to decompress the file for reading.
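The file naming and the concatenation step described above can be illustrated with a small in-memory sketch. Everything here is hypothetical: shard_name, shard_lines and concatenate are illustrative helpers, not GraphLab functions, and the round-robin assignment of lines to files is an assumption made only to keep the example short. No files are written to disk.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using shard = std::pair<std::string, std::string>;  // (file name, contents)

// Hypothetical helper: name of the i-th of n output files (1-based),
// following the output_1_of_N pattern.
std::string shard_name(const std::string& prefix, std::size_t i, std::size_t n) {
  return prefix + "_" + std::to_string(i) + "_of_" + std::to_string(n);
}

// Hypothetical stand-in for saving: spreads already-formatted lines over
// n shards in round-robin order (an assumption of this sketch) and returns
// the shards in memory instead of writing files.
std::vector<shard> shard_lines(const std::string& prefix,
                               const std::vector<std::string>& lines,
                               std::size_t n) {
  std::vector<shard> shards;
  if (n == 0) return shards;  // nothing to write into
  for (std::size_t i = 1; i <= n; ++i) {
    shards.emplace_back(shard_name(prefix, i, n), "");
  }
  for (std::size_t k = 0; k < lines.size(); ++k) {
    shards[k % n].second += lines[k];
  }
  return shards;
}

// Concatenating every shard, in file order, yields the combined output.
std::string concatenate(const std::vector<shard>& shards) {
  std::string combined;
  for (const shard& s : shards) combined += s.second;
  return combined;
}
```

Because each record is a complete line, the combined output contains every record exactly once, although the order of the lines depends on how records were assigned to files.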
If the output path is located on HDFS, for instance:
hdfs:///namenode/data/output
The result will be saved to the HDFS cluster with the given namenode, in the subdirectory /data, with the filenames output_1_of_N, output_2_of_N, ...
There are several other "built-in" saving formats, which can be accessed through the graph.save_format() function (graphlab::distributed_graph::save_format()).
The next section is a brief conclusion.