Public Member Functions
def __init__ (self, db_prefix, db_type, node_manager_class=CheckpointManager)
def init (self, nodes, retrieve_from_epoch=None)
def load (self, epoch)
def load_blobs_locally (self, nodes, blob_names, epoch, session)
def save (self, epoch)
Coordinates checkpointing across multiple nodes. Each of `init`, `load` and `save` builds TaskGroups that trigger checkpointing on each of the nodes involved in a distributed job.
Definition at line 221 of file checkpoint.py.
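The fan-out pattern described above can be illustrated with a minimal, self-contained sketch. It does not use Caffe2: `ToyNodeManager` and `ToyMultiNodeCoordinator` are hypothetical stand-ins that only mirror the `init`/`load`/`save` method names, and they call per-node managers directly instead of building TaskGroups as the real class is documented to do.

```python
# Toy stand-ins only: none of these classes are part of checkpoint.py.


class ToyNodeManager:
    """Hypothetical per-node manager that records the calls it receives."""

    def __init__(self, node_name):
        self.node_name = node_name
        self.log = []

    def init(self, retrieve_from_epoch=None):
        self.log.append(("init", retrieve_from_epoch))

    def load(self, epoch):
        self.log.append(("load", epoch))

    def save(self, epoch):
        self.log.append(("save", epoch))


class ToyMultiNodeCoordinator:
    """Hypothetical coordinator: forwards each call to one manager per node."""

    def __init__(self, node_manager_class=ToyNodeManager):
        self._node_manager_class = node_manager_class
        self._managers = {}

    def init(self, nodes, retrieve_from_epoch=None):
        # Create one manager per node, then initialise each of them.
        self._managers = {n: self._node_manager_class(n) for n in nodes}
        for manager in self._managers.values():
            manager.init(retrieve_from_epoch=retrieve_from_epoch)

    def load(self, epoch):
        for manager in self._managers.values():
            manager.load(epoch)

    def save(self, epoch):
        for manager in self._managers.values():
            manager.save(epoch)

    def logs(self):
        # Copy of each node's call log, keyed by node name.
        return {n: list(m.log) for n, m in self._managers.items()}


coordinator = ToyMultiNodeCoordinator()
coordinator.init(["trainer_0", "trainer_1"])
coordinator.save(epoch=1)
coordinator.load(epoch=1)
```

In this sketch, one call on the coordinator reaches every node in the order the nodes were passed to `init`. How the real class schedules the work across nodes is not shown here.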
def checkpoint.MultiNodeCheckpointManager.load_blobs_locally (self, nodes, blob_names, epoch, session)
Loads the necessary blobs from the checkpoints to the current node.

Args:
    blob_names: A list of strings. Each string is the name of a blob.
    epoch: An integer. The checkpoint epoch to load from.
    session: A Session object to execute the Load ops.
Definition at line 261 of file checkpoint.py.