Public Member Functions
def __init__(self, db_prefix, db_type, node_manager_class=CheckpointManager)
def init(self, nodes, retrieve_from_epoch=None)
def load(self, epoch)
def load_blobs_locally(self, nodes, blob_names, epoch, session)
def save(self, epoch)
Coordinates loading and saving of checkpoints across multiple nodes. Each of `init`, `load` and `save` builds TaskGroups that, when run, trigger checkpointing on every node involved in a distributed job.
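The fan-out pattern described above can be sketched in plain Python. Everything in this sketch is hypothetical: `ToyNodeManager`, `ToyMultiNodeManager` and `run` are illustrative stand-ins, not Caffe2's `CheckpointManager`, `TaskGroup` or `Session`, and checkpoints live in memory rather than in a DB. What it shows is that `init`, `load` and `save` only build a group of per-node tasks; no node does any work until the group is run.

```python
from typing import Callable, Dict, List, Optional

Task = Callable[[], None]


class ToyNodeManager:
    """Hypothetical per-node manager; keeps checkpoints in memory, not in a DB."""

    def __init__(self, db_prefix: str, node_name: str) -> None:
        self.db_prefix = db_prefix
        self.node_name = node_name
        self.blobs: Dict[str, object] = {}  # this node's live blobs
        self.checkpoints: Dict[int, Dict[str, object]] = {}  # epoch -> snapshot

    def init(self, _epoch: Optional[int] = None) -> None:
        self.blobs = {}  # start every node from an empty workspace

    def save(self, epoch: int) -> None:
        self.checkpoints[epoch] = dict(self.blobs)

    def load(self, epoch: int) -> None:
        self.blobs = dict(self.checkpoints[epoch])


class ToyMultiNodeManager:
    """Hypothetical coordinator: each public method returns a list of per-node
    tasks (a stand-in for a TaskGroup) instead of doing the work directly."""

    def __init__(self, db_prefix: str, node_manager_class=ToyNodeManager) -> None:
        self.db_prefix = db_prefix
        self.node_manager_class = node_manager_class
        self.managers: Dict[str, ToyNodeManager] = {}

    def _task_group(self, method: str, epoch: Optional[int]) -> List[Task]:
        # One task per node; the default argument binds each manager eagerly.
        return [lambda m=m: getattr(m, method)(epoch) for m in self.managers.values()]

    def init(self, nodes: List[str]) -> List[Task]:
        self.managers = {n: self.node_manager_class(self.db_prefix, n) for n in nodes}
        return self._task_group("init", None)

    def load(self, epoch: int) -> List[Task]:
        return self._task_group("load", epoch)

    def save(self, epoch: int) -> List[Task]:
        return self._task_group("save", epoch)


def run(task_group: List[Task]) -> None:
    """Stand-in for a session running a TaskGroup: execute every node's task."""
    for task in task_group:
        task()


mgr = ToyMultiNodeManager("/tmp/ckpt")  # the prefix is unused by the toy
run(mgr.init(["trainer_0", "trainer_1"]))
mgr.managers["trainer_0"].blobs["w"] = 1.0
mgr.managers["trainer_1"].blobs["w"] = 2.0
pending = mgr.save(epoch=3)  # built, but nothing is saved yet
run(pending)                 # now every node writes its epoch-3 snapshot
mgr.managers["trainer_0"].blobs["w"] = 99.0
run(mgr.load(epoch=3))       # every node restores its epoch-3 snapshot
```

Keeping "build" separate from "run" means constructing a group is side-effect free, so the caller decides when, and in which session, the per-node work actually executes.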
Definition at line 221 of file checkpoint.py.
def checkpoint.MultiNodeCheckpointManager.load_blobs_locally(self, nodes, blob_names, epoch, session)
Loads the necessary blobs from the checkpoints to the current node.
Args:
nodes: A list of the nodes involved in the distributed job whose checkpoints are read.
blob_names: A list of strings. Each string is the name of a blob.
epoch: An integer. The checkpoint epoch to load from.
session: A Session object to execute the Load ops.
Definition at line 261 of file checkpoint.py.
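What `load_blobs_locally` does can be illustrated with a toy that copies only the requested blobs for one epoch from each node's checkpoint into a single local workspace. This is a hypothetical sketch, not the Caffe2 code: `load_blobs_locally_sketch`, the in-memory `store` (node name, then epoch, then blobs) and the `local_workspace` dict stand in for the checkpoint DBs and for the `session` that executes the Load ops. Skipping names absent from a node's snapshot, and letting a later node overwrite an earlier one on a name clash, are choices of the toy.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for per-node checkpoint DBs:
# node name -> epoch -> {blob name: value}.
CheckpointStore = Dict[str, Dict[int, Dict[str, object]]]


def load_blobs_locally_sketch(
    store: CheckpointStore,
    nodes: List[str],
    blob_names: List[str],
    epoch: int,
    local_workspace: Dict[str, object],
) -> List[str]:
    """Copy the requested blobs from each node's `epoch` checkpoint into
    `local_workspace` and return the names actually loaded, in load order.

    Toy choices (not Caffe2 behavior): names missing from a node's snapshot
    are skipped, and on a name clash the later node in `nodes` wins.
    """
    loaded: List[str] = []
    for node in nodes:
        snapshot = store[node][epoch]  # KeyError if the node or epoch is absent
        for name in blob_names:
            if name in snapshot:
                local_workspace[name] = snapshot[name]
                loaded.append(name)
    return loaded


store: CheckpointStore = {
    "trainer_0": {5: {"fc_w": [0.1, 0.2], "fc_b": [0.0], "iter": 500}},
    "trainer_1": {5: {"emb": [1.0, 2.0], "iter": 500}},
}
workspace: Dict[str, object] = {}
names = load_blobs_locally_sketch(
    store, ["trainer_0", "trainer_1"], ["fc_w", "emb"], 5, workspace
)
```

Here only `fc_w` and `emb` reach the local workspace; `fc_b` and `iter` stay in the checkpoints because they were not requested.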