Caffe2 - Python API
A deep learning, cross-platform ML framework
checkpoint.JobRunner Class Reference

Public Member Functions

def __init__(self, job, checkpoint_manager=None, resume_from_epoch=None)
def __call__(self, client)
def load_blobs_from_checkpoints(self, blob_names, epoch, session)

Public Attributes

resume_from_epoch
checkpoint
job

Detailed Description

Implements the runtime logic for jobs with checkpointing at the epoch
level. It can be used to run either single-host or distributed jobs. The
job runner is a callable that the client calls once, passing a Session as
the argument. This call blocks until the job execution is complete.

If a checkpoint_manager is passed, checkpoints are taken after
initialization and after each epoch completes. If, in addition,
`resume_from_epoch` is an epoch number, the corresponding checkpoint is
loaded and job execution continues from that point, with the epoch that
follows the checkpointed one. In this case, the job's init_group is not run.
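The control flow described above can be illustrated with a small, self-contained Python sketch. It is hypothetical and is not the Caffe2 implementation: RecordingSession, StubJob, StubCheckpointManager, SketchJobRunner and the should_stop() stop condition are illustrative stand-ins, and labeling the post-initialization checkpoint as epoch 0 is an assumption. It shows init_group running only on a fresh start, a checkpoint after initialization and after every epoch, and resumption with the epoch after the loaded checkpoint.

```python
# Hypothetical sketch of epoch-level checkpointing; none of these classes
# are Caffe2 APIs. They only mimic the behavior described in the text.


class RecordingSession:
    """Stand-in for a Session: records every step it is asked to run."""

    def __init__(self):
        self.ran = []

    def run(self, step):
        self.ran.append(step)


class StubJob:
    """Stand-in job with named step groups and a fixed epoch budget."""

    def __init__(self, num_epochs):
        self.init_group = "init"
        self.epoch_group = "epoch"
        self.num_epochs = num_epochs

    def should_stop(self, epoch):
        # Assumed stop condition: stop once the epoch budget is reached.
        return epoch >= self.num_epochs


class StubCheckpointManager:
    """Stand-in manager whose save/load return step names to run."""

    def save(self, epoch):
        return "save_%d" % epoch

    def load(self, epoch):
        return "load_%d" % epoch


class SketchJobRunner:
    def __init__(self, job, checkpoint_manager=None, resume_from_epoch=None):
        self.job = job
        self.checkpoint = checkpoint_manager
        self.resume_from_epoch = resume_from_epoch

    def __call__(self, session):
        from_scratch = self.resume_from_epoch is None
        if from_scratch:
            # init_group runs only on a fresh start, never when resuming.
            session.run(self.job.init_group)
        if self.checkpoint is not None:
            if from_scratch:
                # Checkpoint right after initialization (assumed epoch 0).
                session.run(self.checkpoint.save(0))
            else:
                # Restore the state saved at the end of resume_from_epoch.
                session.run(self.checkpoint.load(self.resume_from_epoch))
        epoch = 1 if from_scratch else self.resume_from_epoch + 1
        while True:  # the call blocks until the job decides to stop
            session.run(self.job.epoch_group)
            if self.checkpoint is not None:
                session.run(self.checkpoint.save(epoch))
            if self.job.should_stop(epoch):
                break
            epoch += 1
        return epoch


# Fresh run of a 3-epoch job with checkpointing enabled.
fresh = RecordingSession()
SketchJobRunner(StubJob(3), StubCheckpointManager())(fresh)

# Resume after epoch 1: no init, load checkpoint 1, then run epochs 2 and 3.
resumed = RecordingSession()
SketchJobRunner(StubJob(3), StubCheckpointManager(), resume_from_epoch=1)(resumed)
```

After these two calls, fresh.ran is ['init', 'save_0', 'epoch', 'save_1', 'epoch', 'save_2', 'epoch', 'save_3'] and resumed.ran is ['load_1', 'epoch', 'save_2', 'epoch', 'save_3'].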

Refer to checkpoint_test.py for an example.

Definition at line 297 of file checkpoint.py.

Member Function Documentation

load_blobs_from_checkpoints()

def checkpoint.JobRunner.load_blobs_from_checkpoints(self, blob_names, epoch, session)
Loads the necessary blobs from the checkpoints.

Checkpoints store snapshots of the workspace on each node. Sometimes only
a subset of the blobs is needed; a common scenario is loading just the
model blobs for evaluation. Given the names of the required blobs, this
function goes over the checkpoints of all nodes but loads only the blobs
listed in blob_names into the current workspace.

Args:
    blob_names: A list of strings. Each string is the name of a blob.
    epoch: An integer. The checkpoint epoch to load from.
    session: A Session object to execute the load ops.

Raises:
    ValueError: When the checkpoint manager is invalid.
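The selective loading described above can be sketched in plain Python. This is a hypothetical illustration, not the Caffe2 code: checkpoints are modeled as a nested dict {epoch: {node_name: {blob_name: value}}}, the workspace is a plain dict, and the node and blob names are made up. Treating a non-dict checkpoint store as the "invalid checkpoint manager" case is an assumption that mirrors the documented ValueError.

```python
# Hypothetical sketch: the nested-dict checkpoint layout and the dict
# workspace are assumptions, not Caffe2's actual storage format.


def load_blobs_from_checkpoints(checkpoints, blob_names, epoch, workspace):
    """Copy only the blobs named in blob_names from every node's snapshot
    at the given epoch into workspace."""
    if not isinstance(checkpoints, dict):
        # Stand-in for the documented ValueError on an invalid manager.
        raise ValueError("invalid checkpoint manager")
    wanted = set(blob_names)
    # Visit the snapshot of every node, not just the local one.
    for node_name, snapshot in checkpoints[epoch].items():
        for blob_name, value in snapshot.items():
            if blob_name in wanted:
                # On a name clash, a later node overwrites an earlier one.
                workspace[blob_name] = value
    return workspace


checkpoints = {
    2: {
        "trainer_0": {"fc_w": [0.5, -0.25], "fc_b": [0.1], "optimizer_iter": 200},
        "reader_0": {"reader_offset": 57},
    }
}

# Evaluation needs only the model parameters, not optimizer or reader state.
ws = load_blobs_from_checkpoints(checkpoints, ["fc_w", "fc_b"], 2, {})
```

Here ws ends up holding only fc_w and fc_b; optimizer_iter and reader_offset are skipped even though they live in the same checkpoints.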

Definition at line 356 of file checkpoint.py.

