Caffe2 - Python API
A deep learning, cross platform ML framework
checkpoint.Job Class Reference

Public Member Functions

def __init__ (self, init_group=None, epoch_group=None, exit_group=None, stop_signals=None, nodes_to_checkpoint=None)
 
def nodes_to_checkpoint (self)
 
def compile (self, session_class)
 
def __enter__ (self)
 
def __exit__ (self, *args)
 
def add_stop_signal (self, output)
 

Public Attributes

 init_group
 
 epoch_group
 
 exit_group
 
 stop_signals
 

Detailed Description

A Job defines three TaskGroups, the `init_group`, the `epoch_group` and the
`exit_group`, which are run by a JobRunner.

The `init_group` will be run only once at startup. Its role is to
initialize globally persistent blobs such as model weights, accumulators
and data file lists.

The `epoch_group` is run in a loop after `init_group`. The loop exits
when any of the stop signals added with `add_stop_signal` is True at the
end of an epoch.

The `exit_group` is run only once at the very end of the job, after one
of the stopping criteria for `epoch_group` has been met. Its role is to
save the results of training at the end of the job.
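The run order described above (init once, epochs until a stop signal, exit once) can be sketched in plain Python. This is a hypothetical stand-in for what a JobRunner does, not Caffe2's implementation: `run_job_sketch` is an illustrative name, each group is modelled as a callable, and each stop signal as a callable returning a bool instead of a blob fetched from a workspace.

```python
from typing import Callable, List


def run_job_sketch(
    init_group: Callable[[], None],
    epoch_group: Callable[[], None],
    exit_group: Callable[[], None],
    stop_signals: List[Callable[[], bool]],
) -> int:
    """Hypothetical JobRunner stand-in; returns the number of epochs run."""
    # Without any stop signal this loop could never end, so the sketch refuses.
    if not stop_signals:
        raise ValueError("at least one stop signal is required")
    init_group()  # runs exactly once, before any epoch
    epochs = 0
    while True:
        epoch_group()
        epochs += 1
        # Signals are checked only at the end of an epoch; any True stops the loop.
        if any(signal() for signal in stop_signals):
            break
    exit_group()  # runs exactly once, after the loop has ended
    return epochs
```

Because the check happens after each epoch, the epoch that makes a signal True still runs to completion, and the `exit_group` always sees the state left by that final epoch.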

Jobs are context-driven, so that Tasks can be added to the active Job
without having to explicitly pass the job object around.
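The context-driven pattern can be illustrated with a small, self-contained sketch. `MiniJob` and `build_reader` here are hypothetical names, and the groups are plain lists of strings rather than TaskGroups; the point is only that `with` pushes the job onto a stack and `current()` returns the innermost active job, so helper functions never receive the job as an argument.

```python
from typing import List


class MiniJob:
    """Hypothetical stand-in for checkpoint.Job showing only the context mechanics."""

    _stack: List["MiniJob"] = []  # shared stack of active jobs, innermost last

    def __init__(self) -> None:
        self.init_group: List[str] = []
        self.epoch_group: List[str] = []
        self.exit_group: List[str] = []
        self.stop_signals: List[str] = []

    @classmethod
    def current(cls) -> "MiniJob":
        # Return the innermost job entered with `with`, or fail loudly.
        if not cls._stack:
            raise RuntimeError("no active job; use `with MiniJob():`")
        return cls._stack[-1]

    def add_stop_signal(self, output: str) -> None:
        self.stop_signals.append(output)

    def __enter__(self) -> "MiniJob":
        MiniJob._stack.append(self)
        return self

    def __exit__(self, *args) -> None:
        MiniJob._stack.pop()  # returning None lets exceptions propagate


def build_reader() -> None:
    # No job parameter: the helper looks up the active job itself.
    job = MiniJob.current()
    job.init_group.append("open_reader")
    job.epoch_group.append("read_batch")
    job.add_stop_signal("data_finished")


with MiniJob() as job:
    build_reader()
```

After the `with` block, `job` holds everything `build_reader` registered, and the stack is empty again.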

Example of usage:

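# Assumed imports. HiveReader, init_reader, build_model, partitions and
# params are placeholders for user code that is not shown here; model is
# assumed to expose param_init_net and save_model_net.
from caffe2.python.checkpoint import Job
from caffe2.python.dataio import ReaderWithLimit
from caffe2.python.pipeline import pipe
from caffe2.python.task import Task

def build_reader(partitions):
    # Runs once: build the reader and run its initialization net.
    with Job.current().init_group:
        reader = HiveReader(init_reader, ..., partitions)
        Task(step=init_reader)
    # Runs every epoch: read at most 10000 batches with 8 threads. The stop
    # signal reports whether the underlying reader ran out of data.
    with Job.current().epoch_group:
        limited_reader = ReaderWithLimit(reader, num_iter=10000)
        data_queue = pipe(limited_reader, num_threads=8)
        Job.current().add_stop_signal(limited_reader.data_finished())
    return data_queue

def build_hogwild_trainer(reader, model):
    # Runs once: initialize the model parameters.
    with Job.current().init_group:
        Task(step=model.param_init_net)
    # Runs every epoch: 8 threads feed batches through the model concurrently.
    with Job.current().epoch_group:
        pipe(reader, processor=model, num_threads=8)
    # Runs once at the end: save the trained model.
    with Job.current().exit_group:
        Task(step=model.save_model_net)

# The populated job is then executed by a JobRunner.
with Job() as job:
    reader = build_reader(partitions)
    model = build_model(params)
    build_hogwild_trainer(reader, model)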

Definition at line 22 of file checkpoint.py.

