Inheritance diagram for checkpoint.Job:

Public Member Functions
def	__init__ (self, init_group=None, epoch_group=None, exit_group=None, stop_signals=None, nodes_to_checkpoint=None)

def	nodes_to_checkpoint (self)

def	compile (self, session_class)

def	__enter__ (self)

def	__exit__ (self, args)

def	add_stop_signal (self, output)

Public Attributes
	init_group

	epoch_group

	exit_group

	stop_signals

Detailed Description

A Job defines three TaskGroups: the `init_group`, the `epoch_group` and the
`exit_group`, all of which will be run by a JobRunner.

The `init_group` will be run only once at startup. Its role is to
initialize globally persistent blobs such as model weights, accumulators
and data file lists.

The `epoch_group` will be run in a loop after `init_group`. The loop will
exit when any of the stop signals added with `add_stop_signal` is True
at the end of an epoch.

The `exit_group` will be run only once at the very end of the job, when one
of the stopping criteria for `epoch_group` has been met. The role of this
group is to save the results of training at the end of the job.
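
The order in which the three groups run can be illustrated with a minimal
plain-Python sketch. It is not the caffe2 JobRunner: `run_job`, its callable
arguments and the `max_epochs` safety cap are all hypothetical names
introduced here, and each group is modeled as an ordinary function rather
than a TaskGroup.

```python
from typing import Callable, List


def run_job(init_group: Callable[[], None],
            epoch_group: Callable[[], None],
            exit_group: Callable[[], None],
            stop_signals: List[Callable[[], bool]],
            max_epochs: int = 1000) -> int:
    """Hypothetical stand-in for a job runner (assumption, not caffe2 API)."""
    init_group()                  # runs exactly once, at startup
    epochs = 0
    while epochs < max_epochs:    # max_epochs is an illustrative safety cap
        epoch_group()             # one pass of the epoch loop
        epochs += 1
        # Stop signals are evaluated at the end of each epoch;
        # any single True signal ends the loop.
        if any(signal() for signal in stop_signals):
            break
    exit_group()                  # runs exactly once, at the very end
    return epochs


log = []
remaining = [3]  # pretend data source holding three epochs' worth of data


def consume():
    remaining[0] -= 1
    log.append("epoch")


epochs_run = run_job(
    init_group=lambda: log.append("init"),
    epoch_group=consume,
    exit_group=lambda: log.append("exit"),
    stop_signals=[lambda: remaining[0] <= 0],
)
```

In this sketch the stop signal plays the role that
`limited_reader.data_finished()` plays in the usage example: it becomes True
once the data is exhausted, after which only the exit step runs.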

Jobs are context-driven, so that Tasks can be added to the active Job
without having to explicitly pass the job object around.
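
One way to picture the context-driven behavior is a stack of active jobs
that `with` pushes and pops. The sketch below is an assumption-laden
miniature, not the caffe2 implementation: `MiniJob`, its `_stack` attribute,
its `tasks` list and the `add_task` helper are hypothetical, and the simple
class-level stack is not thread-safe.

```python
class MiniJob:
    """Hypothetical miniature that only mirrors the context-driven idea."""

    _stack = []  # active jobs, innermost last (assumption: single thread)

    def __init__(self):
        self.tasks = []

    @classmethod
    def current(cls):
        # Look up the innermost active job instead of receiving it as an argument.
        if not cls._stack:
            raise RuntimeError("no active MiniJob")
        return cls._stack[-1]

    def __enter__(self):
        MiniJob._stack.append(self)
        return self

    def __exit__(self, *args):
        MiniJob._stack.pop()


def add_task(name):
    # No job parameter: the active job is found through the context.
    MiniJob.current().tasks.append(name)


with MiniJob() as job:
    add_task("init_reader")
    add_task("train")
```

Helper functions such as `add_task` can therefore be called from anywhere
inside the `with` block and still register work on the right job.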

Example of usage:

def build_reader(partitions):
    with Job.current().init_group:
        reader = HiveReader(init_reader, ..., partitions)
        Task(step=init_reader)
    with Job.current().epoch_group:
        limited_reader = ReaderWithLimit(reader, num_iter=10000)
        data_queue = pipe(limited_reader, num_threads=8)
        Job.current().add_stop_signal(limited_reader.data_finished())
    return data_queue

def build_hogwild_trainer(reader, model):
    with Job.current().init_group:
        Task(step=model.param_init_net)
    with Job.current().epoch_group:
        pipe(reader, processor=model, num_threads=8)
    with Job.current().exit_group:
        Task(step=model.save_model_net)

with Job() as job:
    reader = build_reader(partitions)
    model = build_model(params)
    build_hogwild_trainer(reader, model)

Definition at line 22 of file checkpoint.py.

The documentation for this class was generated from the following file:

caffe2/python/checkpoint.py
