Public Attributes | |
| init_group | |
| epoch_group | |
| exit_group | |
| stop_signals | |
A Job defines three TaskGroups: the `init_group`, the `epoch_group` and the
`exit_group` which will be run by a JobRunner.
The `init_group` will be run only once at startup. Its role is to
initialize globally persistent blobs such as model weights, accumulators
and data file lists.
The `epoch_group` will be run in a loop after init_group. The loop will
exit when any of the stop signals added with `add_stop_signal` is True
at the end of an epoch.
The `exit_group` will be run only once at the very end of the job, when one
of the stopping criterias for `epoch_group` was met. The role of this group
is save the results of training in the end of the job.
Jobs are context-driven, so that Tasks can be added to the active Job
without having to explicitly pass the job object around.
Example of usage:
def build_reader(partitions):
with Job.current().init_group:
reader = HiveReader(init_reader, ..., partitions)
Task(step=init_reader)
with Job.current().epoch_group:
limited_reader = ReaderWithLimit(reader, num_iter=10000)
data_queue = pipe(limited_reader, num_threads=8)
Job.current().add_stop_signal(limited_reader.data_finished())
return data_queue
def build_hogwild_trainer(reader, model):
with Job.current().init_group:
Task(step=model.param_init_net)
with Job.current().epoch_group:
pipe(reader, processor=model, num_threads=8)
with Job.current().exit_group:
Task(step=model.save_model_net)
with Job() as job:
reader = build_reader(partitions)
model = build_model(params)
build_hogwild_trainer(reader, model)
Definition at line 22 of file checkpoint.py.
1.8.14