Long-Term Network Malfunction May Cause Jobs to Hang

A jobflow or master execution that runs child jobs on other cluster nodes must be notified about status changes of those child jobs. When asynchronous messaging doesn't work, events from the child jobs aren't delivered, so the parent jobs keep running. When the network works again, the child job events may be re-transmitted, so a hung parent job may finish. However, the network malfunction may last so long that the events can no longer be re-transmitted.
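
The behaviour described above can be illustrated with a small model. This is only a sketch under stated assumptions: it assumes that a child job event stays available for re-transmission for a limited retention window, and the names ChildEvent, can_redeliver, parent_outcome, and retention_seconds are hypothetical and are not part of the server's API or configuration.

```python
from dataclasses import dataclass


@dataclass
class ChildEvent:
    """Hypothetical status-change event emitted by a child job."""
    job_id: str
    status: str
    created_at: float  # seconds since an arbitrary start point


def can_redeliver(event: ChildEvent, network_restored_at: float,
                  retention_seconds: float) -> bool:
    """Return True if the event is still retained when the network recovers.

    Assumption: an event can be re-transmitted only while its age is
    within the retention window.
    """
    age_at_recovery = network_restored_at - event.created_at
    return age_at_recovery <= retention_seconds


def parent_outcome(event: ChildEvent, network_restored_at: float,
                   retention_seconds: float) -> str:
    """Describe what happens to the parent job after the outage ends."""
    if can_redeliver(event, network_restored_at, retention_seconds):
        # The event reaches the parent, so the parent can finish.
        return "FINISHED"
    # The event is gone, so the parent keeps waiting indefinitely.
    return "HUNG"


event = ChildEvent(job_id="child-1", status="FINISHED_OK", created_at=0.0)
short_outage = parent_outcome(event, network_restored_at=60.0,
                              retention_seconds=300.0)
long_outage = parent_outcome(event, network_restored_at=600.0,
                             retention_seconds=300.0)
```

In this model, a 60-second outage leaves the event within the assumed 300-second window, so the parent job can finish; a 600-second outage exceeds the window, so the parent job stays hung.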

See the following timeline when choosing a proper configuration:

The following configuration properties serve to tune the time intervals mentioned above: