Access to the database is vital for running jobs, running the scheduler, and cooperating with other nodes. Touching the database is also used to detect dead processes: when the JVM process of NodeB is killed, it stops touching the database, and the other nodes can detect this.
0-30s: last touch on the database
NodeB or its connection to the database is down
90s: NodeA sees the last touch
0-40s: check-task running on NodeA detects the obsolete touch from NodeB
The status of NodeB is changed to "stopped" and the jobs running on NodeB are "solved": their status is changed to UNKNOWN and an event is dispatched among the cluster nodes. The job result is considered an error.
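The check described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual implementation: the `Node` class, the `check_nodes` function, the "running" and "RUNNING" status strings, and the in-memory data layout are all hypothetical. Only the "stopped" and UNKNOWN statuses and the 60000 ms acceptance interval come from this section.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory view of one cluster node's database record."""
    node_id: str
    last_touch_ms: int                                 # time of the last database touch
    status: str = "running"                            # assumed name for a live node
    running_jobs: dict = field(default_factory=dict)   # job id -> job status


def check_nodes(nodes, now_ms, forced_stop_interval_ms=60_000,
                solve_running_jobs=True):
    """Mark nodes whose last touch is obsolete as stopped.

    Returns the ids of jobs whose status was changed to UNKNOWN.
    """
    solved = []
    for node in nodes:
        if node.status != "running":
            continue  # already handled by an earlier check
        if now_ms - node.last_touch_ms <= forced_stop_interval_ms:
            continue  # touch is still recent enough to be accepted
        node.status = "stopped"
        if solve_running_jobs:
            # "Solve" the jobs: their outcome can no longer be determined.
            for job_id, status in list(node.running_jobs.items()):
                if status == "RUNNING":
                    node.running_jobs[job_id] = "UNKNOWN"
                    solved.append(job_id)
    return solved


# NodeA touched 5 s ago, NodeB 70 s ago (older than the 60 s limit).
node_a = Node("NodeA", last_touch_ms=95_000)
node_b = Node("NodeB", last_touch_ms=30_000, running_jobs={"job-1": "RUNNING"})
solved_jobs = check_nodes([node_a, node_b], now_ms=100_000)
```

In this sketch NodeA stays untouched, while NodeB ends up "stopped" with `job-1` set to UNKNOWN. Dispatching the event to the other cluster nodes is left out.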
cluster.node.touch.interval
- periodicity of the database touch (20000 ms by default)
cluster.node.touch.forced_stop.interval
- how long the other nodes accept a node's last touch before treating it as obsolete (60000 ms by default)
cluster.node.check.checkMinInterval
- periodicity of the cluster node checks (40000 ms by default)
cluster.node.touch.forced_stop.solve_running_jobs.enabled
- a boolean value rather than an interval; it switches the "solving" of running jobs described above on or off
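To get a feel for how the three intervals interact, the following sketch estimates how long a dead node might go unnoticed. It rests on a simplified model that is an assumption, not documented behavior: touches happen exactly every touch interval, checks run exactly every check interval, and a node is declared dead as soon as a check finds its last touch older than the forced-stop interval. The function name `detection_window_ms` is hypothetical, and the real server timing may differ.

```python
# Default values (in milliseconds) as listed in this section.
DEFAULTS_MS = {
    "cluster.node.touch.interval": 20_000,
    "cluster.node.touch.forced_stop.interval": 60_000,
    "cluster.node.check.checkMinInterval": 40_000,
}


def detection_window_ms(config=DEFAULTS_MS):
    """Return (earliest, latest) detection delay, measured from node death.

    Assumes the simplified model described in the text above.
    """
    touch = config["cluster.node.touch.interval"]
    accept = config["cluster.node.touch.forced_stop.interval"]
    check = config["cluster.node.check.checkMinInterval"]
    # Best case: the node died just before its next touch was due and a
    # check runs the moment the last touch stops being accepted.
    earliest = max(accept - touch, 0)
    # Worst case: the node died right after touching and the most recent
    # check ran just before the last touch stopped being accepted.
    latest = accept + check
    return earliest, latest


earliest_ms, latest_ms = detection_window_ms()
```

Under these assumptions, shortening `cluster.node.touch.forced_stop.interval` or `cluster.node.check.checkMinInterval` narrows the window, at the cost of a higher risk of marking a slow but live node as stopped.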