NodeB is Killed or It Cannot Connect to the Database

CloverETL 4.7.0

Access to the database is vital for running jobs, running the scheduler, and cooperating with other nodes. Touching the database is also used to detect a dead process. When the JVM process of NodeB is killed, it stops touching the database, and the other nodes can detect this.

Timeline of the scenario:

0s-30s: last touch on the DB; NodeB or its connection to the database goes down
90s: NodeA sees the last touch
0-40s: the check task running on NodeA detects the obsolete touch from NodeB; the status of NodeB is changed to "stopped" and the jobs running on NodeB are "solved", which means that their status is changed to UNKNOWN and an event is dispatched among the cluster nodes. The job result is considered an error.
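
The obsolete-touch detection in the timeline above can be pictured with a minimal Java sketch. This is not CloverETL code: the class TouchCheck and the method isObsolete are hypothetical names, and the strict "older than" comparison is an assumption about how the check task treats the boundary. The sketch only shows the idea that a node is treated as dead once its last touch is older than the interval the other nodes accept.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration only; not part of the CloverETL API.
final class TouchCheck {

    /**
     * Returns true when the last touch seen in the database is older than
     * the interval the other nodes are willing to accept.
     */
    static boolean isObsolete(Instant lastTouch, Instant now, Duration acceptedInterval) {
        return Duration.between(lastTouch, now).compareTo(acceptedInterval) > 0;
    }

    public static void main(String[] args) {
        Instant lastTouch = Instant.parse("2024-01-01T00:00:00Z");
        // 60000 ms is the documented default of cluster.node.touch.forced_stop.interval.
        Duration accepted = Duration.ofMillis(60_000);

        System.out.println(isObsolete(lastTouch, lastTouch.plusSeconds(45), accepted)); // false
        System.out.println(isObsolete(lastTouch, lastTouch.plusSeconds(75), accepted)); // true
    }
}
```

With the default accepted interval of 60000ms, a touch that is 45 seconds old is still accepted, while a touch that is 75 seconds old is reported as obsolete.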

The following configuration properties tune the time intervals mentioned above:

cluster.node.touch.interval - period of the database touch (20000ms by default)
cluster.node.touch.forced_stop.interval - interval during which the other nodes accept the last touch (60000ms by default)
cluster.node.check.checkMinInterval - period of the cluster node checks (40000ms by default)
cluster.node.touch.forced_stop.solve_running_jobs.enabled - not an interval, but a boolean value that switches the "solving" of running jobs mentioned above on or off
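
To see how the three intervals interact, the following Java sketch loads the documented default values and derives a rough window for how long it may take to detect a dead node. The class DetectionWindow and its methods are hypothetical, and the arithmetic rests on assumptions not stated in this section: that NodeB touches the database exactly every cluster.node.touch.interval, and that the check task runs exactly every cluster.node.check.checkMinInterval. Treat the result as an estimate, not as guaranteed server behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical estimate for illustration only; not part of the CloverETL API.
final class DetectionWindow {

    // The three interval properties with their documented default values.
    static final String DEFAULTS = String.join("\n",
            "cluster.node.touch.interval=20000",
            "cluster.node.touch.forced_stop.interval=60000",
            "cluster.node.check.checkMinInterval=40000");

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static long millis(Properties props, String key) {
        return Long.parseLong(props.getProperty(key).trim());
    }

    // Earliest case: the node dies just before its next touch was due,
    // and a check runs as soon as the last touch is no longer accepted.
    static long earliestMs(Properties props) {
        return millis(props, "cluster.node.touch.forced_stop.interval")
                - millis(props, "cluster.node.touch.interval");
    }

    // Latest case: the node dies right after a touch, and the next check
    // runs a full check period after the touch stops being accepted.
    static long latestMs(Properties props) {
        return millis(props, "cluster.node.touch.forced_stop.interval")
                + millis(props, "cluster.node.check.checkMinInterval");
    }

    public static void main(String[] args) {
        Properties props = load(DEFAULTS);
        System.out.println("earliest detection after failure: " + earliestMs(props) + " ms");
        System.out.println("latest detection after failure: " + latestMs(props) + " ms");
    }
}
```

Under these assumptions, the defaults give a window of roughly 40000ms to 100000ms between the failure of NodeB and its detection. Lowering the forced-stop interval shortens the window, at the cost of a higher chance that a slow but healthy node is marked as stopped.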
