A TCP connection is used for asynchronous messaging. When NodeB cannot send or receive asynchronous messages, the other nodes are not notified about started and finished jobs, so a parent jobflow running on NodeA keeps waiting for an event from NodeB. Heart-beats are vital for meaningful load balancing; the same check-task mentioned above also checks the heart-beat of all cluster nodes.
Timeline of this scenario:
- 0s: the network connection between NodeA and NodeB goes down.
- 60s: NodeA uses the last available NodeB heart-beat.
- 0-40s: the check-task running on NodeA detects the missing heart-beat from NodeB.
- The status of NodeA or NodeB (whichever has the shorter uptime) is changed to "suspended".
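The two decisions in this timeline, detecting a missing heart-beat and choosing which node to suspend, can be sketched as follows. This is a minimal illustration, not the server's implementation: `NodeInfo`, `stale_nodes` and `node_to_suspend` are hypothetical names, a heart-beat is assumed to count as missing only when strictly more than `cluster.node.remove.interval` has elapsed, and the tie-break for equal uptimes is an assumption because it is not documented.

```python
from dataclasses import dataclass

# Default of cluster.node.remove.interval (ms), as listed in this section.
REMOVE_INTERVAL_MS = 50_000


@dataclass
class NodeInfo:
    """Hypothetical view of a cluster node as seen by the check-task."""
    name: str
    last_heartbeat_ms: int  # timestamp of the last heart-beat received from this node
    uptime_ms: int          # how long the node has been running


def stale_nodes(nodes, now_ms, remove_interval_ms=REMOVE_INTERVAL_MS):
    """Names of nodes whose heart-beat has been missing for longer than remove_interval_ms.

    Assumption: "longer than" means strictly greater.
    """
    return [n.name for n in nodes if now_ms - n.last_heartbeat_ms > remove_interval_ms]


def node_to_suspend(a, b):
    """Of two nodes that can no longer exchange messages, pick the one with the shorter uptime.

    Assumption: on equal uptime, b is chosen (the tie-break is not documented).
    """
    return a if a.uptime_ms < b.uptime_ms else b


if __name__ == "__main__":
    node_b = NodeInfo("NodeB", last_heartbeat_ms=0, uptime_ms=600_000)
    node_a = NodeInfo("NodeA", last_heartbeat_ms=80_000, uptime_ms=3_600_000)
    print(stale_nodes([node_b], now_ms=40_000))  # []
    print(stale_nodes([node_b], now_ms=80_000))  # ['NodeB']
    print(node_to_suspend(node_a, node_b).name)  # NodeB
```

In the example, NodeB's last heart-beat arrived at 0 ms, so a check at 40 s still trusts it, while a check at 80 s flags it; NodeB is then suspended because it has the shorter uptime.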
The following configuration properties control this behavior:

cluster.node.check.checkMinInterval - periodicity of cluster node checks (40000ms by default)
cluster.node.sendinfo.interval - periodicity of heart-beat messages (2000ms by default)
cluster.node.sendinfo.min_interval - a heart-beat may occasionally be sent more often than specified by cluster.node.sendinfo.interval; this property specifies the minimum interval between heart-beats (500ms by default)
cluster.node.remove.interval - maximum interval during which a node's heart-beat may be missing (50000ms by default)
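These properties combine into a detection delay. The sketch below estimates it under stated assumptions: the last heart-beat is received at t=0, checks run every `cluster.node.check.checkMinInterval` starting at an arbitrary phase, and a node is flagged once strictly more than `cluster.node.remove.interval` has elapsed. Only the property names and defaults come from this section; the function names and the consistency rule in `is_consistent` are hypothetical.

```python
# Default values (ms) of the cluster properties listed above.
DEFAULTS = {
    "cluster.node.check.checkMinInterval": 40_000,
    "cluster.node.sendinfo.interval": 2_000,
    "cluster.node.sendinfo.min_interval": 500,
    "cluster.node.remove.interval": 50_000,
}


def first_detection_ms(check_phase_ms, props=DEFAULTS):
    """Time (after the last heart-beat at t=0) of the first check that flags the node.

    Assumes 0 <= check_phase_ms < checkMinInterval and a strict ">" comparison.
    """
    check = props["cluster.node.check.checkMinInterval"]
    remove = props["cluster.node.remove.interval"]
    t = check_phase_ms
    while t <= remove:  # heart-beat not yet considered missing at this check
        t += check
    return t


def detection_window_ms(props=DEFAULTS):
    """Bounds (lo, hi] of first_detection_ms over all possible check phases."""
    remove = props["cluster.node.remove.interval"]
    return remove, remove + props["cluster.node.check.checkMinInterval"]


def heartbeats_lost_before_removal(props=DEFAULTS):
    """Approximate count of consecutive regular heart-beats that must be lost.

    Ignores network jitter and the occasional extra heart-beats allowed by min_interval.
    """
    return props["cluster.node.remove.interval"] // props["cluster.node.sendinfo.interval"]


def is_consistent(props=DEFAULTS):
    """Hypothetical sanity rule: 0 < min_interval <= interval < remove.interval."""
    return (
        0 < props["cluster.node.sendinfo.min_interval"]
        <= props["cluster.node.sendinfo.interval"]
        < props["cluster.node.remove.interval"]
        and props["cluster.node.check.checkMinInterval"] > 0
    )
```

Under these assumptions, a check phase of 0 flags the node at 80 s, a check landing just after the 50 s mark flags it almost immediately, and the worst case is 90 s. With the defaults, about 25 regular heart-beats must be lost before the node is treated as missing.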