When an HTTP connection cannot be established between nodes, jobs that are delegated between nodes, or jobs running in parallel on multiple nodes, will fail. The error is visible in the execution history. Each node periodically runs a check task that tests the HTTP connection to the other nodes. If a problem is detected, one of the nodes is suspended, since the nodes cannot cooperate with each other.
0s: The network connection between NodeA and NodeB goes down.
0-40s: A check task running on NodeA cannot establish an HTTP connection to NodeB. The check may last up to 30s before it times out. There is no retry: if the connection fails even once, it is considered unreliable, so the nodes cannot cooperate.
The status of NodeA or NodeB (whichever has the shorter uptime) is changed to “suspended”.
cluster.node.check.checkMinInterval - periodicity of cluster node checks (20000 ms by default)
cluster.sync.connection.readTimeout - HTTP connection response timeout (30000 ms by default)
cluster.sync.connection.connectTimeout - timeout for establishing an HTTP connection (7000 ms by default)
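
For illustration, the three properties could be set explicitly as in the sketch below. It assumes the usual key=value properties syntax; the values are simply the defaults listed above, and the location of the configuration file depends on the installation.

```properties
# Periodicity of cluster node checks, in milliseconds (default value)
cluster.node.check.checkMinInterval=20000

# HTTP connection response timeout, in milliseconds (default value)
cluster.sync.connection.readTimeout=30000

# Timeout for establishing an HTTP connection, in milliseconds (default value)
cluster.sync.connection.connectTimeout=7000
```

Because a single failed check is enough to suspend a node, any change to these values is a trade-off between how quickly a broken connection is detected and how easily a brief network slowdown triggers a suspension.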