Fault Tolerance
When an actor throws an unexpected exception, a failure, while processing a message or during initialization, the actor will by default be stopped. Note that there is an important distinction between failures and validation errors:
A validation error means that the data of a command sent to an actor is not valid. This should be modelled as part of the actor protocol rather than by making the actor throw exceptions.
A failure is instead something unexpected or outside the control of the actor itself, for example a database connection that broke. Unlike validation errors, failures are seldom worth modelling as part of the protocol, because the sending actor can very rarely do anything useful about them.
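As a minimal sketch of the first point, a validation error can be expressed as an ordinary reply in the protocol instead of an exception. The example below is plain Java with no Akka dependency; `CreateUser`, `Accepted`, `Rejected`, and `validate` are hypothetical names chosen for illustration and are not part of any Akka API.

```java
// Hypothetical protocol: all names here are illustrative, not Akka API.
interface Reply {}

final class Accepted implements Reply {}

final class Rejected implements Reply {
  final String reason;
  Rejected(String reason) { this.reason = reason; }
}

final class CreateUser {
  final String name;
  CreateUser(String name) { this.name = name; }
}

final class ValidationSketch {
  // Invalid input is answered with a Rejected reply rather than an exception,
  // so the sender can react to it as part of the normal message flow.
  static Reply validate(CreateUser cmd) {
    if (cmd.name == null || cmd.name.trim().isEmpty()) {
      return new Rejected("name must not be empty");
    }
    return new Accepted();
  }
}
```

In an actor, the reply would be sent back to the sender, which can then correct the command and try again. Nothing needs to crash.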
For failures it is useful to apply the “let it crash” philosophy: instead of mixing the business logic with fine-grained recovery and correction of internal state that may have become partially invalid because of the failure, we move that responsibility somewhere else. In many cases the resolution can then be to “crash” the actor and start a new one with a fresh state that we know is valid.
In Akka Typed this “somewhere else” is called supervision. Supervision allows you to declaratively describe what should happen when a certain type of exception is thrown inside an actor. To use supervision, the actual actor behavior is wrapped using Behaviors.supervise, for example to restart on IllegalStateException:
- Scala
-
Behaviors.supervise(behavior)
  .onFailure[IllegalStateException](SupervisorStrategy.restart)
- Java
-
Behaviors.supervise(behavior)
  .onFailure(IllegalStateException.class, SupervisorStrategy.restart());
Or, to resume instead, that is, to ignore the failure and process the next message:
- Scala
-
Behaviors.supervise(behavior)
  .onFailure[IllegalStateException](SupervisorStrategy.resume)
- Java
-
Behaviors.supervise(behavior)
  .onFailure(IllegalStateException.class, SupervisorStrategy.resume());
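To make the difference between restarting and resuming concrete, here is a toy model in plain Java. It is only a sketch of the observable effect, not how Akka implements supervision; `Counter`, `SupervisionSketch`, and its `Strategy` enum are hypothetical names introduced for this illustration.

```java
import java.util.function.Supplier;

// Toy model only: this is NOT how Akka implements supervision.
final class Counter {
  int total = 0;

  // Adds a non-negative amount; a negative amount stands in for an unexpected failure.
  void add(int amount) {
    if (amount < 0) throw new IllegalStateException("negative amount");
    total += amount;
  }
}

final class SupervisionSketch {
  enum Strategy { RESTART, RESUME }

  private final Supplier<Counter> factory;
  private final Strategy strategy;
  private Counter current;

  SupervisionSketch(Supplier<Counter> factory, Strategy strategy) {
    this.factory = factory;
    this.strategy = strategy;
    this.current = factory.get();
  }

  void tell(int amount) {
    try {
      current.add(amount);
    } catch (IllegalStateException e) {
      if (strategy == Strategy.RESTART) {
        current = factory.get(); // discard the old state and start fresh
      }
      // RESUME: keep the existing instance and move on to the next message
    }
  }

  int total() { return current.total; }
}
```

Sending 5, then -1, then 2 leaves the restarting variant with a total of 2, because the state accumulated before the failure is thrown away. The resuming variant ends with 7, because only the failing message is dropped.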
More complicated restart strategies can be used, for example to restart no more than 10 times in a 10-second period:
- Scala
-
Behaviors.supervise(behavior)
  .onFailure[IllegalStateException](SupervisorStrategy.restartWithLimit(
    maxNrOfRetries = 10, withinTimeRange = 10.seconds
  ))
- Java
-
Behaviors.supervise(behavior)
  .onFailure(IllegalStateException.class, SupervisorStrategy.restartWithLimit(
    10, FiniteDuration.apply(10, TimeUnit.SECONDS)
  ));
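The idea of a restart limit can be sketched with a small counter over a time window. The model below is one plausible reading (a fixed window that opens at the first failure), written in plain Java with explicit timestamps so that it is deterministic. Akka's internal bookkeeping may differ, and `RestartLimitSketch` and `onFailure` are hypothetical names.

```java
// Toy model of a restart limit; Akka's internal bookkeeping may differ.
final class RestartLimitSketch {
  private final int maxRestarts;
  private final long windowMillis;
  private long windowStart;
  private int restartsInWindow = 0;

  RestartLimitSketch(int maxRestarts, long windowMillis) {
    this.maxRestarts = maxRestarts;
    this.windowMillis = windowMillis;
  }

  // Returns true if a restart is still allowed at time nowMillis,
  // false if the limit for the current window is used up (the actor would stop).
  boolean onFailure(long nowMillis) {
    if (restartsInWindow == 0 || nowMillis - windowStart >= windowMillis) {
      // first failure, or the previous window has elapsed: open a new window
      windowStart = nowMillis;
      restartsInWindow = 0;
    }
    if (restartsInWindow >= maxRestarts) {
      return false;
    }
    restartsInWindow++;
    return true;
  }
}
```

With a limit of 2 restarts per 1000 ms, failures at 0 ms and 100 ms are both restarted and a third failure at 200 ms is not. A failure arriving after the window has elapsed starts a new count.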
To handle different exceptions with different strategies, calls to supervise can be nested:
- Scala
-
Behaviors.supervise(
  Behaviors.supervise(behavior)
    .onFailure[IllegalStateException](SupervisorStrategy.restart))
  .onFailure[IllegalArgumentException](SupervisorStrategy.stop)
- Java
-
Behaviors.supervise(
  Behaviors.supervise(behavior)
    .onFailure(IllegalStateException.class, SupervisorStrategy.restart()))
  .onFailure(IllegalArgumentException.class, SupervisorStrategy.stop());
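One way to picture nested supervision is as an ordered list of rules that is checked from the innermost wrapper outwards. The plain Java model below is an illustration, not Akka's implementation. `NestedSupervisionSketch`, `Decision`, and `decide` are hypothetical names, and the sketch assumes that a rule also matches subclasses of its exception type.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of nested supervision; not Akka's implementation.
final class NestedSupervisionSketch {
  enum Decision { RESTART, RESUME, STOP }

  private static final class Rule {
    final Class<? extends Throwable> type;
    final Decision decision;
    Rule(Class<? extends Throwable> type, Decision decision) {
      this.type = type;
      this.decision = decision;
    }
  }

  // Rules are stored innermost first, mirroring how the wrappers are nested.
  private final List<Rule> rules = new ArrayList<>();

  NestedSupervisionSketch onFailure(Class<? extends Throwable> type, Decision decision) {
    rules.add(new Rule(type, decision));
    return this;
  }

  // The first wrapper (from the inside out) whose type matches decides.
  // A failure that no wrapper handles stops the actor, as in the default case.
  Decision decide(Throwable failure) {
    for (Rule rule : rules) {
      if (rule.type.isInstance(failure)) {
        return rule.decision;
      }
    }
    return Decision.STOP;
  }
}
```

Registering `IllegalStateException` with `RESTART` first and `IllegalArgumentException` with `RESUME` second gives each exception type its own decision. Any other exception falls through to `STOP`.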
For a full list of strategies, see the public methods on SupervisorStrategy.
Bubble failures up through the hierarchy
In some scenarios it may be useful to push the decision about what to do on a failure upwards in the Actor hierarchy and let the parent actor handle what should happen on failures (in untyped Akka Actors this is how it works by default).
For a parent to be notified when a child is terminated, it has to watch the child. If the child was stopped because of a failure, this will be included in the Terminated signal, in the failed field.
If the parent in turn does not handle the Terminated signal, it will itself fail with an akka.actor.typed.DeathPactException. Note that DeathPactException cannot be supervised.
This means that a child failure can bubble up through a hierarchy of actors, stopping each actor on the way while informing the top-most parent that there was a failure, so that it can decide how to deal with it. However, the original exception that caused the failure will only be available to the immediate parent out of the box (this is most often a good thing, as it avoids leaking implementation details).
There might be cases when you want the original exception to bubble up the hierarchy. This can be done by handling the Terminated signal and rethrowing the exception in each actor.
- Scala
-
import akka.actor.typed.{ DeathPactException, SupervisorStrategy }
import akka.actor.typed.scaladsl.Behaviors

sealed trait Message
case class Fail(text: String) extends Message

val worker = Behaviors.immutable[Message] { (ctx, msg) ⇒
  msg match {
    case Fail(text) ⇒ throw new RuntimeException(text)
  }
}

val middleManagementBehavior = Behaviors.setup[Message] { ctx ⇒
  ctx.log.info("Middle management starting up")
  val child = ctx.spawn(worker, "child")
  ctx.watch(child)

  // here we don't handle Terminated at all which means that
  // when the child fails or stops gracefully this actor will
  // fail with a DeathPactException
  Behaviors.immutable[Message] { (ctx, msg) ⇒
    child ! msg
    Behaviors.same
  }
}

val bossBehavior = Behaviors.supervise(Behaviors.setup[Message] { ctx ⇒
  ctx.log.info("Boss starting up")
  val middleManagement = ctx.spawn(middleManagementBehavior, "middle-management")
  ctx.watch(middleManagement)

  // here we don't handle Terminated at all which means that
  // when middle management fails with a DeathPactException
  // this actor will also fail
  Behaviors.immutable[Message] { (ctx, msg) ⇒
    middleManagement ! msg
    Behaviors.same
  }
}).onFailure[DeathPactException](SupervisorStrategy.restart)

// (spawn comes from the testkit)
val boss = spawn(bossBehavior, "upper-management")
boss ! Fail("ping")
- Java
-
import akka.actor.typed.ActorRef;
import akka.actor.typed.ActorSystem;
import akka.actor.typed.Behavior;
import akka.actor.typed.javadsl.Behaviors;

interface Message {}

class Fail implements Message {
  public final String text;

  Fail(String text) {
    this.text = text;
  }
}

final Behavior<Message> failingChildBehavior = Behaviors.immutable(Message.class)
  .onMessage(Fail.class, (ctx, message) -> {
    throw new RuntimeException(message.text);
  })
  .build();

Behavior<Message> middleManagementBehavior = Behaviors.setup((ctx) -> {
  ctx.getLog().info("Middle management starting up");
  final ActorRef<Message> child = ctx.spawn(failingChildBehavior, "child");
  // we want to know when the child terminates, but since we do not handle
  // the Terminated signal, we will in turn fail on child termination
  ctx.watch(child);

  // here we don't handle Terminated at all which means that
  // when the child fails or stops gracefully this actor will
  // fail with a DeathPactException
  return Behaviors.immutable(Message.class)
    .onMessage(Message.class, (innerCtx, msg) -> {
      // just pass messages on to the child
      child.tell(msg);
      return Behaviors.same();
    }).build();
});

Behavior<Message> bossBehavior = Behaviors.setup((ctx) -> {
  ctx.getLog().info("Boss starting up");
  final ActorRef<Message> middleManagement = ctx.spawn(middleManagementBehavior, "middle-management");
  ctx.watch(middleManagement);

  // here we don't handle Terminated at all which means that
  // when middle management fails with a DeathPactException
  // this actor will also fail
  return Behaviors.immutable(Message.class)
    .onMessage(Message.class, (innerCtx, msg) -> {
      // just pass messages on to the child
      middleManagement.tell(msg);
      return Behaviors.same();
    }).build();
});

final ActorSystem<Message> system = ActorSystem.create(bossBehavior, "boss");

system.tell(new Fail("boom"));
// this will now bubble up all the way to the boss and as that is the user guardian it means
// the entire actor system will stop
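The bubbling described in this section can also be pictured with a small model that needs no Akka dependency. The sketch below only illustrates which failure each level gets to observe when no actor handles the termination of its child. `BubblingSketch`, `bubble`, and the nested `DeathPact` class are hypothetical stand-ins, with `DeathPact` playing the role of akka.actor.typed.DeathPactException.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of failure bubbling via unhandled termination; not Akka code.
final class BubblingSketch {
  // Stand-in for akka.actor.typed.DeathPactException in this model.
  static final class DeathPact extends RuntimeException {
    DeathPact(String child) { super("death pact with " + child + " was triggered"); }
  }

  // hierarchy lists actor names from the failing child up to the top-most parent.
  // Returns, for every parent, a description of the failure that parent observes.
  static List<String> bubble(List<String> hierarchy, RuntimeException original) {
    List<String> observed = new ArrayList<>();
    RuntimeException current = original;
    for (int i = 1; i < hierarchy.size(); i++) {
      String child = hierarchy.get(i - 1);
      String parent = hierarchy.get(i);
      observed.add(parent + " sees " + current.getClass().getSimpleName() + " from " + child);
      // The parent does not handle the termination, so it fails in turn,
      // and the exception it saw is not passed further up.
      current = new DeathPact(child);
    }
    return observed;
  }
}
```

For the hierarchy child, middle-management, boss, only middle-management observes the original `RuntimeException`. The boss observes a `DeathPact` instead, which matches the point that the original exception is available only to the immediate parent.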