Circuit Breaker
Circuit breaker is a design pattern used in software development. It detects failures and encapsulates the logic that prevents a failure from constantly recurring during maintenance, a temporary external system outage, or unexpected system difficulties.
Generally, a circuit breaker can be used to check the availability of an external service, such as a database server or a web service used by the application.
A circuit breaker detects failures and prevents the application from attempting an action that is doomed to fail until it is safe to retry.
Use Case
- A user places orders; the orders are received and sent to a queue
- A Mule flow consumes each order and persists it to a DB
- DB connectivity fails
- Users continue to place the orders
- Mule flow consumes the orders, DB connectivity fails, and orders are lost
In the above use case, we can preserve the orders by writing the failed orders to a dead letter queue from the error handler. Even so, this approach is not efficient when we know that the database or target system will be down for some time. A circuit breaker addresses this by shutting off the flow, so that orders are no longer consumed from the main queue. However, we have to make sure that the flow is started again through a maintenance activity.
The diagram below shows where the circuit is broken (at the two red lines). An order arrives in the queue. The Mule flow consumes the order from this queue, processes the message, and sends it to a database. If the database goes down and the database reconnection strategy is exhausted, the flow checks whether the threshold value (configured in the properties YAML file) has been reached. Once the threshold is hit, the flow stops automatically so that no further orders are consumed. This way, the remaining orders stay in the parent queue rather than being moved to the dead letter queue.
Implementation
Listener flow
Create a simple listener flow to simulate orders arriving in a queue. The message broker in the diagram below is RabbitMQ; ActiveMQ can be used instead.
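As a rough illustration, such a listener flow might look like the sketch below. It assumes the order arrives over HTTP. The flow name, HTTP_Listener_config, the /orders path, and the test-xchg exchange (assumed to be bound to test-q) are hypothetical names, not taken from the original project. Namespace declarations and global configurations are omitted, as in the other listings in this article.

```xml
<flow name="order-listener-flow">
    <!-- Assumption: orders are posted to /orders over HTTP -->
    <http:listener doc:name="receive order" config-ref="HTTP_Listener_config" path="/orders" allowedMethods="POST"/>
    <!-- Publish the incoming payload to an exchange assumed to route to test-q -->
    <amqp:publish doc:name="publish to test q" config-ref="AMQP_Config" exchangeName="test-xchg"/>
</flow>
```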
Main Circuit Breaker flow
This is the flow where all the magic happens. As the diagram shows, this flow listens to the same queue where the orders come in. Each order that is consumed is processed in a series of steps.
- Retrieve the current failure count stored in the object store. During the first run, this value defaults to 0, as set by os:default-value in the code below.
- Insert the consumed message into the database table. If the database connectivity fails, control passes to the error handler.
- In the error handler, the order is first published to a DLQ (guarding against message loss) and then the threshold limit is checked. If the threshold has not been reached, the failure count is incremented and the flow keeps consuming orders.
- When the threshold is reached (usually set at 3 to 5), the flow is shut off by a Groovy script, which is discussed below.
Circuit Breaker Flow - Code
<flow name="circuit-breaker-flow" doc:id="3db13c52-57f9-4d7c-8e9d-cb4777b04ed4">
    <amqp:listener doc:name="listen to test q" doc:id="1a1afda7-db3e-482a-a8b3-afddf1fa2fbd" config-ref="AMQP_Config"
        queueName="test-q" ackMode="IMMEDIATE">
    </amqp:listener>
    <!-- Read the current failure count; defaults to 0 when the key is absent -->
    <os:retrieve doc:id="ff11b017-cda2-4be6-8fbd-b7626998d7c8" doc:name="retrieve treshold" key="treshold" objectStore="Object_store" target="osTreshold">
        <os:default-value><![CDATA[#[0]]]></os:default-value>
    </os:retrieve>
    <logger level="INFO" doc:name="treshold value" doc:id="d5257b9d-9997-403a-abe5-5d0e3ad9cb00" message="#['********** Current Treshold Value ============= ' ++ vars.osTreshold]"/>
    <try doc:name="Try" doc:id="f08efaeb-3f10-42d7-90c6-5483692f3e18">
        <db:insert doc:id="3ea9f5e9-9820-444a-859a-fbc6b9d02c60" config-ref="Database_Config" doc:name="insert into db">
            <db:sql><![CDATA[insert into test (name, age) values (:name, :age)]]></db:sql>
            <db:input-parameters><![CDATA[#[payload]]]></db:input-parameters>
        </db:insert>
        <error-handler>
            <on-error-continue enableNotifications="true" logException="true" doc:name="On Error Continue" doc:id="801020b9-8150-4733-a50b-6561210c91c2" type="DB:CONNECTIVITY">
                <!-- Preserve the failed order before deciding whether to trip the breaker -->
                <amqp:publish doc:name="publish to test dlq" doc:id="e89bf6e6-4d4a-40f6-b9a0-4559e0ae9b62" config-ref="AMQP_Config" exchangeName="test-dlq-xchg"/>
                <choice doc:name="check treshold" doc:id="1eaa426a-df18-4751-a83f-e6baf583b8e0">
                    <!-- '<' is escaped as &lt; inside the attribute; p() returns a String, so it is cast to Number -->
                    <when expression="#[(vars.osTreshold as Number) &lt; (p('circuit.breaker.treshold') as Number)]">
                        <os:store doc:name="store treshold" doc:id="5b497896-dd70-4ab0-8e2c-6364ff32de59" key="treshold" objectStore="Object_store">
                            <os:value><![CDATA[#[(vars.osTreshold as Number) + 1]]]></os:value>
                        </os:store>
                    </when>
                    <otherwise>
                        <!-- Threshold reached: reset the counter and stop this flow -->
                        <os:remove doc:name="remove treshold" doc:id="e36ada7c-78ee-4a95-9331-38226bcb844c" key="treshold" objectStore="Object_store"/>
                        <flow-ref doc:name="toggle-flow" doc:id="99ab57b0-eee3-4269-bb94-6edca88ca4fc" name="toggle-flow"/>
                    </otherwise>
                </choice>
            </on-error-continue>
        </error-handler>
    </try>
</flow>
Toggle Flow
This flow contains a Groovy script that checks the state of the main flow and flips it: a running flow is stopped, and a stopped flow is started. It is called from the main circuit breaker flow once the threshold is met.
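The original script is not reproduced here, so the following is only a sketch of what such a toggle flow could look like. It assumes the Mule 4 Scripting module with a Groovy engine available to the application, and it relies on the registry binding that the module exposes to scripts. The logger and the doc:name values are illustrative, and the lifecycle calls should be verified against the runtime version in use.

```xml
<flow name="toggle-flow">
    <scripting:execute doc:name="toggle circuit-breaker-flow" engine="groovy">
        <scripting:code><![CDATA[
// Look up the main flow by name in the Mule registry (assumed binding: registry)
def flow = registry.lookupByName('circuit-breaker-flow').get()
// Stop the flow if it is running, otherwise start it
if (flow.getLifecycleState().isStarted()) {
    flow.stop()
} else {
    flow.start()
}
]]></scripting:code>
    </scripting:execute>
    <logger level="INFO" doc:name="log toggle" message="circuit-breaker-flow toggled"/>
</flow>
```

Because the script flips whatever state it finds, the same flow serves both to trip the breaker and to reset it.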
Post Maintenance Activity Flow
This is a helper flow with its own listener, used to start the main flow again once the database is up and ready to accept inserts. The toggle flow is called from this helper flow.
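A minimal sketch of such a helper flow, assuming it is triggered by an HTTP call from an operator. The flow name, HTTP_Listener_config, and the /circuit-breaker/reset path are hypothetical; any other trigger would work equally well.

```xml
<flow name="post-maintenance-flow">
    <!-- Assumption: an operator calls this endpoint once the database is back -->
    <http:listener doc:name="reset request" config-ref="HTTP_Listener_config" path="/circuit-breaker/reset" allowedMethods="POST"/>
    <!-- Reuse the toggle flow to start the stopped circuit-breaker-flow -->
    <flow-ref doc:name="toggle-flow" name="toggle-flow"/>
    <set-payload doc:name="response" value="circuit-breaker-flow toggled"/>
</flow>
```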
Circuit Breaker Configuration with Anypoint MQ
Anypoint MQ offers out-of-the-box support for the circuit breaker pattern through connector configuration. Below are the circuit breaker configurations with Anypoint MQ.
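As an illustration only, a global circuit breaker referenced from an Anypoint MQ subscriber might be configured along the following lines. The attribute names are given as I recall them from the Anypoint MQ connector reference and should be checked against the connector version in use. The destination name, Anypoint_MQ_Config, and the threshold and timeout values are hypothetical.

```xml
<!-- Assumption: trip after 5 DB connectivity errors, then pause the subscriber for 5 minutes -->
<anypoint-mq:circuit-breaker name="Circuit_breaker" onErrorTypes="DB:CONNECTIVITY" errorsThreshold="5" tripTimeout="5" tripTimeoutUnit="MINUTES"/>

<flow name="anypoint-mq-circuit-breaker-flow">
    <anypoint-mq:subscriber doc:name="subscribe to orders" config-ref="Anypoint_MQ_Config" destination="orders-queue" circuitBreaker="Circuit_breaker"/>
    <db:insert doc:name="insert into db" config-ref="Database_Config">
        <db:sql><![CDATA[insert into test (name, age) values (:name, :age)]]></db:sql>
        <db:input-parameters><![CDATA[#[payload]]]></db:input-parameters>
    </db:insert>
</flow>
```

In this variant the connector itself pauses consumption and resumes after the timeout, so no toggle flow or object store counter is needed. For the breaker to count a failure, the error has to propagate out of the flow instead of being swallowed by an on-error-continue handler.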
Credits
The following articles helped immensely in understanding the intricacies of a circuit breaker. They make for good reading.
https://martinfowler.com/bliki/CircuitBreaker.html
https://microservices.io/patterns/reliability/circuit-breaker.html
Conclusion
On their own, circuit breakers help reduce resources tied up in operations which are likely to fail. You avoid waiting on timeouts for the client, and a broken circuit avoids putting load on a struggling server. I talk here about remote calls, which are a common case for circuit breakers, but they can be used in any situation where you want to protect parts of a system from failures in other parts.
Circuit breakers are a valuable place for monitoring. Any change in breaker state should be logged and breakers should reveal details of their state for deeper monitoring. Breaker behavior is often a good source of warnings about deeper troubles in the environment. Operations staff should be able to trip or reset breakers.
Breakers on their own are valuable, but clients using them need to react to breaker failures. As with any remote invocation you need to consider what to do in case of failure. Does it fail the operation you’re carrying out, or are there workarounds you can do? A credit card authorization could be put on a queue to deal with later, failure to get some data may be mitigated by showing some stale data that’s good enough to display.