[Article] How to Achieve Delivery Reliability with Dead Letter Channel Pattern - Part 1

  • By Hasitha Abeykoon
  • 17 Nov, 2015

Table of contents

  • Applies to
  • Introduction
  • Factors to consider when setting up a broker cluster
  • Deployment of servers
  • Components of the solution and message flow
  • Limitations of the solution with ESB cluster
  • Overcoming these limitations
  • How a distributed MB cluster works
  • Scenario implementation
  • Setting up clusters
    • MB cluster
    • ESB cluster
  • Conclusion


Applies to

WSO2 Message Broker 3.0.0
WSO2 Enterprise Service Bus 4.8.1, 4.9.0


Introduction

Adding reliability to distributed messaging is more difficult than in a single-node setup. Most middleware systems rely on a message broker for reliable, asynchronous messaging, and there are situations where a single broker node cannot deliver the performance required by the system's other components. That is when we need a cluster of brokers. WSO2 Message Broker (WSO2 MB) can be easily integrated with other products in the WSO2 middleware platform, especially with WSO2 Enterprise Service Bus (WSO2 ESB), which is designed for message mediation. Together, WSO2 MB and WSO2 ESB make it easy to build comprehensive message flows. The typical use case we will walk through achieves delivery reliability with an MB cluster configured against an ESB cluster.

This is the first of a two-part series of articles on this subject. To start with, this article describes the concepts behind achieving delivery reliability and the WSO2 ESB and WSO2 MB setup we are going to build in order to achieve reliable message delivery. The actual implementation details are discussed in the second article of the series.


Factors to consider when setting up a broker cluster

When defining a scaled deployment for a particular use case, throughput, latency, concurrency, and work unit size are the primary factors that need to be taken into account. Even though it is popularly believed that throughput grows linearly with concurrency, this is not the case in reality: as concurrency increases, latency creates an adverse effect on throughput. Because of this, we need to perform capacity planning with a good comprehension of the system.

Note that when you use a broker cluster you cannot achieve global ordering of messages. Global ordering requires coordination between every node in the cluster, which makes it a performance killer. If global ordering is a hard requirement, the only option is vertically scaling a single broker to meet the performance demand; you can still have high availability by configuring another broker as a backup (an active-passive setup). What a cluster does preserve is the local order of messages received through a particular node, and that is the ordering guarantee that matters here.


Deployment of servers

Here we will consider a cluster of three message broker nodes and a cluster of three enterprise service bus nodes (the manager node will also be a worker node). At a given moment all servers will be active and running.

  • MB nodes - MB1, MB2, MB3
  • ESB nodes - ESB1, ESB2, ESB3

We can configure the brokers for each ESB node with failover in the following order. This ensures that no ESB node will stop serving messages if a particular broker crashes, or if a network partition occurs between an ESB node and its default MB node.

  • ESB1 - MB1, MB2, MB3 - default connected to MB1
  • ESB2 - MB2, MB3, MB1 - default connected to MB2
  • ESB3 - MB3, MB1, MB2 - default connected to MB3
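For instance, ESB1's broker list above can be expressed in the jndi.properties file that the ESB's JMS transport reads, using the Andes (Qpid-style) failover URL syntax. The credentials, host, and ports below are placeholder assumptions for a single-machine setup; verify the exact URL format against the WSO2 MB documentation:

```properties
# jndi.properties for ESB1 (sketch) - MB1 (port 5675) is tried first;
# MB2 (5676) and MB3 (5677) are failover targets.
connectionfactory.TopicConnectionFactory = amqp://admin:admin@clientid/carbon?failover='roundrobin'&brokerlist='tcp://localhost:5675?retries='5'%26connectdelay='50';tcp://localhost:5676?retries='5'%26connectdelay='50';tcp://localhost:5677?retries='5'%26connectdelay='50''
```

ESB2 and ESB3 would use the same list rotated so that each node's default broker appears first.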

Figure 1


Components of the solution and message flow

Here, the ESB acts as a message transformer from HTTP to JMS. We will be using the following ESB components to build the solution:

  • ESB JMS sender proxy - transforms incoming messages into JMS and sends them to the broker
  • ESB JMS listener proxy - listens to a particular destination on the broker, picks up each message, and feeds it into the mediation flow
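As a sketch of how the sender side might look in Synapse configuration (the proxy name, topic name, connection factory, and JNDI file location are illustrative assumptions, not values from this article):

```xml
<!-- Sketch of a JMS sender proxy; MessageTopic, TopicConnectionFactory,
     and the jndi.properties path are assumptions -->
<proxy xmlns="http://ws.apache.org/ns/synapse" name="JMSSenderProxy" transports="http">
  <target>
    <inSequence>
      <!-- One-way send: no response is expected back from the broker -->
      <property name="OUT_ONLY" value="true"/>
      <send>
        <endpoint>
          <address uri="jms:/MessageTopic?transport.jms.ConnectionFactoryJNDIName=TopicConnectionFactory&amp;transport.jms.DestinationType=topic&amp;java.naming.factory.initial=org.wso2.andes.jndi.PropertiesFileInitialContextFactory&amp;java.naming.provider.url=repository/conf/jndi.properties"/>
        </endpoint>
      </send>
    </inSequence>
  </target>
</proxy>
```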

Now let’s see how we can build a message flow with reliable message delivery using the above components. This pattern is called the Dead Letter Channel Pattern (DLC Pattern). Before going into the implementation details it is important to understand the message flow and how this can bring reliable message delivery. Consider the following diagram:

Figure 2

  • The message is received by the ESB from the client
  • JMS proxy sends it to a defined durable topic
  • We register a JMS listener proxy for the topic with subscription id=x. This proxy has a daemon thread that runs in the background. As soon as a message is received by the topic, a copy is delivered to the ESB through subscription id=x
  • The received message is saved to a property
  • Using the received message, subsequent steps of mediation are performed. Note that if we call external endpoints during mediation, we need to use the Callout mediator, because this is a transaction in which every call must block and complete within a single thread
  • At each mediation step we need to check for success or failure. If all steps of mediation complete successfully, we send the response to the client (this might also be a DB record insert without a response). The transaction commit will happen automatically if no mediation error occurs.
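The listener side of the flow above could be sketched as a JMS proxy with a durable subscription and a transacted session. Again, the names and parameter values here are assumptions to illustrate the idea, not configuration taken from this article:

```xml
<!-- Sketch of the JMS listener proxy; names and values are assumptions -->
<proxy xmlns="http://ws.apache.org/ns/synapse" name="JMSListenerProxy" transports="jms">
  <parameter name="transport.jms.ConnectionFactory">myTopicConnectionFactory</parameter>
  <parameter name="transport.jms.Destination">MessageTopic</parameter>
  <parameter name="transport.jms.DestinationType">topic</parameter>
  <!-- Durable subscription with id=x -->
  <parameter name="transport.jms.SubscriptionDurable">true</parameter>
  <parameter name="transport.jms.DurableSubscriberName">x</parameter>
  <!-- Transacted JMS session: a rollback triggers redelivery by the broker -->
  <parameter name="transport.jms.SessionTransacted">true</parameter>
  <target>
    <inSequence>
      <!-- Mediation steps go here; use the Callout mediator for blocking
           external calls within the transaction -->
      <log level="full"/>
    </inSequence>
    <faultSequence>
      <!-- On mediation error, mark the JMS session for rollback so the
           message is not acknowledged and will be redelivered -->
      <property name="SET_ROLLBACK_ONLY" value="true" scope="axis2"/>
    </faultSequence>
  </target>
</proxy>
```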


Limitations of the solution with ESB cluster

The above solution looks promising, but when it comes to load balancing with an ESB cluster there are a few points we should worry about. When we create a CAR file (a deployable archive for the ESB containing all the above components) with the above-mentioned mediation flow, it will be deployed to all worker nodes in the cluster. Every node will then try to connect using subscription id=x. This violates the JMS specification, which states that only one subscriber (one TCP connection) can exist for a given subscription ID. Thus the node that subscribes first will create the connection; the other nodes will throw exceptions and fail. If we leave the system like this:

  • All load will have to be handled by one ESB node
  • There is no room for scaling
  • All MB nodes will not be utilized
  • The system will have a single point of failure from the ESB's point of view, because if the connected ESB node crashes, the other ESB nodes are not notified to retry connecting with subscription id=x


Overcoming these limitations

With WSO2 MB 3.0.0 we have introduced a way to allow shared subscriptions to a particular subscription ID. This configuration overrides JMS 1.1 behaviour; JMS 2.0 solves this problem natively with shared subscriptions.

To allow shared topic subscriptions, edit the <MB_HOME>/repository/conf/broker.xml file at each MB node in the MB cluster:

    ...
    <allowSharedTopicSubscriptions>true</allowSharedTopicSubscriptions>
    ...
Refer here for more detail on shared subscription feature of WSO2 MB.

There is another important setting to be configured on the client side: increasing the AckTimeout. If this is not done, the broker waits for an acknowledgement for each message it has sent to the message processor; if it does not hear back within that time, it schedules the message for redelivery. Because the ESB caches the message, this can result in duplicates.

To prevent this, follow the required client-side configuration here.

Since the ESB acts as the JMS client here, the system property can be passed to it using the wso2server.sh script or wso2server.bat file. Set it on every ESB node in the cluster. Note that this value is in milliseconds.

-DAndesAckWaitTimeOut=1800000


How a distributed MB cluster works

WSO2 MB uses Hazelcast to coordinate between nodes; this comes with the WSO2 Carbon platform's clustering features. The following are some important points on message coordination inside the MB cluster:

  • When a message is received by the MB, its content and metadata are saved separately into a database shared among all cluster nodes
  • When a subscription is made on a node, that information is distributed so that each and every node is aware of the subscription
  • Messages are ordered by timestamp and stored. The message ID is generated based on the timestamp; thus, we can logically separate messages into ranges (slots)
  • One node is elected as the 'Slot Coordinator'
  • Slots are distributed among the nodes that have subscriptions by a distributed algorithm, coordinated by the Slot Coordinator node
  • Each subscriber can consume the messages allocated to its node in parallel
  • Thus, in the distributed Message Broker, the original message order is not preserved

Figure 3


Scenario implementation

Now let us focus on how to implement the above message flow and build a reliable messaging system using WSO2 MB and WSO2 ESB. You can download WSO2 MB here and WSO2 ESB here.

In this setup we assume that both the MB cluster and the ESB cluster are set up on a single machine. The port offsets of the servers will be as follows:

  • ESB1 - offset=0
  • ESB2 - offset=1
  • ESB3 - offset=2
  • MB1 - offset=3
  • MB2 - offset=4
  • MB3 - offset=5
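Each offset is set in the <Offset> element of that server's repository/conf/carbon.xml file. For example, for MB1 (offset 3):

```xml
<!-- In <MB1_HOME>/repository/conf/carbon.xml -->
<Ports>
    <!-- All listening ports are shifted by this value;
         e.g. the default AMQP port 5672 becomes 5675 -->
    <Offset>3</Offset>
</Ports>
```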


Setting up clusters

MB cluster

Clustering message brokers is easy: there is no worker/manager concept. All you need to do is enable clustering and point all the nodes to a single shared message store. Please refer here for step-by-step documentation on how to create an MB cluster. Make a three-node cluster.

ESB cluster

The ESB cluster has a worker/manager concept. Please refer here for documentation on how to cluster WSO2 ESB. Make a three-node ESB cluster.

The final solution would look like this:

Figure 4


Conclusion

Together, WSO2 ESB and WSO2 MB can be used to build a reliable messaging system that is scalable and fault tolerant. WSO2 MB has internal distributed algorithms that can deliver messages to subscribers scaled out across different nodes, while processing the messages in parallel. WSO2 ESB also provides a powerful configuration language that can be used to define messaging behaviour without writing any actual code. In part 2 of this article we will go into the details of implementing this solution.

About Author

  • Hasitha Abeykoon
  • Associate Technical Lead
  • WSO2