[Article] Scalable Traffic Manager Deployment Patterns for WSO2 API Manager - Part 2
- Sanjeewa Malalgoda
- Director - Engineering | Architect at WSO2 - WSO2
Table of contents
- Introduction
- Data receiver patterns
- Data publishing patterns
- Failover data publishing
- Load balance data publishing to multiple receivers
- Load balance data publishing to multiple receiver groups
- Publishing to multiple receiver groups with load balancing within group
- Publishing to multiple receiver groups with failover within group
- Data publishing to all receivers
- Conclusion
Introduction
In the previous article, we discussed Traffic Manager and its usages in a distributed deployment. In this article, we will discuss different data receiver and data publisher patterns and how we can effectively use them in different deployments. Understanding these patterns is paramount for deploying highly scalable solutions.
Data receiver patterns
When we consider a distributed API Manager deployment, the gateway worker node can communicate with multiple traffic manager instances as per a particular deployment pattern. Once the gateway worker receives throttling decisions it can decide (locally) whether requests can pass through or whether they need to be throttled (we’ll discuss this update retrieving process and related methodologies later).
The usual scenario in Traffic Manager goes like this: we have a message queue, and once the throttling decision is calculated, it is placed in this message queue under a specific topic name. Multiple gateway workers subscribe to this topic and fetch updates from it.
In figure 1 below, the red dotted lines connecting traffic managers and gateway worker denote the throttling decision update retrieval process. Based on the configuration we can add one or more traffic managers to receive throttling decision updates.
Figure 1
Now examine the configuration below, which is related to JMS connections. As you can see here, we can define multiple brokers as a broker list and configure certain additional parameters to define the broker selection methodology. In this configuration we have added 2 brokers with a failover pattern to receive throttling data updates: if one broker is not available, the gateway will connect to the other to receive updates.
<JMSConnectionParameters>
    <transport.jms.ConnectionFactoryJNDIName>TopicConnectionFactory</transport.jms.ConnectionFactoryJNDIName>
    <transport.jms.DestinationType>topic</transport.jms.DestinationType>
    <java.naming.factory.initial>org.wso2.andes.jndi.PropertiesFileInitialContextFactory</java.naming.factory.initial>
    <connectionfactory.TopicConnectionFactory>amqp://admin:admin@clientID/carbon?failover='roundrobin'%26cyclecount='2'%26brokerlist='tcp://127.0.0.1:5673?retries='5'%26connectdelay='50';tcp://127.0.0.1:5674?retries='5'%26connectdelay='50''</connectionfactory.TopicConnectionFactory>
</JMSConnectionParameters>
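The connection URL above is easy to get wrong by hand because of its nested quotes. As a minimal sketch, the helper below assembles the same URL from a list of brokers. The function name `build_failover_url` and its parameters are hypothetical and are not part of any WSO2 tooling; it only mirrors the string layout used in this article, including `%26` as the encoded `&`.

```python
def build_failover_url(brokers, retries=5, connect_delay=50, cycle_count=2,
                       user="admin", password="admin",
                       client_id="clientID", virtual_host="carbon"):
    """Assemble an AMQP connection URL with a round-robin failover broker list.

    `brokers` is a list of (host, port) tuples. All names here are
    illustrative; only the resulting string layout comes from the article.
    """
    amp = "%26"  # encoded '&', written the same way as in the configuration
    broker_entries = [
        f"tcp://{host}:{port}?retries='{retries}'{amp}connectdelay='{connect_delay}'"
        for host, port in brokers
    ]
    # Brokers are separated by ';' inside a single quoted brokerlist value
    broker_list = ";".join(broker_entries)
    return (
        f"amqp://{user}:{password}@{client_id}/{virtual_host}"
        f"?failover='roundrobin'{amp}cyclecount='{cycle_count}'"
        f"{amp}brokerlist='{broker_list}'"
    )


url = build_failover_url([("127.0.0.1", 5673), ("127.0.0.1", 5674)])
print(url)
```

Generating the value this way keeps the quoting consistent when a third or fourth broker is added to the list.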
It is of paramount importance that we follow this approach when we have multiple Traffic Managers. Consider a deployment across multiple data centers. Gateways need to subscribe to Traffic Managers residing within each data center. But a gateway can still publish throttle events to another data center (if we have a link of sufficient bandwidth) which enables us to have precise counters across data centers.
Data publishing patterns
Failover data publishing
We discussed earlier that the gateway's data receiver needs to be configured in a failover pattern. A data publisher, however, can be configured according to either a load balance or a failover pattern. In this section, we will see how we can publish throttling events to a Traffic Manager in a failover pattern.
Figure 2
When using the failover configuration, events are sent to multiple Traffic Manager receivers in a sequential order based on priority. You can specify multiple Traffic Manager receivers so that events can be sent to the next server in the sequence in a situation where they were not successfully sent to the first server.
In the scenario depicted in the above image, events are first sent to Traffic Manager Receiver-1; if it is unavailable, the events will be sent to Traffic Manager Receiver-2. If that is also unavailable, then events will be sent to Traffic Manager Receiver-3. In this scenario, event duplication is false because one event always goes to only one receiver; only if that receiver fails will the event go to one of the other available nodes.
<DataPublisher>
    <Enabled>true</Enabled>
    <Type>Binary</Type>
    <ReceiverUrlGroup>tcp://127.0.0.1:9612 | tcp://127.0.0.1:9613 | tcp://127.0.0.1:9614</ReceiverUrlGroup>
    <!--ReceiverUrlGroup>tcp://${carbon.local.ip}:9612</ReceiverUrlGroup-->
    <AuthUrlGroup>ssl://127.0.0.1:9712 | ssl://127.0.0.1:9713 | ssl://127.0.0.1:9714</AuthUrlGroup>
    <!--AuthUrlGroup>ssl://${carbon.local.ip}:9712</AuthUrlGroup-->
    <Username>${admin.username}</Username>
    <Password>${admin.password}</Password>
    <DataPublisherPool>
        <MaxIdle>1000</MaxIdle>
        <InitIdleCapacity>200</InitIdleCapacity>
    </DataPublisherPool>
    <DataPublisherThreadPool>
        <CorePoolSize>200</CorePoolSize>
        <MaxmimumPoolSize>1000</MaxmimumPoolSize>
        <KeepAliveTime>200</KeepAliveTime>
    </DataPublisherThreadPool>
</DataPublisher>
<JMSConnectionParameters>
    <transport.jms.ConnectionFactoryJNDIName>TopicConnectionFactory</transport.jms.ConnectionFactoryJNDIName>
    <transport.jms.DestinationType>topic</transport.jms.DestinationType>
    <java.naming.factory.initial>org.wso2.andes.jndi.PropertiesFileInitialContextFactory</java.naming.factory.initial>
    <connectionfactory.TopicConnectionFactory>amqp://admin:admin@clientID/carbon?failover='roundrobin'%26cyclecount='2'%26brokerlist='tcp://127.0.0.1:5673?retries='5'%26connectdelay='50';tcp://127.0.0.1:5674?retries='5'%26connectdelay='50';tcp://127.0.0.1:5675?retries='5'%26connectdelay='50''</connectionfactory.TopicConnectionFactory>
</JMSConnectionParameters>
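The priority-ordered behaviour of the failover pattern can be illustrated with a short sketch. This is a simplified model under stated assumptions, not the data agent's actual implementation: `publish_with_failover`, `ReceiverDown`, and `make_sender` are hypothetical names, and the `send` callable stands in for whatever transport delivers the event.

```python
class ReceiverDown(Exception):
    """Raised by a send function when a receiver cannot be reached."""


def publish_with_failover(event, receivers, send):
    """Try receivers in priority order and return the URL that accepted the event."""
    for url in receivers:
        try:
            send(url, event)
            return url  # exactly one receiver gets the event, so no duplication
        except ReceiverDown:
            continue  # move on to the next receiver in the sequence
    raise ReceiverDown("no receiver in the failover sequence is available")


def make_sender(down):
    """Build a stand-in send function that fails for the URLs listed in `down`."""
    def send(url, event):
        if url in down:
            raise ReceiverDown(url)
    return send


receivers = ["tcp://127.0.0.1:9612", "tcp://127.0.0.1:9613", "tcp://127.0.0.1:9614"]
# With the first receiver down, the event lands on the second one
print(publish_with_failover({"key": "api1"}, receivers, make_sender({receivers[0]})))
```

The order of the list carries the priority, which is the same role the `|` separated order plays in `ReceiverUrlGroup`.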
Now let’s see what happens to the other Traffic Managers if one Traffic Manager stops working. If one Traffic Manager instance goes down, the other nodes notice immediately, and the working Traffic Managers print the following log in their consoles:
[2016-08-29 13:44:00,004] ERROR - JMSConnectionFactory Error acquiring a Connection from the JMS CF : jmsEventPublisher1 using properties : {transport.jms.ConcurrentPublishers=allow, java.naming.provider.url=repository/conf/jndi1.properties, java.naming.factory.initial=org.wso2.andes.jndi.PropertiesFileInitialContextFactory, transport.jms.DestinationType=topic, transport.jms.ConnectionFactoryJNDIName=TopicConnectionFactory, transport.jms.Destination=throttleData}
org.wso2.carbon.event.output.adapter.core.exception.OutputEventAdapterRuntimeException: Error acquiring a Connection from the JMS CF : jmsEventPublisher1 using properties : {transport.jms.ConcurrentPublishers=allow, java.naming.provider.url=repository/conf/jndi1.properties, java.naming.factory.initial=org.wso2.andes.jndi.PropertiesFileInitialContextFactory, transport.jms.DestinationType=topic, transport.jms.ConnectionFactoryJNDIName=TopicConnectionFactory, transport.jms.Destination=throttleData}
As you can see above, Node 2 cannot put events into the topic deployed in Node 1 (whose broker is defined in the repository/conf/jndi1.properties file).
connectionfactory.TopicConnectionFactory = amqp://admin:admin@clientid/carbon?brokerlist='tcp://localhost:5673'
In gateway nodes you will see the following logs.
[2016-08-29 13:52:28,838] INFO - FailoverRoundRobinServers ==== Checking failoverAllowed() ====
[2016-08-29 13:52:28,838] INFO - FailoverRoundRobinServers Cycle Servers: Cycle Retries:2 Current Cycle:0 Server Retries:5 Current Retry:0 Current Broker:1 tcp://127.0.0.1:5673?retries='5'&connectdelay='50' >tcp://127.0.0.1:5674?retries='5'&connectdelay='50'
[2016-08-29 13:52:28,838] INFO - FailoverRoundRobinServers ====================================
[2016-08-29 13:52:28,838] INFO - AMQStateManager Setting ProtocolSession:AMQProtocolSession[AMQConnection: Host: 127.0.0.1 Port: 5674 Virtual Host: carbon Client ID: clientID Active session count: 1]
[2016-08-29 13:52:28,838] INFO - FailoverHandler Starting failover process
[2016-08-29 13:52:28,838] INFO - FailoverRoundRobinServers ==== Checking failoverAllowed() ====
[2016-08-29 13:52:28,838] INFO - FailoverRoundRobinServers Cycle Servers: Cycle Retries:2 Current Cycle:0 Server Retries:5 Current Retry:0 Current Broker:1 tcp://127.0.0.1:5673?retries='5'&connectdelay='50' >tcp://127.0.0.1:5674?retries='5'&connectdelay='50'
This happens because the Gateway noticed that one of the failover endpoints went down and needs to switch to another node.
You will also see the following error message from that point onward (until the failed node is up and running again) because the gateway keeps trying to publish events to the Traffic Manager node that is down. Whether we configure a failover or a load balance data publisher, you will likely see an error message saying that a connection cannot be established to that node. With failover data publishing configured, the gateway checks every 30 seconds whether the failed node is functioning again. Once the node is back up, the deployment effectively heals itself.
[2016-08-29 13:47:52,192] ERROR - DataEndpointConnectionWorker Error while trying to connect to the endpoint. Cannot borrow client for ssl://127.0.0.1:9712
org.wso2.carbon.databridge.agent.exception.DataEndpointAuthenticationException: Cannot borrow client for ssl://127.0.0.1:9712
    at org.wso2.carbon.databridge.agent.endpoint.DataEndpointConnectionWorker.connect(DataEndpointConnectionWorker.java:100)
    at org.wso2.carbon.databridge.agent.endpoint.DataEndpointConnectionWorker.run(DataEndpointConnectionWorker.java:43)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
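When diagnosing errors like the one above, it can help to confirm from the gateway host whether the receiver port is reachable at all. The sketch below is an operator-side probe, not the mechanism the gateway uses internally. The function names are hypothetical, and the 30-second default interval simply mirrors the check interval mentioned in this section.

```python
import socket
import time


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


def wait_until_reachable(host, port, interval=30.0, attempts=10, sleep=time.sleep):
    """Probe host:port up to `attempts` times, pausing `interval` seconds between probes."""
    for attempt in range(attempts):
        if is_reachable(host, port):
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

For example, `wait_until_reachable("127.0.0.1", 9712)` would poll the authentication port from the log above until it accepts connections or the attempts run out. A successful TCP connection only shows that the port is open; it does not prove that authentication will succeed.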
Load balance data publishing to multiple receivers
Now let’s examine a load balancing setup. In this case, load balanced publishing is done in a round-robin manner: each event is sent to exactly one receiver, and receivers are chosen in circular order without any priority.
Figure 3
It also handles certain failover cases. For example, if Traffic Manager Receiver 1 is marked as down, then the Data Agent will send the data only to Traffic Manager Receiver 2 (and if we have more nodes, then for all of them) in a Round Robin manner. When Traffic Manager Receiver 1 becomes active after some time, the Data Agent automatically detects it, adds it to the operation, and again starts to load balance between all receivers. This functionality significantly reduces the loss of data and provides more concurrency. In this scenario one message will always go to one data receiver and event duplication will not happen.
For this functionality, include the server URL in the Data Agent as a general traffic manager receiver URL. The URL should be entered in a comma separated format as shown below:
<DataPublisher>
    <Enabled>true</Enabled>
    <Type>Binary</Type>
    <ReceiverUrlGroup>tcp://127.0.0.1:9612,tcp://127.0.0.1:9613</ReceiverUrlGroup>
    <AuthUrlGroup>ssl://127.0.0.1:9712,ssl://127.0.0.1:9713</AuthUrlGroup>
    <Username>${admin.username}</Username>
    <Password>${admin.password}</Password>
    <DataPublisherPool>
        <MaxIdle>1000</MaxIdle>
        <InitIdleCapacity>200</InitIdleCapacity>
    </DataPublisherPool>
    <DataPublisherThreadPool>
        <CorePoolSize>200</CorePoolSize>
        <MaxmimumPoolSize>1000</MaxmimumPoolSize>
        <KeepAliveTime>200</KeepAliveTime>
    </DataPublisherThreadPool>
</DataPublisher>
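To make the round-robin behaviour concrete, here is a minimal sketch of a selector that rotates through receivers and skips any that are marked as down. It is a simplified model of the behaviour described in this section, not the Data Agent's code; the class `RoundRobinPublisher` and its methods are hypothetical.

```python
class RoundRobinPublisher:
    """Pick receivers in circular order, skipping those currently marked down."""

    def __init__(self, receivers):
        self.receivers = list(receivers)
        self.down = set()
        self._next = 0  # index of the receiver to try next

    def mark_down(self, url):
        self.down.add(url)

    def mark_up(self, url):
        self.down.discard(url)

    def pick(self):
        """Return the next available receiver, or raise if none is available."""
        for _ in range(len(self.receivers)):
            url = self.receivers[self._next]
            self._next = (self._next + 1) % len(self.receivers)
            if url not in self.down:
                return url
        raise RuntimeError("no receiver is available")


publisher = RoundRobinPublisher(["tcp://127.0.0.1:9612", "tcp://127.0.0.1:9613"])
print([publisher.pick() for _ in range(4)])  # alternates between the two receivers
```

Calling `mark_down` on one receiver sends every event to the remaining one, and `mark_up` brings the recovered receiver back into the rotation, which corresponds to the automatic detection described above.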
Load balance data publishing to multiple receiver groups
Assume that there are two groups of servers, referred to as Group A and Group B. You can send events to both groups. You can also carry out load balancing within each group, as described for load balancing between a set of servers. This scenario is a combination of load balancing between a set of servers and sending an event to several receivers.
An event is sent to both Group A and Group B. Within Group A, it will be sent either to Traffic Manager 1 or Traffic Manager 2. Similarly, within Group B, it will be sent either to Traffic Manager 3 or Traffic Manager 4. In the setup, you can have any number of groups and any number of Traffic Managers (within a group) as required; this is done by listing them accurately in the server URL. For this scenario it's mandatory to publish events to each group, but within a group we can do it in two different ways:
- Publishing to multiple receiver groups with load balancing within group
- Publishing to multiple receiver groups with failover within group
Now let's discuss both of these options in detail. This pattern is the recommended approach for multi-data center deployments when we need to have unique counters across data centers. Each group resides within one data center, and each data center has two Traffic Manager nodes for high availability.
Publishing to multiple receiver groups with load balancing within group
As you can see in the diagram below, the data publisher pushes events to both groups. Since there are multiple nodes within each group, it sends each event to only one node per group, chosen in round-robin fashion. Within Group A, for example, the first event goes to Traffic Manager 1, the next goes to Traffic Manager 2, and so on. If Traffic Manager 1 is unavailable, all traffic goes to Traffic Manager 2, which covers the failover scenario.
Figure 4
Similar to the other scenarios, you can describe this as a receiver URL. The Groups should be mentioned within curly braces separated by commas. Furthermore, each receiver that belongs to the group should be within the curly braces and with the receiver URLs in a comma separated format. The receiver URL format is given below.
<DataPublisher>
    <Enabled>true</Enabled>
    <Type>Binary</Type>
    <ReceiverUrlGroup>{tcp://127.0.0.1:9612,tcp://127.0.0.1:9613},{tcp://127.0.0.2:9612,tcp://127.0.0.2:9613}</ReceiverUrlGroup>
    <AuthUrlGroup>{ssl://127.0.0.1:9712,ssl://127.0.0.1:9713},{ssl://127.0.0.2:9712,ssl://127.0.0.2:9713}</AuthUrlGroup>
    <Username>${admin.username}</Username>
    <Password>${admin.password}</Password>
    <DataPublisherPool>
        <MaxIdle>1000</MaxIdle>
        <InitIdleCapacity>200</InitIdleCapacity>
    </DataPublisherPool>
    <DataPublisherThreadPool>
        <CorePoolSize>200</CorePoolSize>
        <MaxmimumPoolSize>1000</MaxmimumPoolSize>
        <KeepAliveTime>200</KeepAliveTime>
    </DataPublisherThreadPool>
</DataPublisher>
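The URL group notation used throughout this article (curly braces for groups, commas for load balancing, pipes for failover) can be checked with a small parser before a configuration is deployed. The sketch below is an illustration of the notation as described here, not the parser the data agent itself uses; `parse_url_groups` is a hypothetical name.

```python
import re


def parse_url_groups(value):
    """Split a ReceiverUrlGroup value into a list of (mode, [urls]) tuples.

    mode is "failover" when URLs are separated by '|' and "loadbalance"
    when they are separated by ','. A single-URL group is reported as
    "loadbalance", although the mode is irrelevant in that case.
    """
    value = value.strip()
    if "{" in value:
        # Grouped form: every brace pair is one receiver group
        bodies = re.findall(r"\{([^{}]*)\}", value)
    else:
        # Ungrouped form: the whole value is a single group
        bodies = [value]
    groups = []
    for body in bodies:
        if "|" in body:
            mode, sep = "failover", "|"
        else:
            mode, sep = "loadbalance", ","
        urls = [u.strip() for u in body.split(sep) if u.strip()]
        groups.append((mode, urls))
    return groups


for mode, urls in parse_url_groups(
        "{tcp://127.0.0.1:9612,tcp://127.0.0.1:9613},{tcp://127.0.0.2:9612,tcp://127.0.0.2:9613}"):
    print(mode, urls)
```

Running the same function over the `AuthUrlGroup` value and comparing the group count and modes with the receiver side is a quick way to spot a mismatch between the two settings.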
Publishing to multiple receiver groups with failover within group
As you can see in the diagram below, the data publisher will push events to both groups. Since we do have multiple nodes within each group, it will send the event to only one node at a given time: if that node goes down, then the event publisher will send events to the other node within same group. This model guarantees message publishing to each server group.
Figure 5
According to the following configuration, the data publisher will send events to both Group A and Group B. Within Group A, events will go to either Traffic Manager 1 or Traffic Manager 2. If events go to Traffic Manager 1, they will keep going to that node until it becomes unavailable. Once it's unavailable, events will go to Traffic Manager 2.
<DataPublisher>
    <Enabled>true</Enabled>
    <Type>Binary</Type>
    <ReceiverUrlGroup>{tcp://127.0.0.1:9612 | tcp://127.0.0.1:9613},{tcp://127.0.0.2:9612 | tcp://127.0.0.2:9613}</ReceiverUrlGroup>
    <AuthUrlGroup>{ssl://127.0.0.1:9712 | ssl://127.0.0.1:9713},{ssl://127.0.0.2:9712 | ssl://127.0.0.2:9713}</AuthUrlGroup>
    ……………………..
</DataPublisher>
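The combination of publishing to every group and failing over within each group can be modelled in a few lines. This sketch assumes a "sticky" policy, where a group keeps using its current node until that node becomes unavailable, which follows the description above; whether the real publisher switches back to the first node once it recovers is not covered here. `FailoverGroup` and `publish_to_groups` are hypothetical names.

```python
class FailoverGroup:
    """One receiver group that sticks to its active node until that node is down."""

    def __init__(self, urls):
        self.urls = list(urls)
        self.active = 0  # index of the node currently receiving events

    def target(self, is_up):
        """Return the node that should receive the next event, or None."""
        # Start from the active node and move on only if it is unavailable
        for offset in range(len(self.urls)):
            idx = (self.active + offset) % len(self.urls)
            if is_up(self.urls[idx]):
                self.active = idx
                return self.urls[idx]
        return None  # every node in this group is unreachable


def publish_to_groups(groups, is_up):
    """Return the node chosen in each group for a single event."""
    return [group.target(is_up) for group in groups]


down = set()
groups = [
    FailoverGroup(["tcp://127.0.0.1:9612", "tcp://127.0.0.1:9613"]),
    FailoverGroup(["tcp://127.0.0.2:9612", "tcp://127.0.0.2:9613"]),
]
print(publish_to_groups(groups, lambda url: url not in down))
```

Each event produces one target per group, so every group still receives the event even when one of its nodes is down.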
Data publishing to all receivers
In this scenario, we will be sending all events to more than one Traffic Manager receiver. This approach is mainly followed when you need to use other servers to analyze events together with Traffic Manager servers. You can use this functionality to publish the same event to both servers at the same time. This will be useful to perform real-time analytics with CEP, to make the data persistent, and also to perform complex analysis with DAS at near-real-time speeds with the same data.
Figure 6
Similar to load balancing between a set of servers, you need to modify the Data Agent URL. You should include all traffic manager receiver URLs within curly braces ({}) separated with commas as shown below.
<DataPublisher>
    <Enabled>true</Enabled>
    <Type>Binary</Type>
    <ReceiverUrlGroup>{tcp://127.0.0.1:9612},{tcp://127.0.0.1:9613}</ReceiverUrlGroup>
    <AuthUrlGroup>{ssl://127.0.0.1:9712},{ssl://127.0.0.1:9713}</AuthUrlGroup>
    ………………….
</DataPublisher>
Both servers need to know that the events we publish to them are duplicated. To do that, we configure the event receiver with the property below (shown here with the value “false”); once we set it to “true”, the event receiver knows that the same event is also sent to all other nodes in the cluster.
<property name="events.duplicated.in.cluster">false</property>
Conclusion
In this article we’ve discussed different deployment patterns for API Manager Traffic Manager in distributed deployment. In distributed deployments, it’s common for the API gateway to be the component we scale as load increases. It's also important to scale Traffic Manager as the gateway cluster grows because everything related to throttling is handled there.