Deploying WSO2 API Manager in a Multi-DC Environment
- Bhathiya Jayasekara
- Senior Technical Lead - WSO2
Executive Summary
Today’s businesses rarely stick to one country or one region; most serve users around the world. To do this, you need the right infrastructure and services hosted across the globe; otherwise, you can’t provide a smooth, fast service to your users. The same is true for your API management solution. In this article, we discuss how to deploy WSO2 API Manager across multiple datacenters (DCs) spanning regions. We mainly explore what you need to know about multi-datacenter API management architectures and how to implement such a deployment. While you can use your preferred database server, in this article we use Oracle GoldenGate (OGG) with Oracle Database.
Prerequisites:
- WSO2 API Manager 3.1.0
- Oracle GoldenGate 12c Release 2
- Oracle Database 11g Release 2
When your digital business spans the globe, it’s important to treat all users the same way regardless of where they come from. To do that, you may need to rethink certain aspects of your business and its ecosystem. When it comes to APIs and online services, you cannot serve everyone well with servers hosted in only one geographical location. The main problem is latency. With the web technologies available today, users expect very low latency. If you try to serve users in Asia with servers hosted in the US, those users will probably experience considerable delays due to network latency.
On the other hand, keeping all your eggs in one basket is obviously unwise. If all your servers are in one geographical location and that location suffers a natural or man-made disaster, your entire business goes down. So, you must have some kind of disaster recovery mechanism to avoid such scenarios.
There can also be cases where an organization needs location-specific custom policies applied to the users who access its APIs. Such policies can be driven by various political, legal, and priority-related requirements. In addition, there can be regional data protection laws and regulations that heavily restrict cross-border data transfers (CBDT). For example, the European Union’s General Data Protection Regulation (GDPR) controls data transfers beyond Europe, and many countries, including China, Australia, New Zealand, Russia, and Argentina, have their own data-localization laws, which restrict data transfer beyond the country’s borders.
The most common solution to all these problems is hosting servers and datacenters in multiple geographical locations. It resolves the latency problem by redirecting users to the nearest server, and the disaster recovery problem by keeping redundant servers in multiple geographical locations. And when you have servers hosted in different geographic locations, you can selectively apply the policies required for each region and adhere to regional data protection laws and regulations too.
However, maintaining multiple server clusters across the globe can be challenging for many practical reasons. The main challenge is keeping shared data consistent: the data written by one cluster of servers should be available to the other clusters (and vice versa whenever required) in real time. When both clusters write at the same time, merging that data can be tricky.
The same requirements and challenges apply to your API management system too. In this article, we discuss how to build a geographically distributed active-active API management deployment with WSO2 API Manager 3.1.
The following diagram depicts the deployment we will discuss.
We are going to deploy WSO2 API Manager 3.1.0 [1] in two datacenters in two regions. Let’s assume the two regions are US-East and US-West. In each datacenter, we have two APIM nodes and one database node (or a cluster). As mentioned earlier, we’re going to build an active-active deployment across datacenters. However, we have to keep in mind that when it comes to API management, this is twofold: one comprises API runtime components (i.e., the API Gateway, Key Manager, and Traffic Manager), and the other includes API design time/governance components (i.e., the API Publisher and Developer Portal). In most cases, the active-active requirement is essential for the runtime components only. Therefore, in this article, we discuss a deployment that is active-active for the runtime components and active-passive for the design time/governance components.
The below diagram explains the deployment in detail.
The API traffic (gateway traffic) goes to the corresponding region’s load balancer, based on geo-based DNS resolution, and all Publisher/Developer Portal traffic goes to the US-East load balancer. An API Manager deployment stores data in two places: the databases and the file system. On the database side, you might already know that WSO2 API Manager 3.1.0 uses two databases: the APIM Database (AM_DB) and the Shared Database (SHARED_DB), which combines the User Management Database and Registry Database we used to have in the 2.x versions. In this deployment, we have to configure AM_DB and SHARED_DB with two-way replication across datacenters. The API runtime artifacts are stored in the file system and need to be copied across the gateways in all regions. The recommended way to share these files is a Network File System (NFS); for example, on AWS you can use Amazon’s Elastic File System (EFS), which is an NFS.
Database replication across datacenters
The way you configure database replication depends on the database system you plan to use. In this article, we discuss what you need to know about database replication for WSO2 API Manager in general, regardless of the database server, and how you can implement it with Oracle GoldenGate (OGG) and Oracle Database.
In the next subsections, we look at what you need to know about the different aspects of database replication and how to implement them.
Unique keys for all tables
When you have two-way replication configured in your database, there’s a high chance of data conflicts. Therefore, conflict detection and avoidance are important for the system to run smoothly. One major requirement for this is having unique key (not null) constraints or primary key constraints on all tables [2].
This is already handled in the default multi-DC Oracle scripts in API Manager, so you don’t have to introduce any new unique or primary keys to any table.
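As a purely illustrative, hedged example (the table, column, and constraint names here are hypothetical; the shipped scripts already define the real constraints), a primary key constraint that gives the replication engine a reliable row identifier looks like this:
-- Hypothetical table for illustration only; not part of the API Manager schema.
CREATE TABLE SAMPLE_TABLE (
  SAMPLE_ID INTEGER NOT NULL,
  SAMPLE_NAME VARCHAR2(255) NOT NULL,
  CONSTRAINT SAMPLE_TABLE_PK PRIMARY KEY (SAMPLE_ID)
);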
Handle auto-increment columns
When you insert data into a database table that contains an auto-increment column, you don’t specify a value for that column in your insert query. That value is calculated by the database server itself based on the auto-incremented value of the last insertion. This works fine as long as you have a central database, where the database server can keep a counter to increment whenever a new table entry is inserted. However, in a multi-datacenter scenario, the situation is much more complex. If two clients in two DCs write to the same table, each database server increments the auto-increment column based on its own counter, and when this data is synced between the datacenters, uniqueness conflicts can occur because the same auto-incremented value exists in both datacenters.
To solve this problem, different database servers use different conflict resolution strategies [3]. Here are some brief explanations of some common ones.
- Node-specific sequence range
In this method, each primary node of the database cluster is given a sequence range for the auto-increment column. For example, one primary node is given the range 1-500000 and the second one the range 500001-1000000 (a brief SQL sketch of this approach appears after this list). This ensures that each primary node always generates unique auto-incremented values.
- Common sequence
This approach uses a common, shared sequence. The sequence is stored in a single location, and each primary node makes a call to that location to pick the next available number in the common sequence. However, this method is inefficient compared to the other methods due to the network latency added by each call to fetch the next number.
- Start value variation
In this approach, each primary node is given a different starting value, and each of them increments the auto-increment column by the number of primary nodes in the system. For example, if there are three primary nodes in the system, their starting values will be 1, 2, and 3, respectively. All of them increment their values by 3, which is the number of primary nodes in the system. So the three sequences will look like this.
Node 1 - 1,4,7,10,…
Node 2 - 2,5,8,11,...
Node 3 - 3,6,9,12,...
As you can see, it never gives the same number for any two nodes, and that avoids conflicts.
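For comparison, here is a brief, hedged Oracle sketch of the node-specific sequence range approach mentioned above; the sequence name and ranges are illustrative only.
-- Node 1: illustrative sequence restricted to the range 1-500000
CREATE SEQUENCE SAMPLE_SEQUENCE START WITH 1 INCREMENT BY 1 MAXVALUE 500000 NOCYCLE;
-- Node 2: illustrative sequence restricted to the range 500001-1000000
CREATE SEQUENCE SAMPLE_SEQUENCE START WITH 500001 INCREMENT BY 1 MAXVALUE 1000000 NOCYCLE;
-- NOCYCLE makes each sequence raise an error when its range is exhausted instead of wrapping around.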
In our case, due to its simplicity and efficiency, we are going to use the “start value variation” method, which is supported by the Oracle database. However, you may want to pick a strategy that your preferred database server supports.
WSO2 API Manager 3.1.0 comes with Oracle SQL scripts for multi-DC deployments. (APIM 3.0.0 does not ship multi-DC scripts, but you can find them here [4].) The file structure is like this.
In the above two “sequences.sql” files, we have auto-increment sequences like this.
CREATE SEQUENCE AM_API_SEQUENCE START WITH 1 INCREMENT BY 1
You need separate SQL scripts for each datacenter. Therefore, take a copy of the default scripts for each datacenter and update each copy like this.
CREATE SEQUENCE AM_API_SEQUENCE START WITH <DATACENTER_ID> INCREMENT BY <NUMBER_OF_DATACENTERS>
E.g.,
DC1:
CREATE SEQUENCE AM_API_SEQUENCE START WITH 1 INCREMENT BY 3
DC2:
CREATE SEQUENCE AM_API_SEQUENCE START WITH 2 INCREMENT BY 3
DC3:
CREATE SEQUENCE AM_API_SEQUENCE START WITH 3 INCREMENT BY 3
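For context, here is a hedged sketch of how such a sequence is typically consumed in Oracle 11g, which has no identity columns; whether the shipped scripts use exactly this trigger-based mechanism is an assumption here, so check the multi-DC scripts for the actual definitions.
-- Assumed illustration: a BEFORE INSERT trigger populating API_ID from the per-DC sequence.
CREATE OR REPLACE TRIGGER AM_API_TRIGGER
BEFORE INSERT ON AM_API
FOR EACH ROW
BEGIN
  SELECT AM_API_SEQUENCE.NEXTVAL INTO :NEW.API_ID FROM DUAL;
END;
/
Because each datacenter’s sequence produces a disjoint series of numbers, rows inserted in different datacenters never collide on the auto-incremented column when they are replicated.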
Remove Cascade Operations
“ON DELETE/UPDATE CASCADE” operations can cause conflicts in the data replication process. Think of a scenario like this.
- DC1 deletes a parent row, which then triggers the cascade operation.
- Then, it deletes the corresponding child rows as well.
- Then, the parent row deletion is replicated in DC2.
- When the parent row of DC2 is deleted, it triggers the cascade operation in DC2, which deletes the corresponding child rows in DC2.
- Meanwhile, DC1’s child row deletions are pending replication in DC2. When they are executed, they fail due to a conflict, because those rows were already deleted by DC2’s cascade operation.
Therefore, we can’t have cascade operations with data replication. The solution is triggers [5]: we replace the cascades with triggers that remove the corresponding child rows before the parent row. With triggers in place, the above scenario looks like this.
- DC1 “tries” to delete a parent row, which then fires the trigger.
- Then the trigger deletes the corresponding child rows.
- Once all child rows are deleted, the parent row is deleted.
- Then, the child row deletion is replicated in DC2.
- Then, the parent row deletion is replicated in DC2 which “tries” to delete the parent row in DC2.
- That fires the trigger in DC2 and it checks if there are any child rows to be deleted, but doesn’t find any because the child rows were already deleted at step 4.
- Then, DC2 deletes the parent row.
In this case, the trigger completes without any errors.
Here is a sample trigger that replaces the cascade operations.
CREATE OR REPLACE TRIGGER TRG_DEL_AM_API
BEFORE DELETE
on AM_API
FOR EACH ROW
BEGIN
DELETE FROM AM_SUBSCRIPTION AMSU WHERE AMSU.API_ID = :OLD.API_ID;
DELETE FROM AM_API_LC_EVENT AMLE WHERE AMLE.API_ID = :OLD.API_ID;
DELETE FROM AM_API_COMMENTS AMAC WHERE AMAC.API_ID = :OLD.API_ID;
DELETE FROM AM_API_RATINGS AMAR WHERE AMAR.API_ID = :OLD.API_ID;
DELETE FROM AM_EXTERNAL_STORES AMES WHERE AMES.API_ID = :OLD.API_ID;
DELETE FROM AM_API_SCOPES AMS WHERE AMS.API_ID = :OLD.API_ID;
DELETE FROM AM_API_PRODUCT_MAPPING AAPM WHERE AAPM.API_ID = :OLD.API_ID;
DELETE FROM AM_API_CLIENT_CERTIFICATE AACC WHERE AACC.API_ID = :OLD.API_ID;
END;
/
This is already handled in the default multi-DC SQL scripts in API Manager, so you don’t need to make any extra changes.
Avoid access token conflicts between datacenters
API Manager stores access tokens in the IDN_OAUTH2_ACCESS_TOKEN table. However, the default table definition does not guarantee error-free token generation in multi-DC environments. The problem is that if the same user sends the same token request to two different datacenters, the insert can violate the unique key constraint on that table and throw a unique key violation error.
If you don’t have geo-based routing for gateway traffic (i.e., API and token requests), which guarantees the same user’s requests will always go to the same datacenter, this error is inevitable. Therefore, it’s recommended to have geo-based routing for any multi-datacenter deployment.
However, even if you have geo-based routing, in the case of a datacenter failure, all users are redirected to the other datacenter, and that can again lead to the above error scenario.
So, we have to handle this, and there is an easy way to do it: an SQL script change. We introduce a new column in the IDN_OAUTH2_ACCESS_TOKEN table whose default value is unique to each datacenter, and we add that column to the aforementioned unique key constraint as well.
In the “tables.sql” file inside the “dbscripts/multi-dc/oracle/apimgt” directory, you can find a create table query for the token table like this.
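The original query is long, so here is an abbreviated, hedged sketch showing only the parts relevant to this discussion; the column types and the full column list of the CON_APP_KEY constraint may differ slightly, so refer to the actual tables.sql.
-- Abbreviated sketch only; see tables.sql for the complete definition.
CREATE TABLE IDN_OAUTH2_ACCESS_TOKEN (
  TOKEN_ID VARCHAR2(255),
  CONSUMER_KEY_ID INTEGER,
  AUTHZ_USER VARCHAR2(100),
  TOKEN_SCOPE_HASH VARCHAR2(32),
  TOKEN_STATE VARCHAR2(25) DEFAULT 'ACTIVE',
  TOKEN_STATE_ID VARCHAR2(128) DEFAULT 'NONE',
  DCID VARCHAR2(255) DEFAULT 'DC1',
  CONSTRAINT CON_APP_KEY UNIQUE (CONSUMER_KEY_ID, AUTHZ_USER, TOKEN_SCOPE_HASH, TOKEN_STATE, TOKEN_STATE_ID, DCID)
);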
There you can see a column named “DCID” (which is added only in the multi-DC scripts) whose default value is “DC1”, and it is included in the “CON_APP_KEY” unique constraint as well.
You need a different default value for this new column (e.g., “DC1”, “DC2”, and “DC3”) in the SQL scripts of each of your datacenters. This guarantees that the entries inserted from different datacenters are unique.
Preventing “Data Looping”
When you configure your database system for two-way replication, there’s a chance for data replication loops to occur unless you configure it not to. The way you configure this depends on your database vendor. In Oracle GoldenGate, you can use the TRANLOGOPTIONS parameter with the EXCLUDETAG or EXCLUDEUSER option. Refer to the Oracle documentation [6] for more details.
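As a hedged sketch only (the group names EXTDC1 and REPDC1 are hypothetical, and you should confirm the exact parameters against the Oracle documentation [6]), the EXCLUDETAG approach lets the Replicat tag the transactions it applies so that the local Extract skips them:
-- Replicat parameter file in each datacenter (hypothetical group REPDC1): tag applied transactions
REPLICAT REPDC1
DBOPTIONS SETTAG 00
-- Extract parameter file in each datacenter (hypothetical group EXTDC1): skip transactions carrying that tag
EXTRACT EXTDC1
TRANLOGOPTIONS EXCLUDETAG 00
With this in place, changes applied by the Replicat in one datacenter are not captured again by the local Extract, so they are never sent back to the datacenter they originated from.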
Don’t replicate DDL operations
Some database servers, such as Oracle, support replicating both DDL and DML operations [7], while many support only DML. In our case, since we maintain separate SQL scripts for each datacenter, we need to make sure we don’t replicate DDL operations across the datacenters. You may need to refer to your database vendor’s documentation for more details on this.
Designate one database as “The Trusted Source”
As a best practice, the database in one datacenter should be designated as the trusted source [5], and it should be backed up frequently. Keeping one database as the trusted source is important because you initialize the other databases from this special database when you start syncing. And whenever you need to resynchronize due to a failure, you should use the trusted source for that.
That covers the databases. We discussed why you need to keep separate SQL scripts for each of your datacenters and what changes you need to make in each of them. Next, let’s look at API runtime artifact synchronization between the datacenters.
Runtime artifact synchronization
As mentioned at the beginning, we need to sync runtime artifacts across datacenters. The recommended way to do this is via a Network File System (NFS) such as EFS. You have to mount the following locations on the servers to share the runtime artifacts.
- <APIM_HOME>/repository/deployment/server/synapse-configs
- <APIM_HOME>/repository/deployment/server/executionplans
- <APIM_HOME>/repository/tenants (Only if you are using tenancy)
That brings us to the end of this article. I hope it gave you a good idea of the aspects you need to pay attention to when you set up WSO2 API Manager in a multi-datacenter deployment. To keep the article concise, we discussed only multi-datacenter-specific details. The rest of the steps for setting up an API Manager deployment are the same as for a single-datacenter deployment, and you can refer to the official WSO2 API Manager documentation for those details.
Summary
Having a multi-datacenter deployment is important for serving users around the world with low latency, and it’s important for disaster recovery as well. One of the main challenges of a multi-datacenter deployment is keeping data consistent. When you deploy WSO2 API Manager across multiple datacenters, you need to pay attention to two such aspects: the file system and the databases. For the file system, a network file system is recommended for sharing artifacts between datacenters. For the databases, you need to configure a distributed database with data replication.
In this article, we mainly discussed how to configure a distributed Oracle database using Oracle GoldenGate (OGG) for API Manager. We explored aspects such as how to handle auto-increment columns in tables, how to handle cascade operations, how to prevent data looping, how to avoid access token conflicts in API Manager, and some other database replication best practices.
References
[1] https://wso2.com/api-management/
[2] https://docs.oracle.com/goldengate/1212/gg-winux/GWUAD/wu_bidirectional.htm#GWUAD314
[3] https://github.com/bhathiya/apim-multi-dc-sql-scripts
[4] https://docs.oracle.com/goldengate/1212/gg-winux/GWUAD/wu_bidirectional.htm#GWUAD287
[5] https://docs.oracle.com/goldengate/1212/gg-winux/GWUAD/wu_bidirectional.htm#GWUAD298
[6] https://docs.oracle.com/goldengate/1212/gg-winux/GIORA/ddl.htm#GIORA285