Seamless Data Integration With Centralized (Monolithic) or Decentralized (Microservices) Architectures
- Chanika Geeganage
- Technical Lead - WSO2
Introduction
Data is one of the most valuable assets in any organization. Used correctly, data can empower businesses to make important decisions, increase growth, and improve profitability. Therefore, combining data across different datasources and exposing it for business use cases has become an essential requirement. This article discusses WSO2’s solution for seamless data integration in two widely used service architectures - centralized/monolithic architecture and decentralized/microservices architecture.
Monolithic applications are designed to facilitate the numerous business use cases that a company relies upon to conduct its day-to-day business. In other words, all the components needed to perform its business logic are packaged into a single application. ERP and CRM systems are some examples of monolithic applications. Components of such applications are interconnected and interdependent. As a result, such applications are difficult to change quickly. The infrastructure associated with such applications is also a problem: scaling up a single component of a monolithic application requires resources for another instance of the entire application.
Service Oriented Architecture (SOA) came into the picture as a solution to this problem. In this architecture, the application is decoupled into small modules known as services. These services are integrated using an aggregation layer called a bus, and every service communicates through this service bus. Over time, such an application can itself become monolithic, as all the services are integrated through the same integration layer.
Microservices evolved to reduce the limitations of SOA. This approach involves developing a single application as a combination of fine-grained services called microservices that run and are deployed independently. Each service is created to serve a specific business function such as managing inventory or managing customers. Furthermore, microservices should be independent of each other. This allows developers to build each microservice in a different language and to use a different database for each one. Microservices communicate with each other using lightweight mechanisms such as REST over HTTP.
The above diagram illustrates a software application for a warehouse management system in a company.
This article discusses the capabilities provided by WSO2 Enterprise Integrator to provide seamless data integration for both monolithic and microservices architectures (MSA).
Data Integration
Data integration simply refers to providing a unified view of data from different data sources to the end user. This also includes processing, combining, and presenting data to the end user. In today's world, organizations use data integration tools to govern data and perform their day-to-day business activities. For example, suppose a company has to update its inventory level and customer information for a product purchase. The inventory level data and customer data may reside in two different data sources.
In a monolithic architecture, data is stored in a centralized location and each service in the monolithic application queries the data directly. In a microservices architecture, the functions are divided across multiple microservices. Because of this, we cannot use the same centralized database, as it violates the loosely coupled nature of microservices. Therefore, each microservice needs its own database. This reduces the risk of data inconsistency and of unexpected data modifications by other microservices. In addition, when the database schema of one datasource changes, other microservices are not affected by the modification.
Here is a summary of the key benefits of implementing decentralized data management in an MSA:
- It ensures that the services are loosely coupled. Changes to the database of one service do not impact any other services. By contrast, if multiple services access the same database, any schema change has to be coordinated amongst all the services, which can cause additional work and delay the deployment of changes to the entire system.
- Each microservice can use the type of database that is best suited to its business logic. For example, a developer can select RDBMS type data source over NoSQL datasource based on the requirement.
- Application performance improves because the datasource schema is designed in accordance with the business needs of the particular microservice. A single shared database can accumulate huge tables over time, which makes data retrieval difficult because you have to join multiple large tables to obtain the required data.
Even though microservices are simple to use, they have some disadvantages too. For example, as each service has its own database, supporting database transactions across multiple services and ensuring data consistency is a challenge.
WSO2 Enterprise Integrator supports data integration capabilities, providing an easy-to-use platform to integrate data stores, create composite data views, and host these as data services. It supports secure and managed data access across various datasource types, data service transactions, and data transformation and validation using a lightweight, developer-friendly, agile development approach. These capabilities are described in detail in this article.
Capabilities for Seamless Data Integration
Support for Various Datasource Types
As enterprises depend heavily on data, storing data in a way that is easily accessible is essential. Data storage and retrieval are usually done automatically through applications. This requires exposing data as a service so that applications and systems can access and use it easily, which includes migrating data from one datasource to another, retrieving data from a source, and manipulating it within the application.
Data can be stored in conventional RDBMS data storage systems such as MySQL and Oracle or NoSQL datasources such as Cassandra and MongoDB. Data can even be stored in Excel sheets, Google spreadsheets, RDF, and web pages. WSO2 Enterprise Integrator supports various datasource types including RDBMS, CSV, Excel, ODS, Cassandra, Google Spreadsheets, RDF, and any web page. Furthermore, WSO2 Enterprise Integrator provides the freedom to write a custom datasource where users can define their own datasource implementation to fulfill their business logic.
Transactional Support
A transaction is a sequence of one or more related operations or tasks that are executed as a single operation. A distributed transaction is a transaction that updates data on two or more distinct nodes of distributed resources. In a distributed transaction, all participants agree on persistent states across all resources before and after the transaction commits or rolls back. Any changes to data should be permanent as a result of a transaction commit. If even a single participant fails to perform an operation, the entire transaction fails and any changes to data introduced by the transaction are rolled back.
WSO2 Data Services provides distributed transactions using the Java Transaction API (JTA), which enables global transactions across multiple datasource resources in the JVM. It uses two-phase commit (2PC) for a distributed transaction. A separate transaction manager coordinates the global transaction across all the databases. After receiving the commit message, it sends a prepare message to every participating database connection to check whether all are ready to commit, then signals all of them to commit and monitors whether each completes successfully. The transaction is canceled and rolled back if at least one participant does not agree to proceed.
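The prepare-then-commit flow described above can be illustrated with a minimal sketch. It is a simplified in-memory model, not the JTA API or WSO2's transaction manager; the `Participant` class and `two_phase_commit` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical stand-in for one transactional resource, such as a database."""
    name: str
    can_commit: bool = True
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: vote on whether this resource is ready to commit its branch.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants) -> bool:
    """Return True if every participant committed, False if all were rolled back."""
    # Phase 1 (prepare): ask every participant for its vote.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (commit): everyone agreed, so commit every branch.
        for p in participants:
            p.commit()
        return True
    # Phase 2 (abort): at least one participant refused, so roll back all of them.
    for p in participants:
        p.rollback()
    return False


inventory = Participant("inventory")
customers = Participant("customers")
committed = two_phase_commit([inventory, customers])
```

A production transaction manager also has to persist its decisions and recover from crashes between the two phases, which this sketch leaves out.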
WSO2 Enterprise Integrator uses a transaction manager that requires a special type of datasource - an XA datasource. XA datasources are provided by XA JDBC drivers, and most often the JDBC drivers themselves provide an XADataSource implementation. For each RDBMS type, there is a specific XADataSource class and a set of configuration properties. When the XA functionality is used, the transaction manager uses XA resource instances to prepare and coordinate each transaction branch and then commits or rolls back each individual transaction appropriately.
WSO2 Enterprise Integrator also supports a request box feature, which can be used to group a set of dataservice operations together and execute them at once. It acts like server-side batch processing. When the request box is invoked, the individual service calls are executed in the specified order, and if one operation fails, the entire set of operations is rolled back.
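The all-or-nothing behavior of such a grouped request can be sketched against a single local database. This is only an analogy built on Python's standard `sqlite3` module, not the request box implementation; the `run_as_batch` helper and the table names are assumptions made for the example.

```python
import sqlite3


def run_as_batch(conn, operations):
    """Run (sql, params) pairs in order; commit all of them or none of them."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for sql, params in operations:
                conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False


# Hypothetical schema: stock levels may never become negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    "product TEXT PRIMARY KEY, quantity INTEGER CHECK (quantity >= 0))"
)
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
conn.commit()


def order_operations(customer, product, quantity):
    """Build the ordered operations that make up one purchase."""
    return [
        ("INSERT INTO orders VALUES (?, ?, ?)", (customer, product, quantity)),
        ("UPDATE inventory SET quantity = quantity - ? WHERE product = ?",
         (quantity, product)),
    ]


first = run_as_batch(conn, order_operations("alice", "widget", 2))
# The second batch violates the CHECK constraint, so its order row is undone too.
second = run_as_batch(conn, order_operations("bob", "widget", 10))
```

After both calls, only the first order is stored and the stock level reflects only that order.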
Transaction support is available in a microservices architecture when each microservice has its own datasource. Decentralized data management in an MSA identifies this as a key aspect when designing microservices. For example, the customer service has its own database for storing customer data, and the inventory tracking service stores its data in a separate datasource, as illustrated in the following figure.
To fulfill some business logic, two databases owned by different microservices may have to be queried. For example, if a customer purchases a product, the inventory has to be updated. But at the same time, the system should check whether the inventory limit has been exceeded before proceeding with the customer’s order. If the customer service keeps a local copy of the inventory data, the inventory service may meanwhile be updated by other requests. The data is then not consistent across all consuming applications and storage devices, which is a huge challenge when designing data integration within microservices.
For such scenarios, other patterns have emerged with the microservices trend, such as Saga. A Saga is a sequence of local transactions. In the Saga pattern, each business transaction that spans multiple services is implemented as a Saga. Each local transaction updates the database and then triggers an event or sends a message to notify the next local transaction within the Saga.
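A minimal sketch of a Saga follows. As the pattern is commonly described, each local transaction is paired with a compensating action that undoes it if a later step fails; the article does not cover compensation, so treat that part as an addition. The `run_saga` and `place_order` functions and the in-memory stores are hypothetical stand-ins for real services and their databases.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, the compensations of the steps that already
    succeeded are run in reverse order and False is returned.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Undo the steps that already succeeded, most recent first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


# Hypothetical in-memory stand-ins for two services' private databases.
orders = []            # owned by the customer/order service
stock = {"widget": 5}  # owned by the inventory service


def place_order(product, quantity):
    def create_order():
        orders.append((product, quantity))

    def cancel_order():
        orders.remove((product, quantity))

    def reserve_stock():
        if stock[product] < quantity:
            raise ValueError("insufficient stock")
        stock[product] -= quantity

    def release_stock():
        stock[product] += quantity

    return run_saga([(create_order, cancel_order), (reserve_stock, release_stock)])


accepted = place_order("widget", 2)
rejected = place_order("widget", 10)
```

In a real deployment the steps would be triggered by events or messages between services rather than direct function calls, and the data is only eventually consistent while the Saga is in progress.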
Availability as REST APIs
Due to its flexibility, speed, and simplicity, REST is widely used for data integration. Dataservices can be exposed in WSO2 Enterprise Integrator as either SOAP services or REST services. For a RESTful service, a query can be mapped to a web resource by specifying the HTTP method in the dataservice. Each CRUD operation (add, update, retrieve, and delete) in a dataservice can be mapped to the corresponding HTTP method.
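The mapping between resources, HTTP methods, and queries can be sketched as a small routing table. This is not the WSO2 dataservice configuration format; the resource paths and query names below are hypothetical and only illustrate the idea.

```python
import re

# (HTTP method, resource path pattern, name of the query to run); all hypothetical.
ROUTES = [
    ("POST", r"^/products$", "add_product"),
    ("GET", r"^/products/(?P<id>\d+)$", "get_product"),
    ("PUT", r"^/products/(?P<id>\d+)$", "update_product"),
    ("DELETE", r"^/products/(?P<id>\d+)$", "delete_product"),
]


def resolve(method, path):
    """Return (query_name, path_parameters) for a request, or None if unmapped."""
    for verb, pattern, query in ROUTES:
        match = re.match(pattern, path)
        if verb == method.upper() and match:
            return query, match.groupdict()
    return None


lookup = resolve("GET", "/products/42")
```

The same resource path maps to different queries depending on the HTTP method, which is what lets one URL cover retrieval, update, and deletion.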
Even though REST doesn’t define data formats, it’s usually associated with exchanging JSON or XML documents between a client and a server. Data services work with both XML and JSON payload formats and the output will be formatted according to the defined structure. For fast access to data with minimum memory usage, data streaming is supported for both XML and JSON payload types.
Open Data Protocol (OData) is another way of accessing datasources through RESTful APIs. OData is an OASIS standard that defines best practices for building and consuming RESTful APIs. OData allows users to perform basic CRUD operations on a datasource on demand as a REST service, using the different HTTP methods, without any intermediate configuration of the dataservice at deployment time. Only the datasource configuration is required at deployment time, and the rest is handled at run time.
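As a small illustration of what an OData request looks like from the client side, the sketch below builds a query URL using `$filter` and `$top`, which are standard OData system query options. The base URL and the `Products` entity set are hypothetical and do not reflect a specific WSO2 endpoint layout.

```python
from urllib.parse import quote, urlencode


def odata_url(base, entity_set, **options):
    """Build an OData query URL; each keyword becomes a $-prefixed query option."""
    query = urlencode(
        {f"${name}": value for name, value in options.items()},
        quote_via=quote,  # encode spaces as %20 rather than '+'
        safe="$,",        # keep '$' and ',' readable in the URL
    )
    url = f"{base}/{entity_set}"
    return f"{url}?{query}" if query else url


# Hypothetical service root and entity set.
url = odata_url(
    "http://localhost:8280/odata/warehouse",
    "Products",
    filter="quantity lt 10",
    top=5,
)
```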
For the microservices architecture to function, each individual microservice must be able to interact. This includes:
- Each individual microservice must be able to communicate with every other microservice in the architecture.
- Microservices should be able to communicate with the systems and services that provide data, and with the databases from which they draw the real-time information essential to their functioning.
- The client applications should be able to communicate with microservices to facilitate the end user.
These communications have to happen fast with low overhead and network latency. Therefore, each microservice must have an interface, which is why the API plays a crucial role in an MSA. This also facilitates scaling capabilities of microservices, as each microservice is loosely coupled and can be replaced and scaled up easily. The following figure demonstrates how the REST APIs are associated in a warehouse.
Batch Processing
Batch data processing is an efficient way of processing high volumes of data, where a group of transactions is collected over a period of time. It often processes large volumes of data at the same time, with long periods of latency. This is a very common practice in organizations, especially for tasks such as paying salaries, calculating and printing invoices, and maintaining accounts. WSO2 Enterprise Integrator provides facilities to invoke multiple operations as a batch. Another requirement of batch operations is that if one invocation fails, all the other operations in the batch should fail as well.
Data Streaming
Data streaming has become an essential requirement for enterprises seeking to retrieve and process data fast. Nowadays, enterprises deal with large volumes of data that must also be processed quickly. Processing high volumes of data while keeping resources such as memory and CPU constant is a challenge. WSO2 Enterprise Integrator supports data streaming, where theoretically there is no limit to the size of a data service response. This provides:
- Memory efficiency - There is no memory build-up in the server, as there would be if the full result were stored in memory. The streaming capabilities push data to the client side as and when needed.
- Low response time - Since data is returned as soon as the server generates it, the client starts receiving the response almost immediately and can process the data as it is streamed in real time from the server.
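The memory benefit of streaming can be sketched with a generator that emits a JSON response piece by piece instead of building the whole result first. This uses Python's standard `sqlite3` and `json` modules as stand-ins and is not how WSO2 Enterprise Integrator implements streaming; `stream_rows_as_json` and the `products` table are hypothetical.

```python
import json
import sqlite3


def stream_rows_as_json(cursor, batch_size=100):
    """Yield a JSON array in pieces, holding at most batch_size rows in memory."""
    columns = [column[0] for column in cursor.description]
    yield "["
    first = True
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            separator = "" if first else ","
            first = False
            yield separator + json.dumps(dict(zip(columns, row)))
    yield "]"


# Hypothetical table with a few hundred rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(250)],
)
cursor = conn.execute("SELECT id, name FROM products ORDER BY id")

# A server would write each chunk to the client as soon as it is produced.
chunks = list(stream_rows_as_json(cursor, batch_size=100))
```

Collecting the chunks into a list here is only for inspection; a streaming server would forward each chunk and discard it.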
Security
Data is an important asset to any organization, so it is essential to safeguard it from unauthorized access. Organizations therefore take proactive measures and apply controls to make sure only authorized users can access their data. In WSO2 Enterprise Integrator, dataservices support web services security capabilities out of the box. Web services security, in other words SOAP message security, identifies and provides protection from general computer security threats and from threats that are unique to web services. It adheres to the WS-Security, WS-Policy, and WS-SecurityPolicy specifications. These specifications define a behavioral model for web services. Since security needs can vary from service to service, WSO2 Enterprise Integrator supports configuring security policies per service.
It supports predefined, commonly-used security scenarios that can be engaged to the service as a security policy such as username token, non-repudiation, integrity, and confidentiality. Understanding the exact security requirement is the first step in planning to secure web services.
For RESTful services, dataservices can be integrated with WSO2 API Manager to manage API aspects, including controlling unauthorized access. It supports implementing the key attributes of API security, which are authentication, authorization, confidentiality, integrity, availability, and non-repudiation.
Sensitive data such as passwords configured in services can be encrypted using the Cipher tool to ensure unauthorized parties can’t read and access this data.
Furthermore, apart from service level security, data also can be secured and filtered based on the user. WSO2 Enterprise Integrator provides facilities to filter content or data based on roles from the primary user store of the server.
Data Transformation and Validation
Sometimes a different data format is required for the output received after service invocation. WSO2 Enterprise Integrator provides capabilities to transform output data into a different format seamlessly by using:
- XSLT - The user can define an XSLT transformation and configure it in the service. The final response is formatted at runtime according to the defined XSLT.
- JSON and XML schema validation can be done with the help of a mediation layer in WSO2 Enterprise Integrator.
The user can define validators for input parameters to validate the values in a request and stop the execution of the request if the input doesn’t meet the required criteria. For example, users can configure a validator for the email parameter to make sure that its value matches the expected format. Otherwise, the user is notified. This prevents malformed data from being inserted into the datasource. There are built-in validators that cover widely used use cases, and users have the freedom to extend them to fulfill their business needs.
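The idea of validating input parameters before a query runs can be sketched as follows. The email pattern is deliberately simplified, and `find_invalid`, `is_email`, and the parameter names are hypothetical; this is not the validator configuration used by WSO2 Enterprise Integrator.

```python
import re

# Simplified pattern, for illustration only; real email validation is stricter.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_email(value):
    return isinstance(value, str) and EMAIL_PATTERN.fullmatch(value) is not None


def is_positive_int(value):
    return isinstance(value, int) and value > 0


def find_invalid(params, validators):
    """Return the names of parameters that are missing or fail their validator."""
    return [
        name
        for name, check in validators.items()
        if name not in params or not check(params[name])
    ]


validators = {"email": is_email, "quantity": is_positive_int}
good = find_invalid({"email": "alice@example.com", "quantity": 2}, validators)
bad = find_invalid({"email": "not-an-email", "quantity": 0}, validators)
```

A service built this way would reject the request and report the offending parameters whenever the returned list is not empty, so nothing reaches the datasource.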
Conclusion
In every organization, handling data efficiently and effectively is essential to success. Therefore, it is necessary to integrate data in a way that matches the organization’s business needs and the nature of its business. Enterprises can adopt either of the most widely used software design architectures - centralized monolithic architecture and decentralized microservices architecture. WSO2 Enterprise Integrator supports data integration in both of these architecture patterns in a seamless way.