Lean approach to MDM using WSO2 Middleware Platform
- Asanka Abeysinghe
- CTO - WSO2
Today, master data management (MDM) is well defined and has been mapped into a component architecture that reflects both technical and business requirements. This article explains how to map the WSO2 middleware products into the MDM reference architecture and implement solutions for various MDM-related requirements.
Table of Contents
- Master Data Management
- MDM Reference Architecture
- MDM with WSO2 Stack
- Why a Solution instead of an MDM Product
Every data center is full of data; every application reads and updates data; every business decision is based on data. All these channels generate new data or update existing data. Master data is part of these large data repositories.
Let’s take a look at the different types of data we deal with every day.
- Master data
- Reference data
- Metadata
- Transaction data
- Historical data
Master data represents the core entities of a data domain, and all other data relates to master data, directly or indirectly. Customer, Account, Employee, User and Item are examples of core entities in different data domains. When applications create a new set of data, they always start by creating a master data entry. Most data queries are based on master data, and master data is shared and referenced by different business units and applications.
To explain the other types of data, we will take Customer as a master data entity.
Reference data provides common values to tag other data types. Reference data tables, also called look-up tables, are used to fill dropdowns in user interfaces, to act as dynamic constants inside the code and to supply values for validations in backend systems. For example, customer type, account type and age ranges are reference data.
Metadata is used to further describe the master data entries. Usage of metadata for a specific entity differs from application to application as well as from user to user. As an example, the ‘customer’ master data entity has a field called address. Metadata related to that field provides the URL for Google maps to describe that address, and shows the default store or the bank this particular customer usually visits.
Transaction data captures the day-to-day operations related to a master data entity. A customer’s bank transactions, customer invoices and a patient’s visits to a doctor are examples of transaction data. Transaction data is not updated or deleted; each record carries a state and is kept in the system for auditing purposes, and the latest record in a valid state is treated as the current transaction.
Historical data is created in multiple ways. Transaction data gets archived and moved as historical data to free space in the data repositories as well as to make the data queries efficient. Updates done to master data are also moved to historical data for future reference. For example, if the customer changes his address, historical data keeps the old address for future reference as well as for reconciliation of other historical data. Historical data is key for data analytics to find different patterns stored in a data repository.
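The data types described above can be sketched as simple records built around the Customer example. This is only an illustrative model: the class names, field names and the `change_address` helper are assumptions made for this sketch and do not come from any WSO2 product.

```python
from dataclasses import dataclass, replace
from datetime import date
from enum import Enum


class CustomerType(Enum):
    """Reference data: a fixed look-up list used to tag other records."""
    RETAIL = "retail"
    CORPORATE = "corporate"


@dataclass(frozen=True)
class Customer:
    """Master data: the core entity that other data refers to."""
    customer_id: str
    name: str
    address: str
    customer_type: CustomerType


@dataclass(frozen=True)
class AddressMetadata:
    """Metadata: extra description attached to a master data field."""
    customer_id: str
    map_url: str
    default_branch: str


@dataclass(frozen=True)
class Transaction:
    """Transaction data: an immutable record that carries a state."""
    customer_id: str
    amount: float
    state: str
    booked_on: date


def change_address(customer, new_address, history):
    """Update master data and keep the old version as historical data."""
    history.append(customer)
    return replace(customer, address=new_address)


history = []
alice = Customer("C001", "Alice", "1 Main St", CustomerType.RETAIL)
alice_moved = change_address(alice, "9 Lake Rd", history)
```

After the call, `alice_moved` holds the new address while `history` still holds the record with the old one, which mirrors how an address change moves the previous value into historical data.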
We now have a better understanding of master data and the other data types kept in our data stores. In a nutshell, master data comprises the primary entities in a business domain and is required by the sub-systems used internally as well as by customers and partners externally. Managing master data therefore becomes critical for the business because, without these core data entities, it is difficult to operate.
The challenge of MDM arises when the enterprise contains multiple master data repositories that represent the same entity, when master data is required in different formats (data models) and when sub-systems accept master data in different formats.
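A minimal sketch of the format problem: two repositories hold the same customer under different data models, and each is mapped to one canonical model before comparison. The CRM and billing field names here are hypothetical and chosen only for illustration.

```python
def from_crm(record):
    """Map a hypothetical CRM record to the canonical customer model."""
    return {
        "id": record["cust_no"],
        "name": f'{record["first_name"]} {record["last_name"]}',
        "email": record["email"].lower(),
    }


def from_billing(record):
    """Map a hypothetical billing record to the canonical customer model."""
    return {
        "id": record["CustomerID"],
        "name": record["FullName"],
        "email": record["EMail"].lower(),
    }


crm_row = {"cust_no": "C001", "first_name": "Alice", "last_name": "Perera",
           "email": "Alice@Example.com"}
billing_row = {"CustomerID": "C001", "FullName": "Alice Perera",
               "EMail": "alice@example.com"}

# Once both records are in the canonical model they can be compared directly.
same_entity = from_crm(crm_row) == from_billing(billing_row)
```

Keeping one canonical model means each new repository needs only one mapping, rather than a mapping to every other repository.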
Another common challenge is having to update data coming from upstream across multiple master data repositories.
MDM wasn’t a big challenge in earlier days since most enterprises worked in batch mode. However, today’s businesses can’t survive without real-time or near real-time operations, and they cannot provide business functionality without data redundancy and aggregation across master data repositories.
The diagram above shows a standard MDM architecture identified by data architects. I’m taking this as the reference architecture because it fulfills the MDM needs of an enterprise.
Let’s look at this architecture in detail. The MDM repository layer at the bottom represents the heterogeneous master data repositories (MDRs) found in data centers. These can be relational data stores (RDSs), file-based stores or modern NoSQL data stores.
The MDM interface services at the top provide the Data APIs through which consumer applications execute CRUD (Create/Read/Update/Delete) operations. The Data API extends this functionality by providing security, reliability and transaction management for consumer data needs. Applications consume the master data they require through the Data API instead of accessing the data directly using data access protocols like JDBC and ADO. In effect, the Data APIs block direct access to the MDRs.
The MDM life-cycle services maintain the business rules required to manage the MDRs. CRUD operations executed through the Data API first go through the business rules and are then translated into the MDRs’ low-level data access protocols. Life-cycle management executes automated workflows to update multiple MDRs, and it handles the aggregation or mash-up of data from MDRs. The life-cycle layer keeps the Data API lightweight by abstracting the rules behind the API.
The MDM quality layer ensures that upstream data matches the internal data representation of the MDRs. It also maps internal data to the external data models exposed through the API. MDM quality checks are triggered by the life-cycle layer as part of the automated workflows.
The MDM security layer secures and controls access to the data stored in the MDRs. Master data is exposed through the Data API to many consumers, and that creates a big security requirement. The MDM security layer applies the required security policies to the data. Row-level security policies are applied through the data queries, and column-level policies are built by leveraging the MDR system-level security controls or by building an application-level security layer on top of the low-level data access protocols. Modern security standards like XACML are well suited to column-level access control.
The MDM events layer is used to communicate across the MDM system by triggering various API calls. For example, when multiple MDRs must be updated, events fired after a successful MDR update help the life-cycle layer to notify and update the other MDRs and to handle transactions across these multiple data stores.
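The idea of a Data API that fronts the repository can be sketched as a thin facade. This is an assumption-laden illustration, not the WSO2 implementation: `InMemoryRepository`, `CustomerDataAPI` and the client allow-list are hypothetical names, and a real deployment would talk to an actual data store.

```python
class InMemoryRepository:
    """Stand-in for a master data repository (MDR); purely illustrative."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, row):
        self._rows[key] = dict(row)

    def select(self, key):
        row = self._rows.get(key)
        return dict(row) if row is not None else None

    def update(self, key, changes):
        self._rows[key].update(changes)

    def delete(self, key):
        del self._rows[key]


class CustomerDataAPI:
    """Hypothetical Data API: the only path consumers use to reach the MDR."""

    def __init__(self, repository, allowed_clients):
        self._repository = repository
        self._allowed_clients = set(allowed_clients)

    def _check(self, client_id):
        # Every operation is authorized before it reaches the repository.
        if client_id not in self._allowed_clients:
            raise PermissionError(f"client {client_id!r} is not authorized")

    def create(self, client_id, key, row):
        self._check(client_id)
        self._repository.insert(key, row)

    def read(self, client_id, key):
        self._check(client_id)
        return self._repository.select(key)

    def update(self, client_id, key, changes):
        self._check(client_id)
        self._repository.update(key, changes)

    def delete(self, client_id, key):
        self._check(client_id)
        self._repository.delete(key)


api = CustomerDataAPI(InMemoryRepository(), allowed_clients=["billing-app"])
api.create("billing-app", "C001", {"name": "Alice", "address": "1 Main St"})
api.update("billing-app", "C001", {"address": "9 Lake Rd"})
customer = api.read("billing-app", "C001")
```

Because consumers hold a reference to the API and never to the repository, cross-cutting concerns such as authorization live in one place.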
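As a rough sketch of the life-cycle layer, the following applies business rules first and only then writes the record to every registered repository in that repository's own format. `LifeCycleService`, the two rules and the dictionary-backed stores are hypothetical stand-ins chosen for this example.

```python
def require_name(record):
    if not record.get("name"):
        raise ValueError("name is required")


def require_known_type(record):
    if record.get("type") not in {"retail", "corporate"}:
        raise ValueError("unknown customer type")


class LifeCycleService:
    """Hypothetical life-cycle layer: rules first, then every registered MDR."""

    def __init__(self, rules):
        self._rules = list(rules)
        self._targets = []  # (store, mapper) pairs

    def register(self, store, mapper):
        self._targets.append((store, mapper))

    def create(self, record):
        for rule in self._rules:
            rule(record)  # a failing rule aborts before any MDR is touched
        for store, mapper in self._targets:
            key, row = mapper(record)
            store[key] = row


crm_store, billing_store = {}, {}
service = LifeCycleService([require_name, require_known_type])
service.register(crm_store, lambda r: (r["id"], {"full_name": r["name"]}))
service.register(
    billing_store,
    lambda r: (r["id"], {"NAME": r["name"].upper(), "TYPE": r["type"]}),
)
service.create({"id": "C001", "name": "Alice", "type": "retail"})
```

The caller issues one `create`; the rules and the per-repository mappings stay behind the service, which is what keeps the Data API itself thin.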
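A data quality check of this kind could look like the sketch below, which normalizes an upstream record and reports what does not match the internal representation. The field set, the deliberately simple email pattern and the `check_quality` function are assumptions for illustration only.

```python
import re

# Deliberately simple pattern; real email validation is more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_quality(record):
    """Return a normalized copy of the record and a list of problems found."""
    cleaned = {
        "id": str(record.get("id", "")).strip(),
        "name": " ".join(str(record.get("name", "")).split()),
        "email": str(record.get("email", "")).strip().lower(),
    }
    problems = []
    if not cleaned["id"]:
        problems.append("id is missing")
    if not cleaned["name"]:
        problems.append("name is missing")
    if not EMAIL_PATTERN.match(cleaned["email"]):
        problems.append("email is malformed")
    return cleaned, problems


good, good_problems = check_quality(
    {"id": " C001 ", "name": "Alice   Perera", "email": "Alice@Example.com "}
)
bad, bad_problems = check_quality(
    {"id": "C002", "name": "", "email": "not-an-email"}
)
```

Returning the problems instead of raising lets a workflow decide whether to reject the record, repair it or route it for manual review.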
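The difference between row-level and column-level policies can be illustrated with a small application-level filter. This sketch does not implement XACML; the roles, the `POLICIES` table and the sample rows are hypothetical, and a production system would more likely delegate the decision to a policy engine.

```python
CUSTOMERS = [
    {"id": "C001", "name": "Alice", "region": "EU", "credit_limit": 5000},
    {"id": "C002", "name": "Bob", "region": "US", "credit_limit": 9000},
]

# Hypothetical policies: which rows and which columns each role may see.
POLICIES = {
    "eu-support": {
        "row_filter": lambda row: row["region"] == "EU",
        "columns": {"id", "name", "region"},
    },
    "finance": {
        "row_filter": lambda row: True,
        "columns": {"id", "name", "region", "credit_limit"},
    },
}


def query_customers(role):
    """Apply the row-level filter first, then project the permitted columns."""
    policy = POLICIES[role]
    visible_rows = [row for row in CUSTOMERS if policy["row_filter"](row)]
    return [
        {col: row[col] for col in row if col in policy["columns"]}
        for row in visible_rows
    ]


support_view = query_customers("eu-support")
finance_view = query_customers("finance")
```

Here the support role sees only EU customers and never the credit limit, while the finance role sees every row and every column.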
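The event-driven update described here can be sketched with a minimal in-process publish-subscribe bus. `EventBus`, the topic name and the dictionary-backed stores are assumptions for this example; a real deployment would use a message broker, and this sketch does not attempt transaction handling or rollback.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish-subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)


bus = EventBus()
crm_store, billing_store, audit_log = {}, {}, []


def update_primary(customer_id, address):
    """Update the primary MDR, then announce the successful update."""
    crm_store[customer_id] = {"address": address}
    bus.publish("customer.updated", {"id": customer_id, "address": address})


# Other MDRs and auditors react to the event instead of being called directly.
bus.subscribe(
    "customer.updated",
    lambda e: billing_store.update({e["id"]: {"ADDR": e["address"]}}),
)
bus.subscribe("customer.updated", lambda e: audit_log.append(e["id"]))

update_primary("C001", "9 Lake Rd")
```

The code that updates the primary repository does not need to know which other repositories exist; new subscribers can be added without changing it.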
The MDM Governance layer creates the governance framework required for the data. It encapsulates life-cycle management, data quality and security layers and provides a single layer for the DBAs to manage master data. At the same time, the governance layer provides a repository to manage the policies associated with security, data quality and life-cycles.
Let’s look at how we put this theory into practice.
Each layer of the MDM reference architecture is filled by a product from the WSO2 middleware platform. Looking at the above diagram from the top, the WSO2 ESB exposes the Data API by using its mediation capabilities. The API can be exposed in a RESTful manner or using SOAP on top of HTTP. The WSO2 ESB supports multiple transports, such as JMS, Thrift and File, that can be used to bind the Data API to whatever consumers require.
WSO2 Data Services Server builds the life-cycle management of the data by executing the CRUD operations. The ESB in the API layer handles simple workflows. For complex flows, WSO2 BPS can be installed as a feature inside the Data Services Server to build the automated flows using BPEL.
The message validation and transformation capabilities of WSO2 ESB provide the data quality functionality.
WSO2 Identity Server ensures effective security management, with core security implementations built across the products and into the WSO2 Carbon framework itself.
WSO2 Message Broker provides the event capabilities of the solution and links the other components using the publish-subscribe message exchange pattern.
Functionality related to governance is covered by WSO2 Governance Registry and WSO2 Business Activity Monitor.
This article does not aim to provide detailed product information. For that, please refer to the relevant product pages at: https://wso2.com/products/
We can find many MDM products in the market, but we are talking about an MDM solution here. The advantage of a solution is that it maintains a lean architecture when solving business and technical problems. MDM varies from enterprise to enterprise, and not every enterprise needs to implement all the layers described in the reference architecture to solve its MDM problem. Architects can look at the MDM requirements, pick the related layers from the reference architecture and implement only those. This avoids implementing a heavy MDM stack, cuts implementation time and simplifies maintenance for the technical operations groups.
The solution we discuss here enables data architects to build the optimal solution using a minimal set of components. It leads to a simple and lean deployment architecture.
Every enterprise contains master data. Regardless of the latest architecture patterns such as cloud, EDA and SOA, master data is required to fulfill your business functionality. Therefore, master data management plays a vital role in your business operations.
A good MDM system boosts efficiency in business functionality in a consumer-driven manner, with real-time data exchanges to fit today’s business needs. This article talks about a lean approach to master data management by applying simple architecture principles to build an optimal MDM solution.
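The idea of implementing only the required layers can be sketched as a pipeline assembled from optional steps. The layer functions and `build_pipeline` below are hypothetical and greatly simplified; they only illustrate that a lean deployment and a fuller one can share the same building blocks.

```python
def build_pipeline(*layers):
    """Compose only the layers a given enterprise actually needs."""
    def handle(record):
        for layer in layers:
            record = layer(record)
        return record
    return handle


def quality_layer(record):
    # Normalize the name; returns a new record and leaves the input untouched.
    return {**record, "name": record["name"].strip()}


def security_layer(record):
    # Hide a sensitive column from the consumer.
    return {k: v for k, v in record.items() if k != "national_id"}


def events_layer(record):
    # Record that an event would be published for this entity.
    published.append(record["id"])
    return record


published = []
raw = {"id": "C001", "name": " Alice ", "national_id": "X-0000"}

lean = build_pipeline(quality_layer)
full = build_pipeline(quality_layer, security_layer, events_layer)

lean_result = lean(raw)
full_result = full(raw)
```

An enterprise that needs only data quality deploys the `lean` pipeline; adding security or events later means adding a layer, not replacing the stack.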
Asanka Abeysinghe, Director- Solutions Architecture, WSO2, asankaa AT wso2 DOT com