27 Feb, 2024 | 3 min read

Automating ETL Tasks Effectively with Choreo

  • CHATHURA KULASINGHE
  • Senior Lead - Solutions Engineer - WSO2

The Context

Connecting multiple systems and exchanging data among them is a frequent requirement in many business scenarios. This typically involves one or more source systems, an intermediary processor, and one or more destination systems. Some organizations invest in purpose-built solution suites such as Data Warehouse, Master Data Management (MDM), or Extract, Transform, Load (ETL) platforms, which, in theory, cover a wider spectrum of requirements. For project teams with specific requirements and goals, these platforms can represent a significant commitment, as the teams have to invest time and effort in learning both the relevant and the irrelevant features of such platforms. Mastering these platforms typically demands deep involvement, both in learning and in ongoing use, and this specialization rarely appeals to application developers. As a result, there's a growing preference within organizations for solutions that are more streamlined, transparent, agile, scalable, and easy to maintain, and that focus solely on what is essential to their specific use cases. Consequently, concepts such as domain-driven design, microservice architecture, and agility, along with the technologies that support them, keep gaining traction.

ETL, once seen as a standalone domain, has now evolved into an integral part of application architecture and development. It's no longer confined to traditional silos but seamlessly integrates into the broader landscape of designing and building modern applications. This shift positions ETL as a crucial element deeply embedded within application architecture, highlighting its essential role in data management and software ecosystems.

WSO2 Choreo

The use-case presented here mainly involves data extraction, transformation, and loading flows similar to a typical ETL implementation. Let’s explore how Choreo can help implement this as a simpler, leaner solution while allowing development teams to stick to their regular practices and methodologies.

Choreo is a comprehensive internal developer platform that embraces a wide array of technologies and development cultures. It gives your developer organizations complete development lifecycle management for your applications once you connect your source-code repositories to it. It also incorporates new technologies, including WSO2's API Management and Integration tools, which benefit from 18 years of domain expertise. Among these is Ballerina, a modern programming language designed specifically for integration, which was used to address the requirements of this use-case.

The Use-Case

The business entity discussed in this article is a restaurant chain that uses a third-party, cloud-based electronic point-of-sale (ePoS) platform for day-to-day restaurant operations. They have more than 80 restaurant locations that use many ePoS terminals provided by this ePoS platform. The ePoS platform acts as a central hub, storing Customer, Table, and Server IDs for orders placed across any restaurant. It also offers an API for retrieving this data as needed.

The restaurant chain expects to visualise this information in a human-readable format, with the intention of providing meaningful insights to their corporate staff on analytics dashboards. Moreover, the company wants to include other complex data points, such as kitchen timings, billing details, payment entries, and payment methods, which further complicates the process. Despite the well-defined API and expected outcome, the sheer volume of transactions and the multitude of different API calls required to construct a single transaction introduce significant complexity.
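To make the "many API calls per transaction" point concrete, here is a minimal Python sketch of the transformation step. It is purely illustrative and is not the Ballerina code used in this project. The payload shapes, field names, and the build_transaction function are assumptions, since the article does not describe the ePoS API's actual responses.

```python
from datetime import datetime


def build_transaction(order, kitchen, payments):
    """Merge separately fetched payloads (hypothetical shapes) into one record."""
    placed = datetime.fromisoformat(order["placed_at"])
    served = datetime.fromisoformat(kitchen["served_at"])
    return {
        "order_id": order["id"],
        "restaurant_id": order["restaurant_id"],
        "table_id": order["table_id"],
        "server_id": order["server_id"],
        # Kitchen timing derived from two timestamps held by different endpoints.
        "kitchen_minutes": (served - placed).total_seconds() / 60,
        "total_paid": round(sum(p["amount"] for p in payments), 2),
        "payment_methods": sorted({p["method"] for p in payments}),
    }


# Sample data standing in for three separate API responses (all invented).
order = {"id": "o-1", "restaurant_id": "r-12", "table_id": "t-4",
         "server_id": "s-9", "placed_at": "2024-02-27T12:00:00"}
kitchen = {"order_id": "o-1", "served_at": "2024-02-27T12:18:00"}
payments = [{"order_id": "o-1", "amount": 20.0, "method": "card"},
            {"order_id": "o-1", "amount": 5.5, "method": "cash"}]

record = build_transaction(order, kitchen, payments)
print(record["kitchen_minutes"], record["total_paid"], record["payment_methods"])
```

Each dashboard row therefore depends on several upstream calls succeeding, which is why the volume of transactions multiplies the integration effort.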


Figure 1: The requirement

Solution - Choreo for ETL

Choreo offers three key integration component types, tailored to different execution needs.

  1. Scheduled Trigger
  2. Manual Trigger
  3. Service

In this particular use-case, we have used all three of these component types in combination to execute the flows of actions detailed below.


Figure 2: The solution implemented with Choreo

Component 1: Data Extraction Service

As the first step, we need to:

  1. Extract data from the ePoS platform
  2. Store it in a database

To achieve this, we developed a Service using the Ballerina programming language. Once this service is invoked (like any other web service or API), it communicates with the ePoS system to extract the sales-related data and then stores it in the analytics database. Although the service contains all the logic needed to extract data from the ePoS system and store it in the database, it still requires a secondary process or a person to invoke it. Therefore, we developed two other components mainly for this purpose.
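The extract-and-store flow can be sketched as follows. This is an illustrative Python outline rather than the Ballerina service itself. The sales table, its columns, and the fetch_sales callable (a stand-in for the real ePoS API client) are assumptions, and SQLite stands in for the analytics database.

```python
import sqlite3
from typing import Callable, Iterable


def extract_and_load(fetch_sales: Callable[[], Iterable[dict]],
                     conn: sqlite3.Connection) -> int:
    """Pull sales records from the source and write them to the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "order_id TEXT PRIMARY KEY, restaurant_id TEXT, total REAL)"
    )
    rows = [(s["order_id"], s["restaurant_id"], s["total"]) for s in fetch_sales()]
    with conn:  # commits on success, rolls back on error
        # Keyed on order_id, so reloading the same order overwrites the row.
        conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    return len(rows)


def fake_fetch():
    # Hypothetical stand-in for a call to the ePoS API.
    return [
        {"order_id": "o-1", "restaurant_id": "r-1", "total": 10.0},
        {"order_id": "o-2", "restaurant_id": "r-2", "total": 7.25},
    ]


conn = sqlite3.connect(":memory:")
loaded = extract_and_load(fake_fetch, conn)
extract_and_load(fake_fetch, conn)  # a repeated invocation adds no new rows
count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(loaded, count)
```

Passing the fetch function in as a parameter keeps the load logic independent of how the service is invoked, which matches the separation between this component and the triggers.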

Component 2: Scheduled Trigger - (invocation of the ePoS integration flow)

This scheduled task runs at set intervals, automatically triggering the service component described earlier. However, when the system is auto-scaled, multiple instances of the service could, in theory, create duplicate records. Therefore, besides invoking the service, this Scheduled Trigger also performs a crucial orchestration task: preventing duplicate records.
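The article does not detail how the trigger prevents duplicates, so the following Python sketch shows only one common approach, under stated assumptions: each scheduled run claims its time window in a ledger table before invoking the service, and a uniqueness constraint ensures that only one caller wins. The etl_runs table and the claim_window and run_scheduled functions are hypothetical names.

```python
import sqlite3


def claim_window(conn, window_start):
    """Return True only for the first caller that claims this window."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_runs (window_start TEXT PRIMARY KEY)"
    )
    try:
        with conn:
            conn.execute(
                "INSERT INTO etl_runs (window_start) VALUES (?)", (window_start,)
            )
        return True
    except sqlite3.IntegrityError:
        # Another run already claimed this window.
        return False


def run_scheduled(conn, window_start, invoke_service):
    """Invoke the extraction service at most once per window."""
    if not claim_window(conn, window_start):
        return "skipped"
    invoke_service(window_start)
    return "ran"


conn = sqlite3.connect(":memory:")
calls = []  # records each service invocation for the demo
first = run_scheduled(conn, "2024-02-27T12:00", calls.append)
second = run_scheduled(conn, "2024-02-27T12:00", calls.append)
print(first, second, calls)
```

In a real deployment the ledger would live in a database shared by all instances; the in-memory database here only demonstrates the claim logic.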

Component 3: Manual Trigger

This seldom-used component lets admins execute the integration flow manually. This means they can bring the dashboard data up to date at any time, not just during scheduled runs.

Component 4: Analytics Service

Finally, the Analytics Service component acts as an API that exposes analytics data from the database to the dashboard. It receives requests from the dashboard application, reads data from the analytics database, processes the data into an HTTP response, and sends that response back to the dashboard application.
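The read-process-respond steps above can be sketched as follows. This is an illustrative Python outline, not the actual Analytics Service. The sales table, the per-restaurant aggregation, and the JSON shape are assumptions, and the HTTP layer is omitted so that only the query and response-body logic is shown.

```python
import json
import sqlite3


def sales_by_restaurant(conn):
    """Aggregate stored sales and serialize them as a JSON response body."""
    rows = conn.execute(
        "SELECT restaurant_id, COUNT(*), ROUND(SUM(total), 2) "
        "FROM sales GROUP BY restaurant_id ORDER BY restaurant_id"
    ).fetchall()
    payload = [
        {"restaurant_id": rid, "orders": n, "revenue": revenue}
        for rid, n, revenue in rows
    ]
    return json.dumps(payload)


# Invented sample data standing in for the analytics database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id TEXT PRIMARY KEY, restaurant_id TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("o-1", "r-1", 10.0), ("o-2", "r-1", 15.5), ("o-3", "r-2", 7.25)],
)

body = sales_by_restaurant(conn)
print(body)
```

Doing the aggregation in the query keeps the service thin: the dashboard receives ready-to-plot figures instead of raw transactions.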

Summary

The restaurant chain's scenario illustrates how demanding it can be to consolidate data from a cloud-based ePoS platform into actionable insights. Choreo's streamlined approach can offer a viable solution here, ensuring transparency, scalability, and maintainability without compromising on functionality.

The complexity of inter-system data exchange often leads organizations to invest in comprehensive solution suites. However, these can still pose challenges for project teams aiming for efficiency and agility. Choreo simplifies data integration, offering a nimble and flexible way to handle complex extraction, transformation, and loading tasks. By seamlessly integrating with existing development practices and tools, our platform empowers teams to focus on their core competencies, while leveraging innovative technologies like Ballerina for precise, standards-compliant integration solutions. 
