Introducing WSO2 Streaming Integrator 7.1

  • By Sajith Ravindra
  • 26 Aug, 2020

Executive Summary

Moving data from one location to another is a vital operation in modern enterprise applications, and connecting data sources to their respective destinations becomes part of the business process. This article briefly describes the new capabilities and improvements in WSO2 Streaming Integrator that facilitate these requirements, under the following topics:

  • Streaming ETL task wizard
  • Monitoring dashboard for ETL flows
  • Event replay and error store
  • Extension installer
  • Enhanced Support for CDC, File Streaming, and Cloud Storages

Introduction

The recently introduced WSO2 Streaming Integrator 7.1 enables you to treat all your data sources as event streams. It can listen to data sources that publish events, or source events in real time from static data stores such as databases and files. The rich stream processing capabilities of Siddhi.io, the engine powering WSO2 Streaming Integrator, can be used to process and analyze data streams using a powerful set of techniques and tools. Finally, processed data can be delivered to destinations using the comprehensive set of connectors written to integrate with various systems and transports.
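The source, process, and publish flow described above can be pictured with a small sketch. This is plain Python written only for illustration; it does not use Siddhi or any WSO2 API, and the names `source_from_csv`, `process`, `sink_to_list`, and the sample data are all hypothetical.

```python
import csv
import io
from typing import Any, Dict, Iterable, Iterator, List

Event = Dict[str, Any]

# Hypothetical static data store: a small CSV document.
SAMPLE = "id,amount\na1,5.00\na2,120.50\na3,99.99\n"


def source_from_csv(text: str) -> Iterator[Event]:
    """Source: turn a static CSV document into a stream of events, one per row."""
    yield from csv.DictReader(io.StringIO(text))


def process(events: Iterable[Event], min_amount: float) -> Iterator[Event]:
    """Process: filter and transform events one at a time as they flow past."""
    for event in events:
        amount = float(event["amount"])
        if amount >= min_amount:
            yield {"id": event["id"], "amount_cents": int(round(amount * 100))}


def sink_to_list(events: Iterable[Event]) -> List[Event]:
    """Sink: deliver processed events to a destination (here, an in-memory list)."""
    return list(events)


def run_pipeline() -> List[Event]:
    """Wire source -> process -> sink, keeping only amounts of at least 50."""
    return sink_to_list(process(source_from_csv(SAMPLE), min_amount=50.0))


if __name__ == "__main__":
    for processed_event in run_pipeline():
        print(processed_event)
```

Because each stage is a generator, rows are handled one event at a time rather than as a batch, which is the basic idea behind treating a static store as a stream.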

While WSO2 Streaming Integrator is a general-purpose stream processing engine, its connectors, data processing capabilities, and high performance closely match the requirements of streaming-based data integration use cases such as streaming Extract, Transform, Load (ETL). In this context, the new release of WSO2 Streaming Integrator introduces many useful capabilities, such as an ETL task generation wizard, ETL flow monitoring, and error storing and data replay, to make it an enterprise-grade tool that allows you to build, run, and maintain streaming-based integrations and ETL tasks efficiently and at a lower cost.

New Features

These are the new features that we have introduced in the latest release of WSO2 Streaming Integrator.

Streaming ETL Task Wizard

Moving data across different systems is a common problem every integration system has to solve. Streaming based ETL is becoming increasingly popular as most use cases now demand data to be available in near real-time. In order to use streaming techniques effectively to build streaming ETL flows, developers need adequate knowledge about data streaming.

However, this can be overwhelming for a non-technical user or even for a user who’s new to data streaming. The newly introduced ETL task wizard for WSO2 Enterprise Integrator hides the complexity that comes with streaming and lets users walk through six self-guided steps to build end-to-end streaming ETL flows in a matter of minutes. Users can configure their data sources and destinations and build logic with visual tools to manipulate and process the incoming and outgoing data with ease, without having to write any code. Finally, they can export the flow as a Docker image, to a Streaming Integrator server, or as a Kubernetes (K8s) artifact.

Monitoring Dashboard for ETL Flows

An organization needs to build and maintain numerous data integration flows. In order to ensure that the data integration flows are functioning as expected, a proper monitoring mechanism is vital. A well designed monitoring system should enable you to identify errors quickly and facilitate further action by pointing to the source of error. Furthermore, it should be possible to get a clear idea about the load and performance characteristics of the system.

The streaming ETL dashboard introduced with this release provides a comprehensive set of stats and figures that allow users to monitor and understand system performance and load, and to locate errors in their streaming ETL data flows. It provides a view of the system at different levels, starting from stats covering the complete server through to stats specific to a given data source or destination.

Event Replay and Error Store

Every piece of data can potentially conceal valuable information or carry critical instructions, so losing even a fraction of the data can have a considerable impact. Therefore, an enterprise-grade integration system should provide the means to ensure “zero data loss” at all levels, from resilient, highly available deployments to error management tools that enable users to detect errors and take corrective action promptly.

WSO2 Streaming Integrator is designed to process a large number of events per unit of time. In the middle of a high-rate event flow, some events might result in errors. The “error store” functionality lets users collect such erroneous events into a store, along with information that is useful for rectifying the error. The “event replay” functionality then allows users to browse through the stored errors and replay the events, with modifications as necessary.
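The idea of capturing failed events and replaying them later can be sketched as follows. This is a conceptual illustration in plain Python, not the Streaming Integrator implementation or its API; `ErrorStore`, `ErrorEntry`, `process_all`, and the sample handler are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List, Optional

Event = Dict[str, Any]


@dataclass
class ErrorEntry:
    """A failed event plus the information needed to diagnose it."""
    event: Event
    cause: str


@dataclass
class ErrorStore:
    """In-memory stand-in for a persistent error store (an assumption)."""
    entries: List[ErrorEntry] = field(default_factory=list)

    def capture(self, event: Event, exc: Exception) -> None:
        # Store a copy so later changes to the event do not alter the record.
        self.entries.append(ErrorEntry(event=dict(event), cause=repr(exc)))

    def replay(self, handler: Callable[[Event], None],
               fix: Optional[Callable[[Event], Event]] = None) -> int:
        """Re-run stored events, optionally modified by `fix`.

        Events that fail again are put back in the store.
        Returns the number of events that succeeded.
        """
        pending, self.entries = self.entries, []
        succeeded = 0
        for entry in pending:
            candidate = fix(entry.event) if fix else entry.event
            try:
                handler(candidate)
                succeeded += 1
            except Exception as exc:
                self.capture(candidate, exc)
        return succeeded


def process_all(events: Iterable[Event], handler: Callable[[Event], None],
                store: ErrorStore) -> int:
    """Process a flow of events; divert failures to the store instead of stopping."""
    processed = 0
    for event in events:
        try:
            handler(event)
            processed += 1
        except Exception as exc:
            store.capture(event, exc)
    return processed


if __name__ == "__main__":
    totals: List[float] = []
    store = ErrorStore()
    events = [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "n/a"}]
    process_all(events, lambda e: totals.append(float(e["amount"])), store)
    print(store.entries)  # the malformed event and the cause of its failure
    store.replay(lambda e: totals.append(float(e["amount"])),
                 fix=lambda e: {**e, "amount": "0"})
    print(totals, store.entries)
```

The key design point in this sketch is that a failing event never interrupts the rest of the flow: it is set aside with its cause, and the store is emptied only by a replay that succeeds.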

Extension Installer

There are 60+ extensions available for WSO2 Streaming Integrator that let users connect to various systems, endpoints, and transports. Installing these extensions manually requires the user to download various third-party dependencies from different locations. This can be tedious and difficult to manage in a production system, as a developer must keep track of the dependencies and fetch them when required.

The extension installer utility added to WSO2 Streaming Integrator helps you handle this complexity with ease. It lets users easily identify the connectors that are installed on the server and the connectors that are available to be installed. Furthermore, it lets users install extensions along with their dependencies with a single command, eliminating all manual action.

Improvements

In addition to the new features mentioned above, here are some improvements in the latest release.

Enhanced Support for CDC, File Streaming, and Cloud Storages

Traditionally, data resides in two locations: file systems and databases. Modern enterprises mostly rely on streaming data for analytics and on cloud storage for storing data, because it is easy to maintain. Organizations may need to stream data from static data sources through several integration flows and finally store it in the cloud.

WSO2 Streaming Integrator provides a variety of extensions to fulfill these requirements. The Siddhi IO extensions for file and CDC can be used to stream data that resides in static data sources. Extensions such as those for S3, GCS, Azure Data Lake, and Cosmos DB can be used to publish events to, and receive events from, cloud data stores.

Conclusion

WSO2 Streaming Integrator enables users to connect any data source with any destination by employing streaming-based techniques to fetch, process, and publish data. Its comprehensive set of IO extensions can be used to connect with various sources and destinations that communicate using different data formats. You can easily put these abilities in place to implement data integrations built upon streaming. Moreover, many modern organizations are compelled to use streaming-based techniques such as streaming ETL for data integration because they require data in near real time; even latencies of a few minutes are considered too slow. The latest release of WSO2 Streaming Integrator includes a comprehensive set of features that let users build streaming-based integrations and ETL tasks easily, maintain and monitor data flows inside out, and identify and mitigate errors occurring within data flows to avoid losing data.

About Author

  • Sajith Ravindra
  • Senior Software Engineer
  • WSO2