Let’s assume John is the CEO of a large enterprise. As with any modern enterprise, John’s organization generates a significant amount of data that could translate into useful information about the business, information that may not be apparent at first glance. Converting this data into useful information will help John’s company be more productive and compete better. To enable this, John hires Mark, a data scientist. Mark’s primary role is to collect, clean, and analyze this data and present the results to John and other stakeholders to help them make better business decisions.
Data can be produced from multiple sources in varied formats. Mark has to capture this data, store it in a single place, and prepare it for analysis. Mark can use batch analytics to combine, compare, and contrast different data sets and produce summarized data from which to extract information that will be useful to the CEO. Mark will also need to ensure that John and his colleagues are up to date on live events and current market trends so they can make timely and proactive business decisions. For this, Mark needs to analyze incoming data as it becomes available; he can use real-time analytics to instantly notify John by e-mail or SMS, or to update a dashboard that can be accessed on a mobile device or laptop.
Mark will also need to extract certain information on a regular basis and present it as interactive visuals, so that John and other stakeholders can gain better insights by drilling down into the data. John’s organization can stay one step ahead if it can foresee the future. To enable this, Mark can leverage predictive analysis; he can use current and historical data to create and fine-tune machine learning models for this purpose.
To understand what Mark produces by analyzing the data, John needs dashboards. A dashboard will provide John with a concise view of the business by showing performance, real-time alerts, and forecasts. John will need to access this information from anywhere at any time.
The WSO2 analytics platform is an ideal solution that would meet all of John’s enterprise requirements. WSO2’s open source analytics platform is seamlessly integrated, combining batch, interactive, real-time, and predictive analytics. It allows an organization to:
- Collect data from multiple sources in different formats
- Analyze data using various techniques
- Communicate results effectively
The analytics platform allows users to extract data from any data source that could be in any format. It also has a pluggable datastore architecture to allow organizations to use datastores that match their needs.
The batch analytics capabilities of the WSO2 analytics platform enable the use of different operations and techniques to summarize and aggregate data collected over a period of time in order to derive a broader view of the data. The platform can also analyze data on the move, in real time. Real-time analytics make it possible to correlate data from multiple sources to detect patterns, anomalies, etc. and to generate alerts or visualize results through views such as dashboards. In real-time analytics, users are interested in the results as and when they become available, e.g. traffic monitoring, smart order routing, compliance monitoring, and fraud detection.
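To make the batch step more concrete, here is a minimal Python sketch of summarizing stored events collected over a period of time. It is not WSO2 code: the event fields (`api`, `day`, `response_ms`), the sample data, and the function name `summarize` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical stored events; the field names are assumptions for illustration.
EVENTS = [
    {"api": "orders", "day": "2016-01-01", "response_ms": 120},
    {"api": "orders", "day": "2016-01-01", "response_ms": 80},
    {"api": "orders", "day": "2016-01-02", "response_ms": 100},
    {"api": "billing", "day": "2016-01-01", "response_ms": 300},
]


def summarize(events):
    """Group events by (api, day) and compute request count and mean latency."""
    totals = defaultdict(lambda: {"count": 0, "total_ms": 0})
    for event in events:
        bucket = totals[(event["api"], event["day"])]
        bucket["count"] += 1
        bucket["total_ms"] += event["response_ms"]
    # Turn the running totals into the summarized rows a report would use.
    return {
        key: {"count": b["count"], "avg_ms": b["total_ms"] / b["count"]}
        for key, b in totals.items()
    }


if __name__ == "__main__":
    for key, row in sorted(summarize(EVENTS).items()):
        print(key, row)
```

The same grouping and aggregation is what a batch query would typically express declaratively; the sketch only shows the shape of the computation.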
In addition to generating instant updates in real time and analyzing data in batch mode, it is useful to be able to predict what the future holds based on past performance. The WSO2 analytics platform comes with predictive analytics capabilities that enable an organization to identify problems based on past data and make predictions about the future. The results produced by real-time and batch analytics can be used to create and fine-tune machine learning models that predict future events, e.g. predicting the next value in a stream of sensor readings by learning from past values, or detecting whether a given e-mail is spam by learning from past classifications. It should be noted, however, that this mode of analytics requires expertise in creating correct models and in tuning algorithms for accurate and efficient predictions.
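As a rough illustration of predicting the next value of a stream of sensor readings from past values, the Python sketch below fits a straight line to the readings by ordinary least squares and extrapolates one step ahead. This is a deliberately simple stand-in for a trained model, not the way the WSO2 platform builds its models; the function name `predict_next` is hypothetical.

```python
def predict_next(readings):
    """Fit y = a + b*t by least squares over t = 0..n-1 and return the estimate at t = n."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(readings) / n
    # Covariance of (t, y) and variance of t, both without the 1/n factor,
    # since the factor cancels in the slope.
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(readings))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept + slope * n


if __name__ == "__main__":
    print(predict_next([20.0, 20.5, 21.0, 21.5]))
```

A linear trend is only reasonable for short, smooth series; real sensor data usually calls for a model chosen and tuned by someone with the expertise noted above.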
Another form of analytics offered by the WSO2 analytics platform is interactive analytics, where users can ‘interactively’ analyze data that was collated and processed over a period of time using queries. In a typical interactive analytics scenario, users first need to see the data in context and then drill down into the details to get a better understanding of the situation. For example, a user might detect an anomaly in a series of credit card transactions using real-time analytics or by looking at dashboards, and then use interactive analytics to dig deeper and verify whether it is an act of fraud; this can be done by obtaining information such as other transactions carried out around the same time, historical transactions, etc.
The WSO2 analytics platform also provides capabilities for visualizing analyzed information in multiple ways. The results can be published onto dashboards (examples shown in Figure 1) that contain various types of gadgets. Here, the stakeholders’ interest is in monitoring key performance indicators (KPIs). Some examples of KPIs include the number of unique visitors to a website on a given day, the number of customers who make it through to a purchase in a sales funnel, and per-user data utilization on a network. KPIs can be monitored both in real time and in batch mode. In the real-time case, there needs to be an alerting model, so dashboards alone may not be the way to go.
With visualization, users can make decisions based on what they see on screen, which in turn is based on the analyzed data. Those decisions lead to actions, such as changing process parameters and fine-tuning the process. Some of these actions could also be automated to some extent.
The WSO2 analytics platform (illustrated in Figure 2) supports the key stages of analytics requirements: data collection, data analysis, and communication. First, a user needs to define data streams to describe the data. Thereafter, the user can write SQL-like queries using Spark SQL and the Siddhi Event Query Language to analyze the streams defined when publishing events, and/or use machine learning models to make forecasts. Finally, the outputs can be communicated to the end user as alerts, as visualizations on dashboards, or through APIs from which users can obtain data.
Data can be collected from any data source over various protocols. The WSO2 platform defines the concept of ‘data agents’ to collect data, as shown in Figure 3.
For WSO2 products, such as WSO2 API Manager and WSO2 Enterprise Service Bus (ESB), there are pre-built data agents that publish information, such as statistics for service monitoring, usage monitoring, and message mediation monitoring.
For your own custom data sources, you can implement custom agents with ease using the APIs provided. Moreover, you can use WSO2 ESB with its 150+ connectors in conjunction with a business activity monitor mediator to collect data feeds from custom sources like Twitter, Facebook, etc.
The data agents publish streams of data into WSO2 Data Analytics Server (DAS). WSO2 DAS can capture these data streams into data stores and then analyze the stored data in batch mode using its analysis engine. A simple Spark query is shown below:
WSO2 DAS can also act on incoming data streams as and when the data is received (in real time), without storing it. In addition, WSO2 DAS allows users to drill down into collected data interactively using queries. A sample Siddhi query to recognize a pattern in a stream of events is shown below:
Furthermore, WSO2 Machine Learner (ML) can be used to build and fine-tune machine learning models by feeding it collected data. The ML wizard helps you create models that you can use to classify data; these models can then be run in WSO2 DAS for predictive analysis.
WSO2 DAS can generate result streams and store the results and summarized data in a relational database (RDBMS). Users can listen to these result streams or obtain data from the database via the provided API to act on the results.
Summarized data and result data can be used by visualization tools, such as WSO2 Dashboard Server (DS), to build dashboards for monitoring KPIs. Moreover, WSO2 DAS itself has a feature to create dashboards and gadgets. In the case of real-time analytics with WSO2 DAS, what is more interesting, in addition to KPI dashboards, is generating alerts when matching events are detected, such as by sending e-mails or SMSs. WSO2 DAS result streams can also be recorded into storage to support delayed processing rather than real-time monitoring.
WSO2 products, such as WSO2 ESB, WSO2 API Manager, and WSO2 Identity Server, have built-in data agents to publish their data to WSO2 DAS. This data is analyzed using WSO2’s batch, real-time, predictive, and interactive analytics capabilities to provide valuable insights into the usage of those servers.
In an API management solution, analytics plays a vital role, as many situations need to be considered to maintain the high availability and security of APIs. Some of the key areas in API analytics are API health monitoring, API usage monitoring, and suspicious activity monitoring. For example, if an API suddenly starts to fail, the origin IPs of requests change, or the pattern of API resource usage changes abnormally, the administrators should be alerted so they can take prompt action to handle possible failures or threats.
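The "API starts to fail suddenly" case can be sketched in Python as a sliding window over recent call outcomes that raises an alert when the failure ratio crosses a threshold. This is an illustrative sketch only, not the platform’s alerting logic; the class name `FailureRateMonitor`, the window size, and the threshold are assumptions.

```python
from collections import deque


class FailureRateMonitor:
    """Tracks the most recent call outcomes of one API and flags a high failure ratio."""

    def __init__(self, window_size=10, threshold=0.5):
        # A bounded deque drops the oldest outcome automatically once it is full.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, success):
        """Record one call outcome; return True when an alert should be raised."""
        self.window.append(bool(success))
        if len(self.window) < self.window.maxlen:
            return False  # not enough observations yet to judge
        failures = self.window.count(False)
        return failures / len(self.window) > self.threshold


if __name__ == "__main__":
    monitor = FailureRateMonitor(window_size=4, threshold=0.5)
    for ok in [True, True, True, True, False, False, False]:
        if monitor.record(ok):
            print("ALERT: failure ratio above threshold")
```

Waiting for a full window before alerting avoids raising an alert on the very first failed call; a production rule would also need to consider request volume and recovery.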
The real-time and batch analytics capabilities of the WSO2 analytics platform (Figure 4) are leveraged to monitor the status of each API in order to generate customizable alerts on conditions that require attention. The alerts are communicated to the responsible parties as e-mails/SMSs and are also displayed in a dashboard so they can be monitored by the system administrator.
Security analytics deals with the application of big data analytics to all data related to identity security in order to provide meaningful information; this information will help security admins to further optimize their identity platforms and detect any fraudulent activity at an early stage.
This ranges from basic statistics and graphs generated through real-time queries and batch analytics (Figure 5), which summarize session usage, the evolution of login attempts, the usage of different identity providers to log in to different service providers, etc., to much more complex and investigative analytics, such as the ability to identify a security breach or unauthorized access to resources using correlations and pattern detection. The WSO2 analytics platform provides clear views of identity-related analytics, making it possible to handle security breaches proactively and to ensure optimal resource utilization and maintenance of the identity platform.
The WSO2 analytics platform offers customizable IoT device analytics that include predictive analytics using machine learning capabilities. It supports edge computing devices and policy-based edge analytics as well as pre-built instant visualization for sensor readings using live data streams gathered from devices.
In a typical IoT scenario, a device will send events containing a timestamp, location/proximity data, and some readings (e.g. temperature, power, etc.). In general, with this data, we can monitor each device as a single unit as well as look at the devices as part of a larger system.
IoT analytics (Figure 6) provides the capability to look into the details of each device for information such as active/inactive status and last update time. System administrators get a comprehensive and concise view of all devices; information such as the number of different device types, how many devices of each type are connected, the policy compliance ratio of devices, etc. is shown via a dashboard to enable efficient management of the system. A geo dashboard is provided to monitor the locations of devices connected to the system, which gives a clear view of the dynamics of the connected devices.
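The per-device view described above can be sketched in Python as follows: from a list of device events, derive each device’s last update time, an active/inactive status, and the number of devices per type. All field names (`device_id`, `device_type`, `timestamp`), the sample events, and the 300-second inactivity cut-off are assumptions made for illustration, not part of the platform.

```python
from collections import Counter

# Hypothetical device events; all field names are assumptions for illustration.
DEVICE_EVENTS = [
    {"device_id": "d1", "device_type": "thermostat", "timestamp": 1000, "temperature": 21.5},
    {"device_id": "d2", "device_type": "thermostat", "timestamp": 400, "temperature": 19.0},
    {"device_id": "d1", "device_type": "thermostat", "timestamp": 1100, "temperature": 22.0},
    {"device_id": "d3", "device_type": "meter", "timestamp": 1090, "power": 3.2},
]


def device_overview(events, now, inactive_after=300):
    """Return (last_seen, status, devices_per_type) derived from raw device events."""
    last_seen = {}
    device_type = {}
    for event in events:
        dev = event["device_id"]
        device_type[dev] = event["device_type"]
        # Keep the newest timestamp seen for each device.
        last_seen[dev] = max(last_seen.get(dev, event["timestamp"]), event["timestamp"])
    # A device counts as active if it reported within the last `inactive_after` seconds.
    status = {
        dev: "active" if now - ts <= inactive_after else "inactive"
        for dev, ts in last_seen.items()
    }
    devices_per_type = Counter(device_type.values())
    return last_seen, status, devices_per_type


if __name__ == "__main__":
    print(device_overview(DEVICE_EVENTS, now=1200))
```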
Enterprise integration scenarios involve various message flows that can be long and complex. Therefore, ad hoc tuning might not be sufficient to find bottlenecks, troubleshoot, and achieve optimal performance. Enterprise integration analytics allows users to monitor statistics to identify hotspots in the message flow and to fine-tune configurations. In addition, it allows the user to obtain overall statistics on the flows to monitor performance.
Users can trace messages through the mediation flow (Figure 7) and find out what the message content was at each mediator. A dashboard is provided with useful mediation statistics such as processing time per mediator, response/request time, etc. Furthermore, the user is given a view that shows which parts of the mediation flow are invoked most often, so the user can click on the relevant area, drill down, and get a historic view of the performance of that section. From there, the user can drill down to find a specific list of problematic messages and ultimately view a few of them to inspect their contents.
WSO2 DAS is a self-contained product that can be used to perform real-time, batch, and interactive analytics. You can run machine learning models created using WSO2 ML inside DAS to perform predictive analytics as well. In addition, dashboards can be created using DAS to visualize the analyzed data.
Before initiating an analysis, data has to be sent to WSO2 DAS. If you’re using WSO2 products, such as WSO2 ESB and WSO2 API Manager, they have data agents built in to publish data to DAS. All you need to do is configure the WSO2 server to point to your DAS instance. If you need to publish events from your own custom data source, you can easily write a data agent using a well-defined API. You can also use WSO2 ESB with ESB connectors to push data streams from data sources like Twitter and Facebook. Various protocols, such as Apache Thrift, HTTP, JMS, and MQTT, are supported by DAS to receive/publish data.
When sending data, you first need to define what your data streams look like. You can define your data model as ‘Stream Definitions’. A stream definition defines the set of fields, with their data types, that describes the structure of messages received/sent via a stream.
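The idea of a stream definition, a named set of typed fields that every event on the stream must match, can be sketched in Python as below. This models the concept only and is not the format DAS itself uses; the class `StreamDefinition`, its `name` and `version` attributes, and the `temperature_stream` example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamDefinition:
    """A named, versioned list of fields with the type each field must have."""

    name: str
    version: str
    fields: tuple  # ordered (field_name, python_type) pairs

    def validate(self, event):
        """Return a list of problems found in `event`; an empty list means it conforms."""
        problems = []
        for field_name, field_type in self.fields:
            if field_name not in event:
                problems.append(f"missing field: {field_name}")
            elif not isinstance(event[field_name], field_type):
                problems.append(f"wrong type for {field_name}")
        return problems


# Hypothetical stream describing temperature readings sent by devices.
TEMPERATURE_STREAM = StreamDefinition(
    name="temperature_stream",
    version="1.0.0",
    fields=(("device_id", str), ("timestamp", int), ("temperature", float)),
)

if __name__ == "__main__":
    print(TEMPERATURE_STREAM.validate({"device_id": "d1", "temperature": "hot"}))
```

Declaring the structure up front is what lets later queries refer to fields by name and type without inspecting each message.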
Once the data streams are defined, you can write ‘execution plans’ using the Siddhi query language to analyze your data stream in real time, and the results can be pushed to a result stream. The result stream can be communicated out of DAS as alerts using e-mail or SMS, or simply sent out as events using various protocols. You can also choose to persist the incoming data streams and use Apache Spark queries to analyze the data in batch mode. Spark scripts can be scheduled to run at regular intervals or triggered manually, and the summarized data can be written to a relational database (RDBMS). You can also use the interactive console to execute Spark queries against the data.
The persisted data stream can be interactively analyzed and drilled down into via the ‘data explorer’ UI provided in DAS. You can define different search criteria as queries or define a time span to obtain data for your analysis. The ‘activity explorer’ provides the capability to correlate data across different streams received in a given time period.
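The flow of an input stream being analyzed in real time and matches being pushed to a result stream can be illustrated with a small Python generator. It is written in plain Python rather than Siddhi, and the rule it applies (two consecutive readings from the same device above a threshold) as well as the names `detect_overheating`, `device_id`, and `temperature` are assumptions made for the example.

```python
# Hypothetical input stream of temperature readings.
READINGS = [
    {"device_id": "d1", "temperature": 85.0},
    {"device_id": "d2", "temperature": 90.0},
    {"device_id": "d1", "temperature": 90.0},
    {"device_id": "d2", "temperature": 70.0},
    {"device_id": "d1", "temperature": 60.0},
    {"device_id": "d1", "temperature": 85.0},
]


def detect_overheating(stream, threshold=80.0):
    """Yield an alert event when a device reports two consecutive readings above `threshold`."""
    previous_high = {}  # device_id -> whether its previous reading was above the threshold
    for event in stream:
        dev = event["device_id"]
        high = event["temperature"] > threshold
        if high and previous_high.get(dev, False):
            # The emitted dictionaries play the role of events on the result stream.
            yield {"device_id": dev, "temperature": event["temperature"], "alert": "overheating"}
        previous_high[dev] = high


if __name__ == "__main__":
    for alert in detect_overheating(READINGS):
        print(alert)
```

Because the function is a generator, each event is processed as it arrives and nothing is stored beyond one flag per device, which mirrors acting on data without persisting it.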
WSO2 DAS also provides the capability to create dashboards. You can create gadgets with different chart types to visualize streams defined in DAS and combine them as required to create a dashboard. DAS also has industry/domain-specific toolboxes and extensions to support business use cases, such as fraud detection and GIS data monitoring.
DAS can be deployed very easily as a single instance to start analyzing your system initially. Its highly scalable design allows you to easily scale the analytics solutions you build on top of DAS as your system grows. Therefore, you can start analyzing your system with minimal resources and less effort, and scale up gradually as your system and requirements evolve.
The WSO2 analytics platform is a comprehensive platform that’s built from the ground up to meet the needs of big data analytics. It addresses all components of an analytics solution required for an enterprise, i.e. data collection, analytics, and communication (Figure 8).
The platform comprises all the pieces required to achieve comprehensive analytics from end to end. It can monitor both WSO2 platform products and third-party products and systems using an agent-based architecture. It has easy-to-use tools to access the volumes of data that are collected and provides real-time, batch, interactive, and predictive analytics capabilities with WSO2 DAS and WSO2 ML. It has a complete toolset to build gadgets and dashboards for visualizing results. In addition, it can be easily integrated with WSO2 ESB and WSO2 Business Process Server to take further action based on results that might require human intervention.
The platform is easy to install and get started with. Comprehensive documentation on the features and a good set of samples are available for reference. WSO2 DAS and WSO2 ML are self-contained products with all the required tools and technologies, such as analysis engines (e.g. Spark, Lucene, and Siddhi). Therefore, it is simple to get going with the embedded analysis tools when you start your analytics projects.
With the WSO2 analytics platform, you can start small and expand and grow at your own pace. Since both WSO2 DAS and WSO2 ML are self-contained for the requirements of analytics (collection, analysis, and communication), you can use a single instance of these products when you start your projects, for both proof-of-concept phases and initial production phases. As your analytics platform requirements expand and your data volume and analysis requirements grow, you can gradually expand the scale of the platform. For example, you can first scale up the data storage, then the analysis engine, etc. This enables the enterprise to initiate analytics solutions with smaller budgets, without having to make heavy investments, and to expand the projects as deemed necessary once the initial phases have proven their return on investment.
The WSO2 analytics platform allows you to integrate any existing platform that you already have in your enterprise. You can use custom data agents to publish the events/data you want and receive them via custom data receivers. You can easily use WSO2 ESB with its 150+ out-of-the-box connectors to get data feeds from sources such as Twitter and Facebook. Moreover, once you have the analytics platform in place, you can plug in any new systems that you introduce into the enterprise using the same technique of implementing a custom data agent/receiver pair. Plug-in cost is low and the techniques are simple to implement thanks to the well-defined APIs, comprehensive samples in multiple domains, and available documentation.
The analytics platform is flexible and easy to adapt to your needs. Rather than having to mold the kinds and forms of your data to the product’s requirements, you can use any data schema you want and get the platform to capture and monitor that data. This is also true for data analysis and visualization. For data analysis, you have the flexibility of defining the analysis logic to suit your needs. At the same time, the platform helps you schedule analyses and lets you deal with the time dimension of the analysis with ease. For visualization, you have the flexibility of writing your own gadgets and laying them out the way you want on your dashboards. Alternatively, the platform can automatically generate gadgets with graphs and tables based on the summarized data streams.
The WSO2 analytics platform is built from the ground up with a clean design, with no acquisitions integrated. The key advantage of this is that the platform is tuned to fit your needs in the analytics space. In addition, WSO2 keeps fine-tuning and evolving this platform to keep up with state-of-the-art trends in this domain.
Today, the analytics needs of an enterprise are often demanding and sometimes complex. With data produced from multiple sources in varied formats, the primary requirement is to capture all of this valuable data and store it in a single place to prepare for analysis. Thereafter, batch analytics can be used to combine, compare, and contrast different data sets and produce summarized data from which to extract information that will be useful to management. It doesn’t end there: management also needs to stay on top of developments in the market and current trends to make proactive business decisions.
To enable this, the enterprise would need to analyze incoming data and use real-time analytics to send alerts or provide instant updates to dashboards that can be accessed on any device. Furthermore, to remain competitive and a step ahead, the enterprise can use predictive analysis where current and historical data is used to make predictions.
The WSO2 analytics platform is an ideal solution that would meet an enterprise’s requirements; the completely open-source platform is seamlessly integrated, combining batch, interactive, real-time and predictive analytics. These capabilities enable an organization to collect data from multiple sources in different formats, analyze data using various techniques, and communicate results effectively.