Monitor Your Key Performance Indicators using WSO2 Business Activity Monitor
- Kasun Gunathilake
- Senior Software Engineer - WSO2
Applies To
WSO2 BAM | Version 2.0.0 and above |
Table of Content
Following I have shown the BAM architecture which help us to achieve this greater flexibility.
Data that needs to be monitored goes through the above modules and the data flow is as follows.
- The system that you are going to monitor have the BAM data agents and these agents will capture the required data and send it to BAM. In addition to the data agents users can send data to BAM through the REST API.
- Data receiver in BAM receive this data and store them in Cassandra data store.
- Analyzer engine will run the analytics written in hive language according to the monitoring requirement, the summarize data can be persist into RDBMS data store.
- In the presentation layer summarized data will fetch from RDBMS and show it in the dashboard/reports.
I am going to illustrate the capability of defining KPIs by using an use-case. Let's think You have hosted a news website and You need to monitor the traffic comes to your website and analyze it to understand how you can enhance your website and marketing strategies to increase the profit.
- No of unique visitors per day
- Most popular news category
- Number of peoples from different locations
How many visitors does it take for your website to achieve its goals? Your profit will proportional to the no of visitors to your website. If this shows some growth day by day, which means you are making a progress.
This category will depends on several factors (time, incidents, gossip, events etc.). Hence analyzing your traffic and identifying the current in demand news categories are essential for you to promote your contents more in those sources and increase the profits.
This will helpful for identifying the regions that have low no of visitors and take some actions to improve the traffic from these areas.
Now I am going to illustrate step by step on how to use BAM to define above KPIs.
Website traffic information can be captured and published into BAM using following ways.
- Using Java API
- Using client generated by thrift IDL
- Using REST API
BAM provides high performance, low latency, load-balancing (between multiple receivers ), non-blocking and multi-threaded API for sending large amount of business events over various transport (TCP,Http) using Apache Thrift. This API has been provided as a Java SDK and you can use it easily for publishing captured data from your Java based system to BAM for analysis. In addition to that these data-agents are compatible with WSO2 CEP that can be used for real time analysis.
Following I have written a simple asynchronous data-agent for publishing web traffic information to BAM. This will help you to understand the agent API usage. For more information about writing a data publisher for BAM, you can refer this article [1]
import org.apache.log4j.Logger; import org.wso2.carbon.databridge.agent.thrift.Agent; import org.wso2.carbon.databridge.agent.thrift.AsyncDataPublisher; import org.wso2.carbon.databridge.agent.thrift.conf.AgentConfiguration; import org.wso2.carbon.databridge.agent.thrift.exception.AgentException; import org.wso2.carbon.databridge.commons.Event; public class DataAgent { private static Logger logger = Logger.getLogger(DataAgent.class); public static final String ONLINE_NEWS_STATS_DATA_STREAM = "online_news_stats"; public static final String VERSION = "1.0.0"; public static final String DATA = "Kasun,Colombo,Sri Lanka vs Australia 3rd ODI,Sports,1366433587\n" + "Amal,Kaluthara,Businessman killed in Expressway accident,Accidents,1366533595\n" + "Kamal,Colombo,Navy intelligence investigating boat escape,Military,1366633695\n" + "Kalum,Mathara,No leadership change - JVP,Politics,1366433520\n" + "Nuwan,Galle,Marginal improvement at this year’s O/L exam results,Education,1366533987\n"+ "Sampath,Mathara,Marginal improvement at this year’s O/L exam results,Education,1366433987\n"+ "Chamath,Mathara,Attempts to replace Minister Sirisena: UNP,Politics,1366634587\n" + "Prabath,Colombo,WikiLeaks; LTTE could have threatened Karunanidhi,War,1366535587\n" + "Amila,Kandy,Private bus owners to launch one-day strike,Transport,1366732587\n" + "Budhdhika,Colombo,ICC introduces new 'No ball' playing condition,Sports,1366731587\n" + "Rangana,Nuwara Eliya,Annual horse racing event in Nuwara Eliya,Sports,1366534787\n" + "Gamini,Colombo,O/L 2012 results released,Education,1366434987\n" + "Tharindu,Colombo,O/L best results,Education,1366334087\n" + "Janaka,Kaluthara,O/L best results,Education,1366839787\n" + "Anuranga,Jaffna,Sri Lanka vs Australia 3rd ODI,Sports,1366336737\n" + "Kasun,Colombo,Marginal improvement at this year’s O/L exam results,Education,1366338788\n"+ "Ranga,Kandy,Private bus owners to launch one-day strike,Transport,1366832727\n" + "Denis,Kaluthara,ICC introduces new 'No ball' playing condition,Sports,1366534117\n" + "Harsha,Galle,No leadership change - JVP,Politics,1366533333\n"; public static void main(String[] args) { AgentConfiguration agentConfiguration = new AgentConfiguration(); System.setProperty("javax.net.ssl.trustStore", "/opt/wso2bam-2.2.0/repository/resources/security/client-truststore.jks"); System.setProperty("javax.net.ssl.trustStorePassword", "wso2carbon"); Agent agent = new Agent(agentConfiguration); //Using Asynchronous data publisher AsyncDataPublisher asyncDataPublisher = new AsyncDataPublisher("tcp://127.0.0.1:7611", "admin", "admin", agent); String streamDefinition = "{" + " 'name':'" + ONLINE_NEWS_STATS_DATA_STREAM + "'," + " 'version':'" + VERSION + "'," + " 'nickName': 'News stats'," + " 'description': 'Stats of the news web site'," + " 'metaData':[" + " {'name':'publisherIP','type':'STRING'}" + " ]," + " 'payloadData':[" + " {'name':'userName','type':'STRING'}," + " {'name':'region','type':'STRING'}," + " {'name':'news','type':'STRING'}," + " {'name':'tag','type':'STRING'}," + " {'name':'timestamp','type':'LONG'}" + " ]" + "}"; asyncDataPublisher.addStreamDefinition(streamDefinition, ONLINE_NEWS_STATS_DATA_STREAM, VERSION); publishEvents(asyncDataPublisher); } private static void publishEvents(AsyncDataPublisher dataPublisher) { String[] dataRow = DATA.split("\n"); for (String row : dataRow) { String[] data = row.split(","); Object[] payload = new Object[]{data[0], data[1], data[2], data[3], Long.parseLong(data[4])}; Event event = eventObject(null, new Object[]{"10.100.3.173"}, payload); try { dataPublisher.publish(ONLINE_NEWS_STATS_DATA_STREAM, VERSION, event); } catch (AgentException e) { logger.error("Failed to publish event", e); } } } private static Event eventObject(Object[] correlationData, Object[] metaData, Object[] payLoadData) { Event event = new Event(); event.setCorrelationData(correlationData); event.setMetaData(metaData); event.setPayloadData(payLoadData); return event; } }
Thrift IDL can be used to generate thrift clients from different languages to publish data. In this case asynchronous data publishing implementations will be to developed by the developers the way similar to how it has been implemented in Java API.
Languages are supported by thrift
- C++
- C#
- Cocoa
- D
- Delphi
- Erlang
- Haskell
- Java
- OCaml
- Perl
- PHP
- Python
- Ruby
- Smalltalk
You can capture the website traffic information from your system and use BAM REST API for publishing those data to BAM via Http transport.
Data send from above data agents will receive by the data receiver which use thrift and internal optimization techniques to achieve very high throughput. Those received data will directly write to Cassandra which is high performance and scalable big data storage. It persists events into a column family with name equal to the stream name. Also BAM data receiver can be used share the events with WSO2 CEP for real time analysis.
It provides graphical user interface for viewing the data in a Cassandra column family.
Now let's see the data published from our sample client.
Go to Home → Manage → Cassandra Explorer → Connect to Cluster
Type the connection details as below.
You can see all the keyspaces and column families are listed in Cassandra explorer ui.
Then go to online_news_stats in the EVENT_KS.
You can view the data by going to the column family.
After successfully publish the data we can do the analysis.
You can run data analytics on captured data by using the BAM analytics framework which is powered by Apache Hadoop for scaling out data processing operations on a large number of data processing nodes, in order to handle large data volumes. WSO2 BAM provides an easy way to do the map/reduce operation on Hadoop by using Apache Hive that provides a simple way to write query and managing large datasets residing in distributed storage using a SQL-like language called HiveQL.
Following I have list down the KPIs that we are going to analyze.
- No of unique visitors per day
- Most popular news category
- No of peoples from different regions
This is the Hive script to analyze the above KPIs.
CREATE EXTERNAL TABLE IF NOT EXISTS OnlineNewsStats (key STRING, userName STRING,region STRING, news STRING, tag STRING, payload_timestamp BIGINT) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1", "cassandra.port" = "9160","cassandra.ks.name" = "EVENT_KS", "cassandra.ks.username" = "admin","cassandra.ks.password" = "admin", "cassandra.cf.name" = "online_news_stats", "cassandra.columns.mapping" = ":key,payload_userName,payload_region,payload_news,payload_tag,payload_timestamp" ); CREATE EXTERNAL TABLE IF NOT EXISTS UniqueVisitorsPerDay(count INT, day STRING) STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES ( 'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE', 'hive.jdbc.update.on.duplicate' = 'true', 'hive.jdbc.primary.key.fields' = 'day', 'hive.jdbc.table.create.query' = 'CREATE TABLE UNIQUE_VISITORS_PER_DAY ( count INT, day VARCHAR(100))' ); insert overwrite table UniqueVisitorsPerDay select count ( distinct username), from_unixtime(payload_timestamp,'yyyy-MM-dd') as day from OnlineNewsStats group by from_unixtime(payload_timestamp,'yyyy-MM-dd' ); CREATE EXTERNAL TABLE IF NOT EXISTS PopularNewsCategory(count INT, category STRING) STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES ( 'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE', 'hive.jdbc.update.on.duplicate' = 'true', 'hive.jdbc.primary.key.fields' = 'category', 'hive.jdbc.table.create.query' = 'CREATE TABLE POPULAR_NEWS_CATEGORY ( count INT, category VARCHAR(100))' ); insert overwrite table PopularNewsCategory select count(tag), tag from OnlineNewsStats group by tag; CREATE EXTERNAL TABLE IF NOT EXISTS PeoplesFromDifferentLocations(count INT, region STRING) STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES ( 'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE', 'hive.jdbc.update.on.duplicate' = 'true', 'hive.jdbc.primary.key.fields' = 'region', 'hive.jdbc.table.create.query' = 'CREATE TABLE PEOPLES_FROM_DIFFERENT_LOCATIONS ( count INT, region VARCHAR(100))' ); insert overwrite table PeoplesFromDifferentLocations select count(region),region from OnlineNewsStats group by region;
RDBMS data storage can be configured from master-datasources.xml in WSO2BAM/repository/conf/datasources/ directory. You need to change the following section
<datasource> <name>WSO2BAM_DATASOURCE</name> <description>The datasource used for analyzer data</description> <definition type="RDBMS"> <configuration> <url>jdbc:h2:repository/database/samples/BAM_STATS_DB;AUTO_SERVER=TRUE</url> <username>wso2carbon</username> <password>wso2carbon</password> <driverClassName>org.h2.Driver</driverClassName> <maxActive>50</maxActive> <maxWait>60000</maxWait> <testOnBorrow>true</testOnBorrow> <validationQuery>SELECT 1</validationQuery> <validationInterval>30000</validationInterval> </configuration> </definition> </datasource>
From BAM 2.3.0 onwards you can use Cassandra datasource in hive script and it can be configured from master-datasources.xml.
Once you run the above script hive read data from the Cassandra storage and summarize it, then the summarized data will persist into RDBMS storage to visualize via dashboard.
After you analyze and summarize data according to your KPIs, you need to visualize your kpis. Since those summarized data in the RDBMS storage you can plug third party visualization tools for the visualization. Here you can find an article about using different reporting frameworks with WSO2 Business Activity Monitor [2]
Also you can use WSO2 Jaggery framework[3] or WSO2 BAM gadget generation tool to visualize your KPIs.
Here I have used gadget generation tool for generating and deploying gadgets in dashboard. You can find information on how to use it from here [4]
Dashboard
Number of Visitors Per Day
Popular News Category
Visitors Distribution
References
- https://wso2.org/library/articles/2012/07/creating-custom-agents-publish-events-bamcep
- https://wso2.org/library/articles/2012/09/using-different-reporting-frameworks-wso2-business-activity-monitor
- https://jaggeryjs.org/
- https://docs.wso2.org/wiki/display/BAM220/Gadget+Generation+Tool
Author
Kasun Weranga Gunathilake, Senior Software Engineer, WSO2 Inc.