How You Can Increase Agility and Expandability with Event-Driven Architecture (EDA)

From ordering your favorite pizza or hailing a taxi to manufacturing and financial processes, everything is event driven today. People expect to do everything immediately, get instant feedback on the status of their request, and interact in real time with anybody involved in the process.

John Mathon, the former vice president of enterprise evangelism at WSO2, wrote a white paper which explores how you can keep pace with these demands by implementing event driven architecture (EDA) in your enterprise.

EDA is essentially a messaging approach that notifies interested parties of events as they occur so that they can act on them. The earliest real-time event-driven systems implemented the publish/subscribe model; anonymity, discoverability, and guaranteed delivery were a few of the characteristics that made it popular.
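
To make the model concrete, here is a minimal, illustrative publish/subscribe sketch in plain Java (not WSO2 code; the broker, topic, and handler names are invented for this example). A real event broker would add persistence, guaranteed delivery, and subscription management on top of the same idea.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// A toy in-memory broker: publishers and subscribers only share a topic name,
// never a direct reference to each other (the "anonymity" property of pub/sub).
public class ToyBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    public void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    public void publish(String topic, String event) {
        // One publish fans out to every interested party; the publisher does no extra work per subscriber.
        subscribers.getOrDefault(topic, List.of()).forEach(handler -> handler.accept(event));
    }

    public static void main(String[] args) {
        ToyBroker broker = new ToyBroker();
        broker.subscribe("orders/pizza", e -> System.out.println("Kitchen saw: " + e));
        broker.subscribe("orders/pizza", e -> System.out.println("Billing saw: " + e));
        broker.publish("orders/pizza", "order #42: margherita");
    }
}
```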

But this simple model proved insufficient for the demanding and varied needs of subscribers, notes Mathon. This led to the rise of the enterprise service bus (ESB), which standardized enterprise integration patterns; the business process server (BPS), which allowed messages to trigger business processes that dealt with events; and the business activity monitor, now named the data analytics server (DAS), which monitors the health of the enterprise through statistics.

These tools became standard components in an EDA and are useful even today, which is why IoT is reusing pub/sub all over again.

The easiest, fastest and most efficient way of implementing EDA in your enterprise is to incorporate existing event-driven technologies. You may think writing dedicated software would be more cost-efficient and better suited to your specific needs, but in the long run the cost of maintenance can be over a dozen times the initial cost of development.

Existing tools are designed to increase the performance and reliability of your system. They are also easy for non-programmers to use thanks to features such as drag-and-drop components, and they can handle large loads while being robust, secure and resilient to failure.

You can choose a specific tool for a specific problem; for example, a business process server (BPS) suits long-running processes, while a message broker (MB) suits short-running ones. Combined, the tools provide additional power by working together to achieve one goal.

The problem with combining tools is that each can be a large monolithic entity that requires significant communication bandwidth and increases the load on servers. WSO2 solves this problem because all the tools you require are built as lightweight components on the same base framework, making it possible to combine them in the same Java runtime.

When implementing an EDA you need to keep in mind the message flow rates and the characteristics of the message flows. Make sure not to create extremely large messages or do a lot of computation during processing. You also need to consider whether you will be designing for microservices, since your architecture depends on this. API management is another key factor to keep in mind. And lastly, you need to know which tool to use for which job.

WSO2 offers a full suite of open source components for EDA to implement highly scalable and reliable enterprise-grade solutions. This comprises a complete middleware stack, including the WSO2 integration, analytics, security and API management platforms.

For more details, download John's white paper here.

Only 2 more weeks for WSO2Con Europe 2016!

With only 2 weeks to go, we’re ready to rock your minds, and maybe even your bodies, at WSO2Con Europe, happening at Park Plaza Riverbank, London, from June 7 to 9 this year. Get ready for three full days of knowledge, networking and entertainment at one of the biggest middleware conferences in the world!

We recently added guest speaker Roland Major, an enterprise architect at Transport for London, to the agenda; he will be talking about Reducing Disruption to the Road Network Through the Cloud.

There’s more! Here’s what you can look forward to:

  • Inspiring keynotes from industry leaders, including a talk by Nigel Fenwick, vice president and principal analyst at Forrester Research Inc., on Digital Predator Or Prey: Which Will Your Company Be?
  • Insightful sessions on the Internet of Things (IoT), microservices, API management, security, analytics and more, including 12 guest speakers from Profesia, City Sprint, Yenlo, CSI Piemonte and Emoxa, among others.
  • Hands-on product tutorials by WSO2 experts covering areas such as integration, security, IoT and mobility, analytics and DevOps.
  • Networking opportunities with industry thought leaders, peers and WSO2 experts at the welcome reception and conference party.
  • A strategy forum that will help CxOs uncover key strategies and gain insights into how their enterprise can remain competitive and grow revenue.
  • A solutions provider track where our sponsors, including Yenlo and RealDolman, will explore customer use cases on partner driven projects built around the WSO2 platform.

Visit https://eu16.wso2con.com/ for more information about the agenda, speakers and registration.

Connected Finance: Unleashing the True Potential of Finance with Technology

The meteoric evolution of technology has made customers more demanding and, at the same time, created new opportunities for financial institutions. Customers now look for quick and convenient ways to meet their banking needs, making mobile and online services popular. Financial companies need to make sure that they can deliver these services securely, independent of location. It has also become compulsory to accommodate mobile payments and virtual payments in the connected finance ecosystem, resulting in a complex IT landscape.

Enterprises in the financial industry recognize the importance of meeting these needs to remain competitive; the challenge, however, is to build a real-time system that centrally connects everything. Services and APIs are used to seamlessly connect the various backend components to build a robust connected ecosystem.

Asanka Abeysinghe, VP of Solutions Architecture at WSO2, recently authored a white paper – Connected Finance Reference Architecture – in which he discusses the significance of creating a connected finance system. He also explains how a middleware platform can be used to address each and every challenge faced at implementation.

Here are some highlights from this white paper.

The connected finance architecture will primarily facilitate regular, day-to-day functionalities, as well as call center-type functionalities, virtual payments, credit card payments and payment gateways. It will also make the vast amounts of data centrally accessible, allowing decision makers to gain business insights via customized reports and dashboards.

Given the sensitive nature of the industry, security is critically important and needs to be addressed properly. For this, the architecture should connect all the systems and ensure all security measures have been incorporated. Each and every transaction should flow through the same layer and be closely monitored, allowing the company to monitor, manage, and govern financial transactions.

In addition, Asanka explores the role of event-driven architecture (EDA) in the connected finance ecosystem along with an architectural pattern for monitoring gateways. He discusses how WSO2’s complete cloud architecture enables enterprises to implement a hybrid deployment that complies with the tight regulations of the financial industry.

For any financial company, becoming a connected business will help provide customers with better service and enable the company to become more efficient and profitable overall.

For more details on the Connected Finance Reference Architecture, download and read the white paper here.

Solving the DEBS 2016 Grand Challenge using WSO2 CEP

The ACM DEBS Grand Challenge is a yearly competition in which participants implement an event-based solution to a real-world, high-volume streaming data problem.

This year's grand challenge involves developing a solution to two (real-world) problems by analyzing a social network graph that evolves over time. The data for the DEBS 2016 Grand Challenge has been generated using the Linked Data Benchmark Council (LDBC) social network data generator. Solutions are ranked by measuring their performance using two metrics: (1) throughput and (2) average latency.

WSO2 has been submitting solutions to the grand challenge since 2013, and our previous solutions have consistently ranked among the top submissions. This year, too, we submitted a solution using WSO2 CEP/Siddhi, and based on its performance it has again been selected as one of the best solutions. As a result, we've been invited to submit a full paper to the DEBS 2016 conference, to be held from June 20 to 24.

In this blog I'll present some details of the DEBS queries, a brief overview of our solution, and some performance results.

Query 1

As pointed out earlier, DEBS 2016 involves developing an event-based solution for two real-world use cases of an event processing application.

The first problem (query) deals with identifying the posts that currently trigger the most activity in a social network. This query accepts two input streams, namely posts and comments.

Think of a Facebook post with comments. Our goal is to compute the top three active posts, where the total score of a post is the sum of its own score and the scores of its related comments. The initial score of a post is 10, and it decreases by 1 every 24 hours. Similarly, the initial score of a comment is also 10 and decreases in the same manner.

Note that the score of a post or comment cannot drop below zero; a post whose total score is greater than zero is defined as an active post.
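
To make the scoring rule concrete, here is a small illustrative Java sketch (this is not the Siddhi code from the actual submission; the timestamps and values are made up). In the streaming solution these scores have to be re-evaluated both when new events arrive and when a 24-hour boundary passes, so the top-three set can change even without new events.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Illustrative scoring for DEBS 2016 Query 1 (not the actual Siddhi submission):
// a post and each of its comments start at 10 and lose 1 point per full 24 hours,
// never going below 0; the post's total score is its own score plus its comments' scores.
public class PostScore {

    static long decayedScore(Instant createdAt, Instant now) {
        long daysElapsed = Duration.between(createdAt, now).toDays();
        return Math.max(0, 10 - daysElapsed);
    }

    static long totalScore(Instant postCreatedAt, List<Instant> commentCreatedAts, Instant now) {
        long score = decayedScore(postCreatedAt, now);
        for (Instant c : commentCreatedAts) {
            score += decayedScore(c, now);
        }
        return score; // the post is "active" while this stays above zero
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2016-06-20T00:00:00Z");
        Instant post = Instant.parse("2016-06-12T00:00:00Z");          // 8 days old -> 2 points
        List<Instant> comments = List.of(
                Instant.parse("2016-06-18T00:00:00Z"),                 // 2 days old -> 8 points
                Instant.parse("2016-06-19T12:00:00Z"));                // 1.5 days old -> 9 points
        System.out.println(totalScore(post, comments, now));           // prints 19
    }
}
```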

Query 2

The second problem (query) deals with identifying large communities that are currently involved in a topic.

This query accepts three input streams: 1) comments, 2) likes, and 3) friendships.

The aim is to find the k comments with the largest range, where the comments were created more than d seconds ago. Range here is defined as the size of the largest connected component in the graph formed by the persons who have liked that comment and know each other.
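
Computing the range therefore amounts to finding the largest connected component in the subgraph induced by a comment's likers. A minimal union-find sketch in Java (illustrative only; it is not the incremental Siddhi extension used in our submission, which has to maintain this as likes and friendships stream in):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative union-find for DEBS 2016 Query 2 (not the actual submission):
// among the users who liked a given comment, merge users that know each other
// and report the size of the largest resulting group (the comment's "range").
public class CommentRange {
    private final Map<Long, Long> parent = new HashMap<>();
    private final Map<Long, Integer> size = new HashMap<>();

    void addLiker(long user) {
        parent.putIfAbsent(user, user);
        size.putIfAbsent(user, 1);
    }

    long find(long user) {
        long root = user;
        while (parent.get(root) != root) root = parent.get(root);
        return root;
    }

    // Call for every friendship edge between two users who both liked the comment.
    void union(long a, long b) {
        long ra = find(a), rb = find(b);
        if (ra == rb) return;
        parent.put(ra, rb);
        size.put(rb, size.get(rb) + size.get(ra));
    }

    int range() {
        return size.entrySet().stream()
                .filter(e -> find(e.getKey()) == e.getKey())   // only component roots
                .mapToInt(Map.Entry::getValue)
                .max().orElse(0);
    }

    public static void main(String[] args) {
        CommentRange r = new CommentRange();
        for (long u = 1; u <= 5; u++) r.addLiker(u);
        r.union(1, 2); r.union(2, 3);   // users 1, 2, 3 form one group
        r.union(4, 5);                  // users 4, 5 form another
        System.out.println(r.range());  // prints 3
    }
}
```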

The friendship stream plays an important role in this query, as it establishes the friendships between the users in the system. The following figures show the friendship graph when the system has received 10 and 100 friendship events, respectively.

Figure 1: Friendship Graph (Number of Events = 10)

Figure 2: Friendship Graph (Number of Events = 100)

Further analysis indicates that the degree distribution of the friendship graph is long-tailed (see Figure 3). This means that a very small number of users have a large number of friends, while a large number of users have only a few friends.

Figure 3: Degree Distribution of Friendship Graph

Solution Overview

We implemented the solution using WSO2 CEP as an extension to Siddhi. The solution is multi-threaded: it processes the two queries in parallel.

Each query is processed as a pipeline consisting of three phases: 1) data loading, 2) event ordering, and 3) processing. Each phase is handled by one or more threads. In the data loading phase, the data streams are loaded from files (i.e. disk) and placed in separate buffers; each event stream has its own buffer, implemented as a blocking queue.

The purpose of the event-ordering phase is to order events by their timestamps before sending them to the event processor. (Events within a single buffer are already ordered by timestamp; the ordering done in this phase ensures that the merged event stream sent to the processor is also ordered by timestamp.) The core calculation modules of the queries are implemented in the processing thread.
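
Conceptually, the event-ordering phase is a k-way merge of the per-stream buffers by timestamp. The simplified sketch below illustrates the idea (the actual solution uses blocking queues and dedicated threads rather than this single-threaded loop):

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

// Simplified sketch of the event-ordering phase: each input stream's buffer is
// already sorted by timestamp, so a small priority queue over the buffer heads
// yields a single merged stream in global timestamp order.
public class EventOrderingSketch {

    record Event(long timestamp, String payload) {}

    static void mergeAndProcess(List<Queue<Event>> buffers) {
        // Heap entry = index of the buffer whose head event is currently the oldest.
        PriorityQueue<Integer> heads = new PriorityQueue<>(
                Comparator.comparingLong(i -> buffers.get(i).peek().timestamp()));
        for (int i = 0; i < buffers.size(); i++) {
            if (!buffers.get(i).isEmpty()) heads.add(i);
        }
        while (!heads.isEmpty()) {
            int i = heads.poll();
            Event next = buffers.get(i).poll();           // oldest event across all buffers
            process(next);
            if (!buffers.get(i).isEmpty()) heads.add(i);  // re-insert with the buffer's new head
        }
    }

    static void process(Event e) {
        System.out.println(e.timestamp() + " -> " + e.payload());
    }

    public static void main(String[] args) {
        Queue<Event> posts = new ArrayDeque<>(List.of(new Event(1, "post A"), new Event(5, "post B")));
        Queue<Event> comments = new ArrayDeque<>(List.of(new Event(2, "comment on A"), new Event(4, "comment on A")));
        mergeAndProcess(List.of(posts, comments));        // prints events in timestamp order 1, 2, 4, 5
    }
}
```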

Performance results

The solution was tested on a four-core/8 GB virtual machine running Ubuntu Server 15.10. As discussed earlier, the two performance metrics used for evaluating the system are throughput and mean latency. The performance evaluation was carried out using two data sets of different sizes (see here and here).

The throughput and mean latency of query 1 for the small data set are 96,004 events/second and 6.11 ms respectively. For the large data set, the throughput and mean latency of query 1 are 71,127 events/sec and 13 ms.

The throughput and mean latency of query 2 for the small data set are 215,642 events/second and 0.38 ms respectively. For the large data set, the throughput and mean latency of query 2 are 327,549 events/sec and 0.73 ms.

A detailed description of the queries and the specific optimization techniques we used can be found in the paper Continuous Analytics on Graph Data Streams using WSO2 Complex Event Processor, to be presented at DEBS 2016: the 10th ACM International Conference on Distributed and Event-Based Systems, June 2016.

Event-Driven Architecture and the Internet of Things

It's common knowledge now that the Internet of Things is projected to be a multi-trillion dollar market, with billions of devices expected to be sold within a few years. It's happening already. What's driving IoT is a combination of low-cost hardware and low-power communications, enabling virtually everything to be connected cheaply. Even Facebook talked about it at their recent F8 conference (photo by Maurizio Pesce).

And why wouldn’t they? A vast array of devices that make our lives easier and smarter is flooding the market, ranging from fuel-efficient thermostats and security systems to drones and robots. The industrial market for connected control and monitoring already exists and will expand in automated factories, logistics automation, and building automation. However, efficiencies are also being found in new areas. For instance, connected tools for the construction site enable construction companies to better manage construction processes. We are also seeing increased intelligence from what can be referred to as the network effect – the excess value created by the combination of devices all being on a network.

What's remarkable is that IoT protocols share one common characteristic: they are all designed around publish/subscribe. The benefits of publish/subscribe event-driven computing are simplicity and efficiency.

Devices or endpoints can be dynamic, and can be added or lost with little impact on the system. New devices can be discovered, and rules applied to add them to the network and establish their functionality. All IoT standards support some form of discovery mechanism so that new devices can be added as seamlessly as possible. Over the air, a message can be delivered once to many listeners simultaneously without any extra effort by the publisher.
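
As a concrete example, here is roughly what topic-based publish/subscribe looks like over MQTT using the Eclipse Paho Java client. The broker URL and topic are placeholders, and the calls are written from memory of the Paho client API, so treat this as a sketch rather than production code:

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;

// Illustrative MQTT publish/subscribe with the Eclipse Paho client.
// The broker URL and topic name are placeholders; a real deployment would also
// configure TLS, authentication and QoS levels.
public class SensorPubSub {
    public static void main(String[] args) throws MqttException {
        String broker = "tcp://broker.example.com:1883";   // hypothetical broker
        String topic = "building1/floor2/temperature";     // hypothetical topic

        // Any number of listeners can subscribe; the publisher never knows about them.
        MqttClient dashboard = new MqttClient(broker, "dashboard-1");
        dashboard.connect();
        dashboard.subscribe(topic, (t, msg) ->
                System.out.println("Dashboard got: " + new String(msg.getPayload())));

        // A sensor publishes once; the broker fans the message out to every subscriber.
        MqttClient sensor = new MqttClient(broker, "temp-sensor-42");
        sensor.connect();
        sensor.publish(topic, new MqttMessage("21.5C".getBytes()));
        sensor.disconnect();
    }
}
```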

Addressing The Challenges

Does all of this efficiency and flexibility sound too good to be true? You guessed right. The greatest challenge is security and privacy. While most protocols support encryption of messages, there are serious security and privacy issues with today’s protocols. There are many IoT protocols, and this diversity means a lot of devices will not be secure; it is also likely that different protocols will have different vulnerabilities. Authentication of devices is generally not performed, so various attacks based on impersonation are possible.

Most devices and protocols don’t automate software updates, and complicated manual action is sometimes needed to update software on devices. This can lead to vulnerabilities persisting for long periods. Eventually, however, these issues will be worked out: devices will automatically download authenticated updates, packets will be encrypted to prevent eavesdropping, and it will be harder to hack IoT device security, albeit this could take years. Enterprise versions of devices will undoubtedly flourish, supporting better security, as this will be a requirement for enterprise adoption.

Publish/subscribe generates a lot of excitement due to the agility it gives people to leverage information easily, enabling faster innovation and a greater network effect. Point-to-point technologies lead to brittle architectures in which adding or changing functionality is burdensome.

WSO2 has staked out a significant amount of mindshare and software to support IoT technologies. WSO2 helps companies with its lean, open source, componentized event-driven messaging and mediation technology, which can run on devices and sensors for communication between devices and services on hubs, in the cloud or elsewhere; big data components for streaming, storing and analyzing data from devices; process automation and device management for IoT; and application management software for IoT applications and devices. WSO2 can help large and small firms deploying or building IoT devices to bring products to market sooner and to make their devices or applications smarter, easier, and cheaper to manage.

To learn more about event-driven architecture, refer to our white paper – Event-Driven Architecture: The Path to Increased Agility and High Expandability.

Want to know more about using analytics to architect solutions? Read IoT Analytics: Using Big Data to Architect IoT Solutions.

 

Understanding Causality and Big Data: Complexities, Challenges, and Tradeoffs

image credit: Wikipedia, Amitchell125

“Does smoking cause cancer?”

We have heard that a lot of smokers have lung cancer. However, can we mathematically confirm that smoking causes cancer?

We can look at cancer patients and check how many of them smoke. We can look at smokers and check whether they develop cancer. Let’s assume that both answers come up as 100%. That is, hypothetically, we see a 1–1 relationship between smoking and cancer.

Okay: can we claim that smoking causes cancer? Apparently it is not easy to make that claim. Let’s assume that there is a gene that causes cancer and also makes people like to smoke. If that is the case, we will still see the 1–1 relationship between cancer and smoking, yet the cancer is caused by the gene. That means there may be an innocent explanation for the 1–1 relationship we saw between cancer and smoking.

This example illustrates two concepts from statistics that play a key role in data science and big data: correlation and causality. Correlation means that two readings behave together (e.g. smoking and cancer), while causality means one is the cause of the other. The key point is that if there is causality, removing the first will change or remove the second; that is not the case with correlation.
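
A tiny simulation makes the distinction concrete. The numbers below are invented purely for illustration: a hypothetical gene drives both smoking and cancer, while smoking itself has no effect, yet the observed cancer rates still differ sharply between smokers and non-smokers:

```java
import java.util.Random;

// Toy simulation of a confounder: a "gene" causes both smoking and cancer,
// while smoking itself has no effect. Observationally, smoking and cancer are
// strongly correlated; intervening on smoking would change nothing.
public class ConfounderDemo {
    public static void main(String[] args) {
        Random rnd = new Random(7);
        int n = 100_000;
        int smokers = 0, smokersWithCancer = 0, nonSmokers = 0, nonSmokersWithCancer = 0;

        for (int i = 0; i < n; i++) {
            boolean gene = rnd.nextDouble() < 0.2;           // 20% carry the gene
            boolean smokes = gene && rnd.nextDouble() < 0.9; // the gene drives smoking
            boolean cancer = gene && rnd.nextDouble() < 0.9; // the gene drives cancer too
            if (smokes) { smokers++; if (cancer) smokersWithCancer++; }
            else        { nonSmokers++; if (cancer) nonSmokersWithCancer++; }
        }
        System.out.printf("cancer rate among smokers:     %.3f%n", (double) smokersWithCancer / smokers);
        System.out.printf("cancer rate among non-smokers: %.3f%n", (double) nonSmokersWithCancer / nonSmokers);
        // The two rates differ sharply even though smoking has no causal effect here.
    }
}
```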

Correlation does not mean Causation!

This difference is critical when deciding how to react to an observation. If there is causality between A and B, then A is responsible; we might decide to punish A in some way or to control A. However, correlation alone does not warrant such actions.

For example, as described in the post The Blagojevich Upside, the state of Illinois found that having books at home is highly correlated with better test scores, even if the kids have not read them. So they decided to distribute books. In retrospect, we can easily find a common cause: having books at home could be an indicator of how studious the parents are, which helps with better scores. Sending books home, however, is unlikely to change anything.

You see correlation without causality when there is a common cause that drives both readings. This is a common theme of the discussion. You can find a detailed treatment of causality in the talk “Challenges in Causality” by Isabelle Guyon.

Can we prove Causality?

Causality is measured through randomized experiments (a.k.a. randomized trials or A/B tests). A randomized experiment selects samples and randomly breaks them into two groups, called the control and the variation. We then apply the cause (e.g. send a book home) to the variation group and measure the effects (e.g. test scores). Finally, we measure causality by comparing the effect in the control and variation groups. This is how medications are tested.

To be precise, if the error bars of the two groups do not overlap, then there is causality. Check https://www.optimizely.com/ab-testing/ for more details.
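
As a rough sketch of that comparison (with illustrative data, and as a simplification of a proper significance test), you can compute each group's mean with approximate 95% error bars and check whether the intervals overlap:

```java
// Back-of-the-envelope check of a randomized experiment: compare group means
// with ~95% error bars (mean +/- 1.96 * standard error) and see whether they overlap.
// A proper analysis would use a significance test; this is only a sketch with made-up scores.
public class AbTestSketch {

    static double mean(double[] xs) {
        double s = 0;
        for (double x : xs) s += x;
        return s / xs.length;
    }

    static double standardError(double[] xs) {
        double m = mean(xs), ss = 0;
        for (double x : xs) ss += (x - m) * (x - m);
        double variance = ss / (xs.length - 1);
        return Math.sqrt(variance / xs.length);
    }

    public static void main(String[] args) {
        double[] control   = {62, 58, 65, 61, 59, 63, 60, 64};   // test scores, no book sent (illustrative)
        double[] variation = {71, 69, 74, 68, 72, 70, 73, 75};   // test scores, book sent (illustrative)

        double loControl   = mean(control)   - 1.96 * standardError(control);
        double hiControl   = mean(control)   + 1.96 * standardError(control);
        double loVariation = mean(variation) - 1.96 * standardError(variation);
        double hiVariation = mean(variation) + 1.96 * standardError(variation);

        System.out.printf("control:   %.1f .. %.1f%n", loControl, hiControl);
        System.out.printf("variation: %.1f .. %.1f%n", loVariation, hiVariation);
        boolean overlap = loVariation <= hiControl && loControl <= hiVariation;
        System.out.println(overlap ? "error bars overlap: no causal claim"
                                   : "error bars do not overlap: evidence of an effect");
    }
}
```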

However, that is not always practical. For example, if you want to prove that smoking causes cancer, you need to first select a population, place them randomly into two groups, make half of them smoke, and make sure the other half does not. Then wait for around 50 years and compare.

Did you see the catch? It is not good enough to compare smokers and non-smokers, as there may be a common cause, like the gene, that makes them do so. To prove causality, you need to randomly pick people and ask some of them to smoke. Well, that is not ethical, so this experiment can never be done. Actually, this argument has been used before (e.g. https://en.wikipedia.org/wiki/A_Frank_Statement).

This can get funnier. If you want to prove that greenhouse gases cause global warming, you need to find another copy of Earth, apply greenhouse gases to one, and wait a few hundred years!

To summarize, causality can sometimes be very hard to prove, and you really need to differentiate between correlation and causality.

The following are examples of when causality is needed:

  • Punishing someone
  • Diagnosing a patient
  • Measuring the effectiveness of a new drug
  • Evaluating the effect of a new policy (e.g. a new tax)
  • Changing a behavior

Big Data and Causality

Most big data datasets are observational data collected from the real world, so there is no control group. Therefore, most of the time all you can show is correlation, and it is very hard to prove causality.

There are two reactions to this problem.

The first is: “Big data guys do not understand what they are doing. It is stupid to try to draw conclusions without a randomized experiment.”

I find this view lazy.

Obviously, there is a lot of interesting knowledge in observational data. If we can find a way to use it, that will let us apply these techniques in many more applications. We need to figure out a way to use it and stop complaining. If current statistics does not know how to do it, we need to find a way.

The second is: “Forget causality! Correlation is enough.”

I find this view blind.

Playing ostrich does not make the problem go away. This kind of crude generalization makes people do stupid things and can limit the adoption of big data technologies.

We need to find the middle ground!

When do we need Causality?

The answer depends on what we are going to do with the data. For example, if we are just going to recommend a product based on the data, chances are that correlation is enough. However, if we are making a life-changing decision or a major policy decision, we might need causality.

Let us investigate both types of cases.

Correlation is enough when the stakes are low, or when we can later verify our decision. The following are a few examples.

  1. When stakes are low (e.g. marketing, recommendations): when showing an advertisement or recommending a product to buy, one has more freedom to make an error.
  2. As a starting point for an investigation: correlation is never enough to prove someone is guilty, but it can show us useful places to start digging.
  3. When it is hard to know which things are connected, but easy to verify the quality of a given choice. For example, if you are trying to match candidates to a job or decide on good dating pairs, correlation might be enough. In both these cases, given a pair, there are good ways to verify the fit.

There are other cases where causality is crucial. The following are a few examples.

  1. Finding the cause of a disease
  2. Policy decisions (would a $15 minimum wage be better? would free health care be better?)
  3. When stakes are too high (shutting down a company, passing a verdict in court, sending a book to each kid in the state)
  4. When we are acting on the decision (e.g. firing an employee)

Even in these cases, correlation might be useful for finding good experiments to run. You can find factors that are correlated and design experiments to test causality, which will reduce the number of experiments you need to do. In the book example, the state could have run an experiment by selecting a population, sending books to half of them, and comparing the outcomes.

In some cases, you can build your system to inherently run experiments that let you measure causality. Google is famous for A/B testing every small thing, down to the placement of a button and the shade of a color. When they roll out a new feature, they select a population, roll out the feature to only part of that population, and compare the two groups.

So in either kind of case, correlation is pretty useful. However, the key is to make sure that the decision makers understand the difference when they act on the results.

Closing Remarks

Causality can be a pretty hard thing to prove. Since most big data is observational data, often we can only show correlation, not causality. If we mix up the two, we can end up doing stupid things.

The most important thing is to have a clear understanding of the difference at the point when we act on the decisions. Sometimes, when the stakes are low, correlation might be enough. In other cases, it is best to run an experiment to verify our claims. Finally, some systems might warrant building experiments into the system itself, letting you draw strong causality results. Choose wisely!

Connected Health – Reinventing Healthcare with Technology

Demand for more personalized and convenient services from healthcare providers has steadily increased during the past decade. Growing populations, longer life expectancy, and the advancement of technology are a few key contributors to this uptick in demand. These demands have created a global eHealth market that is expected to reach $308 billion by 2022, as predicted by Grand View Research Inc.

The essence of a connected healthcare business is to deliver an efficient, effective service to its users by connecting disparate systems, devices, and stakeholders. It aims to automate most tasks and eliminate human error, trigger intelligent events for hospitals and other stakeholders, and provide medical information via a range of devices at various locations. By becoming a connected ecosystem, hospitals have the opportunity to reduce costs, increase revenue, and offer a higher-quality service to patients.

The success of a connected healthcare business, though, depends on how the enterprise addresses key challenges via comprehensive solutions.

In the white paper “Connected Health Reference Architecture”, Nuwan Bandara, a solutions architect at WSO2, discusses the significance of creating a connected healthcare system and explains how a middleware platform can be used to address each and every challenge faced during implementation.

One of the key challenges he highlights is the ability to deliver aggregated information without latency issues between sources. To overcome this, you need a centralized system that enables smooth integration of devices, services, and workflows. The use of multiple devices that take various measurements in different formats makes this harder than in other connected ecosystems; however, it can be addressed by consolidating the gathered data and making it easily accessible to various services and applications from different locations.

Given that all this data is private information, it is also vital to have foolproof security measures in place that restrict access to authorized personnel only, Nuwan notes.

Furthermore, it is important for hospitals to be geared to manage high capacities during crisis situations. If the system is unable to cope with high loads at such times, it will crash and disrupt all workflows. Hospitals can overcome this by equipping their systems with elastic scaling to handle high loads.

To learn more about the Connected Health reference architecture, download and read the white paper here.

Enabling Microservice Architecture with Middleware

Microservices are rapidly gaining popularity among today’s enterprise architects as a way to ensure continuous, agile delivery and flexible deployments. However, many mistake microservice architecture (MSA) for a completely new architectural pattern. What most don’t realize is that it is an evolution of service-oriented architecture (SOA): an iterative architectural approach and development methodology for complex, service-oriented applications.

Asanka Abeysinghe, the vice president of solutions architecture at WSO2, recently wrote a white paper, which explores how you can efficiently implement MSA in a service-oriented system.

Here are some insights from the white paper.

When implementing MSA you need to create sets of services for each business unit in order to build applications that benefit their specific users. When doing so, you need to consider the scope of each service rather than its actual size. You need to address rapidly changing business requirements by decentralizing governance, and your infrastructure should be automated in a way that allows you to quickly spin up new instances based on runtime features. These are just a few of the many features of MSA, some of which are shared by SOA.

MSA combines the best practices of SOA and links them with modern application delivery and tooling (e.g. Docker and Kubernetes) and automation technology (e.g. Puppet and Chef).

In MSA, how you scope a service matters more than its size. The inner architecture of an MSA addresses the implementation architecture of the microservices themselves. But to enable flexible and scalable development and deployment of microservices, you first need to focus on the outer architecture, which addresses the platform capabilities around them.

Enterprise middleware plays a key role in both the inner and outer architecture of MSA. Your middleware needs to offer high-performance functionality and support various service standards. It has to be lean, use minimal resources in your infrastructure, and be DevOps-friendly. It should allow your system to be highly scalable and available by having an iterative architecture and being pluggable. It should also include a comprehensive data analytics solution to support designing for failure.

This may seem like a multitude of functionality and requirements that is impossible to meet. But with WSO2’s complete middleware stack, which includes the WSO2 Microservices Framework for Java and the WSO2 integration, API management, security and analytics platforms, you can easily build an efficient MSA for your enterprise.
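
To give a feel for how lightweight the inner architecture can be, this is roughly what a minimal service looks like with the WSO2 Microservices Framework for Java, adapted from memory of the MSF4J hello-world samples (the class and path names are illustrative; check the official documentation for the exact API):

```java
import org.wso2.msf4j.MicroservicesRunner;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;

// A minimal JAX-RS style microservice, roughly as in the WSO2 MSF4J samples.
// The path and class names here are illustrative, not taken from the white paper.
@Path("/hello")
public class HelloService {

    @GET
    @Path("/{name}")
    public String hello(@PathParam("name") String name) {
        return "Hello " + name;
    }

    public static void main(String[] args) {
        // Starts an embedded HTTP server and deploys the service.
        new MicroservicesRunner()
                .deploy(new HelloService())
                .start();
    }
}
```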

MSA is no doubt the way forward. But you need to incorporate its useful features into your existing architecture without losing the applications and key SOA principles that are already there. By using the correct middleware capabilities, enterprises can fully leverage the advantages of an MSA for ease of implementation and speed to market.

For more details, download Asanka's white paper here.