2014/11/21

[Article] Implementing Time Series Functions on WSO2 Complex Event Processor

  • Seshika Fernando
  • Vice President - Banking and Financial Services - WSO2
Archived Content
This content is provided for historical perspective only, and may not reflect current conditions. Please refer to the WSO2 analytics page for more up-to-date product information and resources.

Applies to

WSO2 Complex Event Processor version 3.1.0

Table of contents

  • Introduction
  • Use case
  • Time series functionality in WSO2 CEP
  • How to implement time series regression in WSO2 CEP
  • How to try out the sample
  • Conclusion
  • Attachments

Introduction

WSO2 Complex Event Processor (WSO2 CEP) supports the application of Time Series Functions in order to identify patterns between data series, forecast future events, and identify outliers. This article will explain in detail how to implement each function of the time series toolbox and describe how to optimize the performance of time series queries.

Use case

We use a financial market use case, because time series regression is often used for fraud detection in financial transactions. Using sample credit card transaction data, we identify consumer usage patterns. In this simple example, we derive the regression equation between the time of day and the transaction amount (i.e. the transaction amount is the dependent variable and the time of day is the independent variable). The data for this use case is stored in the transactions.csv file (refer to the timeseries.zip archive).

Time series functionality in WSO2 CEP

Time series support is powered by regression. The time series toolbox introduces three new extensions to CEP: regression, forecasting, and outlier detection. Let us look at each extension in detail.

  1. Performing Regression

    CEP will perform Simple Linear or Multiple Linear Regression on the data set provided by the user and output the regression equation along with the standard error and all the input values.

    The format of the Siddhi query that needs to be used to perform regression is as follows:

    from InputDataStream#transform.timeseries:regress(dependent variable, independent variable(s))
    select *
    insert into OutputDataStream

    Example – Simple Linear Regression

    If we have an input stream of credit card transaction data and we want to perform a regression between the transaction amount (dependent variable) and the transaction time (independent variable), we can write the Siddhi query as follows:

    from TxnDataStream#transform.timeseries:regress( amount, time )
    select *
    insert into ResultStream

    This query will output, in this order, the standard error, all statistically significant beta values, and the values of all parameters available in the input stream.

    For more information please refer to https://docs.wso2.com/display/CEP400/Regression.
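To make the outputs described above concrete, here is a minimal Python sketch of ordinary least squares for one independent variable. It is not CEP's implementation; the function name and the sample values are hypothetical, and the standard error is taken to be the residual standard error, which is an assumption about what the extension reports.

```python
import math

def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = beta0 + beta1 * x.

    Returns (standard_error, beta0, beta1).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the standard error")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    # Residual standard error: sqrt(SSE / (n - 2)), since two parameters are estimated.
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(sse / (n - 2))
    return std_err, beta0, beta1

# Hypothetical sample: transaction time and amount, where amount = 10 + 2 * time.
times = [1, 2, 3, 4, 5]
amounts = [12.0, 14.0, 16.0, 18.0, 20.0]
std_err, beta0, beta1 = simple_linear_regression(times, amounts)
```

Because the sample lies exactly on a line, the fit recovers an intercept of 10 and a slope of 2 with a standard error of zero.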

    Example – Multiple Linear Regression

    If we want to perform multiple linear regression using the same transaction data between the transaction amount (dependent variable) and the transaction time (independent variable) and merchant ID (independent variable), we can write the Siddhi query as follows:

    from TxnDataStream#transform.timeseries:regress( amount, time, merchantID )
    select *
    insert into ResultStream
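The idea behind the multiple regression query can be sketched in plain Python by solving the normal equations (X'X)b = X'y. This is only an illustration under stated assumptions: the helper names and the data are hypothetical, and nothing here is taken from the CEP source code.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: independent variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def multiple_linear_regression(rows, ys):
    """Fit y = beta0 + beta1*x1 + ... + betak*xk and return [beta0, ..., betak]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 models the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data: (time, merchantID) pairs, amount = 5 + 2*time + 0.5*merchantID.
rows = [(1, 10), (2, 30), (3, 20), (4, 50), (5, 40)]
ys = [5 + 2 * t + 0.5 * m for t, m in rows]
betas = multiple_linear_regression(rows, ys)
```

With this noise-free sample, `betas` comes out close to `[5, 2, 0.5]`, up to floating-point rounding.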

  2. Forecasting future values

    The CEP time series toolbox can also forecast values based on simple linear regression results, using the forecast function. The user needs to provide enough input data for CEP to build a regression equation, along with the value of the independent variable for which the dependent variable should be forecast. For more information, refer to https://docs.wso2.com/display/CEP400/Forecast.

    Example

    If we want to forecast the next transaction amount based on the time of day we need to provide historical transaction data, and the time of day for which we hope to forecast the transaction amount.

    from TxnDataStream#transform.timeseries:forecast(time+5, amount, time)
    select *
    insert into ResultStream

    At every event, the above query will perform regression on the accumulated data and forecast the transaction amount for 5 time units (seconds or minutes, depending on the unit of the time attribute) after the transaction time of that event.
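The per-event behaviour described above can be sketched in Python as follows. This is an illustration, not the CEP implementation: the function names and the events are hypothetical, and returning None until a line can be fitted is an assumption of this sketch.

```python
def fit_line(xs, ys):
    """Least squares fit of y = beta0 + beta1 * x; returns (beta0, beta1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    return mean_y - beta1 * mean_x, beta1

def forecast_stream(events, horizon):
    """For each (time, amount) event, refit on all events received so far
    and forecast the amount at time + horizon."""
    xs, ys, forecasts = [], [], []
    for time, amount in events:
        xs.append(time)
        ys.append(amount)
        if len(set(xs)) < 2:
            # Not enough distinct time values to fit a line yet.
            forecasts.append(None)
            continue
        beta0, beta1 = fit_line(xs, ys)
        forecasts.append(beta0 + beta1 * (time + horizon))
    return forecasts

# Hypothetical events following amount = 10 + 3 * time.
events = [(1, 13.0), (2, 16.0), (3, 19.0), (4, 22.0)]
forecasts = forecast_stream(events, horizon=5)
```

For the last event (time 4), the forecast is the fitted line evaluated at time 9.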

  3. Detecting outliers

    The time series toolbox also provides a function that detects outliers among the incoming events, using a regression equation fitted to all previously received events. Whether an event counts as an outlier depends on a range, which the user can provide. For more information, refer to https://docs.wso2.com/display/CEP400/Outlier.

    Example

    If the user wants to detect any events that lie more than 2 standard deviations away from the regression equation, the following query can be used:

    from TxnDataStream#transform.timeseries:outlier(2, amount, time)
    select *
    insert into ResultStream

    Whenever CEP receives an event that lies more than 2 standard deviations away from the regression equation, it will return ‘true’; at all other times it will return ‘false’. In both cases the flag is accompanied by the rest of the outputs (standard error, all statistically significant betas and input parameters).
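A minimal Python sketch of this kind of check follows. It assumes that "standard deviations" refers to the residual standard error of the fit on the previous events, and that an event is added to the history after it has been tested. Both are assumptions of this sketch rather than documented CEP behaviour, and all names and values are hypothetical.

```python
import math

def fit_with_error(xs, ys):
    """Least squares fit; returns (beta0, beta1, residual standard error)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    return beta0, beta1, math.sqrt(sse / (n - 2))

def detect_outliers(events, range_multiplier):
    """Flag each (time, amount) event whose amount deviates from the line
    fitted on the previous events by more than range_multiplier * std error."""
    xs, ys, flags = [], [], []
    for time, amount in events:
        is_outlier = False
        # At least 3 earlier points are needed to estimate the standard error.
        if len(xs) >= 3 and len(set(xs)) >= 2:
            beta0, beta1, std_err = fit_with_error(xs, ys)
            predicted = beta0 + beta1 * time
            is_outlier = abs(amount - predicted) > range_multiplier * std_err
        flags.append(is_outlier)
        xs.append(time)
        ys.append(amount)
    return flags

# Hypothetical events: roughly amount = 8 + 2 * time, with one obvious spike.
events = [(1, 10.1), (2, 11.9), (3, 14.2), (4, 16.0), (5, 18.1), (6, 60.0)]
flags = detect_outliers(events, range_multiplier=2)
```

Only the last event, whose amount is far above the fitted line, is flagged.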

    Optional parameters

    Since large datasets may be used when performing regression, CEP allows users to tune how the regression calculation is performed through three optional parameters (calculation interval, window size and confidence interval).

    Calculation interval

    This determines the frequency of regression calculation based on incoming events. For example, if we have set calculation interval to be 10, CEP will perform the regression calculation after every 10 events received on all accumulated data. By default this value will be 1 (i.e. regression calculation happens at every event received).

    Window size

    Window size determines the upper limit of the number of events that CEP will consider for the regression calculation. For example, if the window size is 100,000, CEP will drop the oldest event when the 100,001st event is received. By default this value will be 1,000,000,000.

    Confidence interval

    The user can provide the confidence interval to be used to calculate the statistical significance of the beta values of the regression equation. This will be set to 95% by default.
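One common way to decide whether a beta value is statistically significant at a given confidence level is to compare its test statistic against a critical value. The Python sketch below does this for the slope of a simple linear regression. It is an illustration only: it uses a normal approximation for the critical value (an exact test would use the t-distribution with n - 2 degrees of freedom), and whether CEP applies this particular test is an assumption. The function name and data are hypothetical.

```python
import math
from statistics import NormalDist

def slope_is_significant(xs, ys, confidence=0.95):
    """Return True if the fitted slope differs from zero at the given
    confidence level, using a normal approximation for the critical value."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    if sse == 0:
        # Perfect fit: any non-zero slope is trivially significant.
        return beta1 != 0
    residual_se = math.sqrt(sse / (n - 2))
    slope_se = residual_se / math.sqrt(sxx)
    # Two-sided critical value, e.g. about 1.96 for 95%.
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return abs(beta1 / slope_se) > critical

# Hypothetical data: one series with a clear trend, one without.
xs = list(range(1, 11))
noise = [0.1 if x % 2 else -0.1 for x in xs]
trending = [2 * x + e for x, e in zip(xs, noise)]
flat = [5 + e for e in noise]
```

Here `slope_is_significant(xs, trending, 0.99)` holds, while `slope_is_significant(xs, flat, 0.95)` does not, because the flat series has a slope close to zero relative to its standard error.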

    Example – Simple Linear Regression using optional parameters

    The following query will perform simple linear regression between the amount and time data after every 100 events, over a window of at most 10,000 events. Once the window holds 10,000 events, CEP will drop the oldest event(s) before adding new events to memory. It will also use a 99% confidence interval instead of the 95% default value.

    from TxnDataStream#transform.timeseries:regress( 100, 10000, 0.99, amount, time )
    select *
    insert into ResultStream
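How the calculation interval and window size interact can be sketched with a bounded queue in Python. The class below is a hypothetical illustration, not CEP's implementation: it keeps at most `window_size` events, recalculates only on every `calc_interval`-th event, and leaves out the confidence interval parameter for brevity.

```python
from collections import deque

class WindowedRegression:
    """Sliding-window simple linear regression with a calculation interval."""

    def __init__(self, calc_interval=1, window_size=1_000_000_000):
        self.calc_interval = calc_interval
        # A deque with maxlen silently drops the oldest event when full.
        self.window = deque(maxlen=window_size)
        self.events_seen = 0

    def add(self, x, y):
        """Add one event; return (beta0, beta1) when a recalculation is due,
        otherwise None."""
        self.window.append((x, y))
        self.events_seen += 1
        if self.events_seen % self.calc_interval != 0:
            return None
        return self._fit()

    def _fit(self):
        n = len(self.window)
        if n < 2:
            return None
        mean_x = sum(x for x, _ in self.window) / n
        mean_y = sum(y for _, y in self.window) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in self.window)
        if sxx == 0:
            return None
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in self.window)
        beta1 = sxy / sxx
        return mean_y - beta1 * mean_x, beta1

# Hypothetical run: recalculate every 3 events over a window of 4 events.
regression = WindowedRegression(calc_interval=3, window_size=4)
results = [regression.add(x, 2 * x + 1) for x in range(1, 10)]
```

In this run only the 3rd, 6th and 9th events trigger a calculation, and after the 9th event the window holds just the four most recent events.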

  4. Trying out time series regression in WSO2 CEP

    In order to perform time series regression we need to send a stream of transaction data to the CEP and configure the CEP to compute and output the regression equation based on the input data. The following steps are required to achieve this:

    1. Installing WSO2 CEP

      Download WSO2 CEP from https://wso2.com/products/complex-event-processor/ and extract the wso2cep-3.1.0.zip archive into a directory. We will call the extracted directory CEP_HOME.

    2. System preparation

      Download timeseries.zip and copy all jars in the ‘jars’ folder to CEP_HOME/repository/components/lib and replace the siddhi.extension file at CEP_HOME/repository/conf/siddhi with the siddhi.extension provided in the timeseries.zip.

      The sample data for this example can be found in transactions.csv, which should be copied to CEP_HOME/samples/resource folder.

    3. Create input event adaptor

      Since we are using a .csv file to feed the data into CEP, we can create a file input event adaptor.

      Refer to https://docs.wso2.com/display/CEP310/Input+File+Event+Adaptor for more information on creating a file input event adaptor.

    4. Define input stream

      We define an event stream that captures the transaction amount and the time of day as 2 variables and sends them to the execution plan, which will perform the time series regression. To do so, all we need to do is provide a stream name and a version number, and define the variables (names and types) that we will be using for regression.

      Refer to https://docs.wso2.com/display/CEP310/Working+with+Event+Streams for more information on creating stream definitions.

    5. Configure an Event Builder

      When we click ‘Add Event Stream’, the event stream will be successfully created and the CEP will prompt us to create an Event Builder that will build the events received by the Input Adaptor (created in Step 3) to the format that is required by the event stream.

      Select Custom Event Builder since we need to capture the input data from a .csv file. Alternatively we could list the Event Streams and click on the ‘In-flows’ button of the ‘TransactionData’ event stream and add a Custom Event Builder by clicking on the “Receive from External Event Stream (via Event Builder)” button.

      Once the system displays the ‘Create Event Builder’ form, provide a name for the Event Builder and pick the input adaptor that we created in step 3. Then, we need to provide the absolute path where we have stored our input data file. Finally, we need to specify the Input Mapping Type. The only input mapping type allowed for the file input adaptor is ‘text’, therefore ‘text’ will be picked by default.

      We then move on to configuring the text mapping, in order to map the input data to the variables defined in the input stream which we created in step 4. To do this, click on ‘Advanced’ under ‘Mapping Configuration’ division in the UI.

      Since we are using a comma delimited file, we will use a regular expression with capture groups to extract the comma delimited data.

      We have used the regular expression (\d+\.\d+),(\d+): the first group captures the ‘Amount’ data, which is of type ‘double’, and the second group captures the ‘Time’ data, which is of type ‘long’. This will convert the Amount and Time data received from the file input adaptor into the format that is expected by the input stream we created in step 4.
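The effect of this pattern on a line of the file can be checked outside CEP with a few lines of Python. The sample line and function name are hypothetical, and matching the whole line is an assumption of this sketch.

```python
import re

# Same pattern as in the mapping: a decimal amount, a comma, an integer time.
LINE_PATTERN = re.compile(r"(\d+\.\d+),(\d+)")

def parse_transaction(line):
    """Return (amount, time) for a matching line, or None if it does not match."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))

parsed = parse_transaction("125.50,1430")     # hypothetical sample line
rejected = parse_transaction("100,1430")      # amount has no decimal point
```

Note that the first group requires a decimal point, so an amount written as a whole number, such as 100, would not match this pattern.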

      Further information about creating Event Builders can be found in https://docs.wso2.com/display/CEP310/Working+with+Event+Builders

    6. Create execution plan

      Now that we’ve properly configured the input of data to CEP, we move on to creating the execution plan which will perform the regression calculation. For this we add a new execution plan, give it a name and description and then import the input stream that we created in step 4. Please follow the steps given in https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans to add the execution plan. The Siddhi query to enter in order to perform the regression is as follows.

      from input#transform.timeseries:regress(1, 10000, 0.95, Amount, Time)
      select *
      insert into RegressionResult

      After writing the query we can validate the syntax of the Siddhi query using the ‘Validate Query Expressions’ option.

    7. Create output stream

      To export an output stream, we then enter the output stream name as ‘RegressionResult’ in the ‘Value Of’ field and click ‘Create Stream Definition’ in the ‘Stream ID’ section. This will provide us a form that we can use to create the output stream which will carry the results of the regression calculation. All the required information and payload data fields are populated by default and you can change any fields as necessary. For this example, since no changes are required, we are just going to click ‘Add Event Stream’.

    8. Create output event adaptor

      After creating the output stream, CEP will prompt us to create an event formatter and Output Adaptor. The default formatter and the Logger Adaptor will be selected by default. If some changes need to be done on the output data, a different formatter can be configured, and if the output data is supposed to go out to a downstream system, an appropriate output adaptor can be used. For the purposes of this scenario, we will use the default formatter and the logger adaptor which will log the output stream in the console.

    9. Check event flow

      By now we have configured the input, execution and output which are required to perform the time series regression. In order to check whether everything has been configured properly, we can go to the ‘Monitor’ tab and click on ‘Event Flow’. If CEP has been successfully configured to perform the regression, we should see an event flow that includes the Input Event Adaptor (fileInputAdaptor), the Event Builder (regressionEventBuilder), the input stream (TransactionData 1.0.0), the execution plan (Regression), the output stream (RegressionResult 1.0.0), the output event formatter (RegressionResult_1.0.0_Formatter) and finally the output event adaptor (DefaultLoggerOutputAdaptor).

      After completing the above steps CEP will listen to the file that is stored in the path that we provided in step 5 and perform regression calculations according to the parameters that we have provided in the execution plan. You will be able to see the regression results on the console at every regression calculation performed, according to your configuration.

How to try out the sample

If you’d like to just see Time Series Regression in action on WSO2 CEP, without performing the configuration described in the previous section, you can download the attached timeseries.zip archive and follow the steps below.

  1. Download WSO2 CEP from https://wso2.com/products/complex-event-processor/
  2. Extract the wso2cep-3.1.0.zip archive into a directory. We will call the extracted directory CEP_HOME
  3. Copy all jars in the ‘jars’ folder of the downloaded timeseries.zip archive to CEP_HOME/repository/components/lib folder.
  4. Replace the siddhi.extension file at CEP_HOME/repository/conf/siddhi with the siddhi.extension provided in the timeseries.zip.
  5. The sample data for this example can be found in transactions.csv, which should be copied to CEP_HOME/samples/resource folder.
  6. Replace the CEP_HOME/repository/deployment/server folder with the server folder in timeseries.zip
  7. Replace the CEP_HOME/repository/conf/data-bridge/stream-definitions.xml with the stream-definitions.xml included in timeseries.zip
  8. Start WSO2 CEP by running wso2server.bat (on Windows) or wso2server.sh (on Linux) from CEP_HOME/bin
  9. Copy transactions.csv file to CEP_HOME/samples/resource/
  10. You will be able to see the regression results on your console
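The file copying in steps 3 to 7 and step 9 can also be scripted. The Python sketch below assumes that timeseries.zip has already been extracted and that the ‘jars’ folder, the ‘server’ folder, siddhi.extension, stream-definitions.xml and transactions.csv all sit at the top level of the extracted directory. That layout, and the function names, are assumptions; adjust the paths if your archive differs.

```python
import shutil
from pathlib import Path

def install_sample(cep_home, sample_dir):
    """Perform steps 3, 4, 6 and 7: copy the sample artifacts into CEP_HOME."""
    cep_home, sample_dir = Path(cep_home), Path(sample_dir)

    # Step 3: copy all jars to CEP_HOME/repository/components/lib.
    lib = cep_home / "repository" / "components" / "lib"
    lib.mkdir(parents=True, exist_ok=True)
    for jar in (sample_dir / "jars").glob("*.jar"):
        shutil.copy2(jar, lib / jar.name)

    # Step 4: replace the siddhi.extension file.
    siddhi_conf = cep_home / "repository" / "conf" / "siddhi"
    siddhi_conf.mkdir(parents=True, exist_ok=True)
    shutil.copy2(sample_dir / "siddhi.extension", siddhi_conf / "siddhi.extension")

    # Step 6: replace the deployment/server folder.
    server = cep_home / "repository" / "deployment" / "server"
    if server.exists():
        shutil.rmtree(server)
    shutil.copytree(sample_dir / "server", server)

    # Step 7: replace stream-definitions.xml.
    bridge = cep_home / "repository" / "conf" / "data-bridge"
    bridge.mkdir(parents=True, exist_ok=True)
    shutil.copy2(sample_dir / "stream-definitions.xml", bridge / "stream-definitions.xml")

def feed_sample_data(cep_home, sample_dir):
    """Perform step 9: copy transactions.csv once the server is running."""
    resource = Path(cep_home) / "samples" / "resource"
    resource.mkdir(parents=True, exist_ok=True)
    shutil.copy2(Path(sample_dir) / "transactions.csv", resource / "transactions.csv")
```

Call `install_sample` before starting the server in step 8, and `feed_sample_data` afterwards, passing your own CEP_HOME and extracted archive paths.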

Conclusion

WSO2 Complex Event Processor can be used for detecting and quantifying relationships among data streams, forecasting future events, and detecting outliers based on historical values, using Time Series Regression. These functions are useful in many event processing use cases, such as fraud detection, sales pattern analysis and sports data analysis, among many others.

Attachments

Click here to download timeseries.zip archive.
 
