[Article] Real-time Intruder Detection with R, PMML, and WSO2 CEP
By Upul Bandara
- 27 Nov, 2014
Building and testing machine learning models in R language
In this section, we describe the machine learning model building process from the perspective of the R language. Additionally, we provide a few general guidelines you can follow irrespective of your modeling language. Figure 1 shows a simplified workflow that you could follow in any machine learning project. Data capturing and cleaning are the first two activities of the model building process. Collecting, refining, and creating datasets takes a lot of time and effort; hence, we have selected an existing dataset called “KDD Cup 1999 Data”. More details about this dataset are given on the KDD Cup 1999 dataset page listed in the references. The original dataset consists of 43 features, including the response. For this project, however, we selected 14 of them, including the response variable. Note that both the algorithm demonstrated in this tutorial and WSO2 CEP are capable of handling large datasets with thousands of features; the main reason for selecting only 14 features is to simplify the model building process and the demonstration. The dataset with 14 features is available in the GitHub repository listed in the references.
Figure 1: Machine learning workflow
Now, we will discuss the model building process using the R language. “Classification Tree” is the algorithm we use in this tutorial. For more information about classification trees, refer to the decision tree learning article and The Elements of Statistical Learning, both listed in the references. Follow the steps given below to build our “Classification Tree” model.
Start your R environment by typing the “R” command in your command line; you will see a screen similar to Figure 2.
Figure 2: Sample R console
Next, we need to read the dataset into the R environment. For this purpose, we will use the following R command.
intruder <- read.csv('/home//upul/marketing_week/1/dataset/kddcup_intruder.csv');
Enter the above command into the R console and hit the Enter key, as shown in Figure 3. Note that reading the data into R might take some time; once it is finished, the prompt will appear again.
Figure 3: Reading a dataset using R
According to our model building flowchart, we next need to split the data into two portions called the training and testing sets. In R, this can easily be done with a library called caTools. Since it is not available in the basic R installation, we first have to install that library. Type the following two commands to install and load caTools.
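A minimal sketch of the install-and-load step, assuming caTools is fetched from CRAN. The requireNamespace guard is an addition for illustration; it only skips the download when the package is already present.

```r
# Install caTools only if it is missing, then attach it to the session.
if (!requireNamespace("caTools", quietly = TRUE)) {
  install.packages("caTools")
}
library(caTools)
```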
Now we are ready to split the dataset into training and testing subsets. Enter the following three R commands one by one to create them.
split <- sample.split(intruder$response, SplitRatio=0.7);
train <- subset(intruder, split == TRUE);
test <- subset(intruder, split == FALSE);
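sample.split divides the rows according to the given ratio while keeping the relative proportions of the labels in both subsets. The idea can be sketched in base R as follows. This is not the caTools implementation; the toy data frame and the stratified_split helper are hypothetical and exist only for illustration.

```r
# Toy data (hypothetical): 70 "normal" rows and 30 "attack" rows.
set.seed(42)
toy <- data.frame(
  x = rnorm(100),
  response = rep(c("normal", "attack"), times = c(70, 30))
)

# Hypothetical helper: within each class, mark `ratio` of the rows as training.
stratified_split <- function(labels, ratio) {
  in_train <- logical(length(labels))
  for (cls in unique(labels)) {
    idx <- which(labels == cls)
    n_train <- round(length(idx) * ratio)
    in_train[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  in_train
}

split_toy <- stratified_split(toy$response, 0.7)
train_toy <- subset(toy, split_toy)
test_toy  <- subset(toy, !split_toy)

# Both subsets keep the 70/30 class mix of the full data.
print(prop.table(table(train_toy$response)))
print(prop.table(table(test_toy$response)))
```

Keeping the class mix stable matters for intrusion data, where one class can be much rarer than the other and a purely random split could leave the testing set with very few examples of it.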
Since the “Classification Tree” algorithm we are going to use in this tutorial is not available in the default R installation, install it with the following command. Moreover, don’t forget to load the rpart package into the R environment using the library() function.
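A minimal sketch of this step, assuming rpart is installed from CRAN. Many R distributions already ship rpart as a recommended package, in which case the guarded install (an addition for illustration) does nothing and only the library() call is needed.

```r
# Install rpart only if it is missing, then attach it to the session.
if (!requireNamespace("rpart", quietly = TRUE)) {
  install.packages("rpart")
}
library(rpart)
```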
Now, we have just completed the background work and we are ready to build our model. Enter the following command to build our classification tree model.
intruderTree <- rpart(response ~ root_shell+su_attempted+num_root+num_file_creations+
    num_shells+num_access_files+num_outbound_cmds+is_host_login+
    is_guest_login+count+srv_count+serror_rate+srv_serror_rate,
    data=train, method="class", control=rpart.control(minbucket=50));
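The minbucket setting is the minimum number of observations allowed in any leaf of the tree, so larger values yield smaller, more conservative trees. The sketch below illustrates this on the kyphosis dataset that ships with rpart, not on the intruder data, and the value 5 is chosen only because that dataset is small.

```r
library(rpart)

# Fit a classification tree on rpart's bundled kyphosis data,
# requiring at least 5 observations in every leaf.
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis,
             method = "class",
             control = rpart.control(minbucket = 5))

# Print the fitted splits, then the number of observations in each leaf.
print(fit)
leaf_sizes <- fit$frame$n[fit$frame$var == "<leaf>"]
print(leaf_sizes)
```

Every value in leaf_sizes is at least 5; with minbucket=50, as used for the intruder model, each leaf must cover at least 50 training rows.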
Now we have created our tree model, and it’s time to assess its performance using the testing dataset. Enter the following commands to test the predictive performance of our “Classification Tree” model.
intruderTreePred <- predict(intruderTree, newdata=test, type="class");
t <- table(test$response, intruderTreePred);
# accuracy = correctly classified rows (diagonal of the table) / all rows
predAccuracy <- sum(diag(t)) / sum(t);
sprintf("%s: %f", "testing set accuracy", predAccuracy);
If you run the above R commands, you will see that the performance of our model is acceptable (the accuracy is actually very close to 100%); hence, we have high confidence that this model works well on new data. Therefore, we can now export our model into PMML format and install it in WSO2 CEP.
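Accuracy here means the share of testing rows whose predicted class matches the actual class, which is the diagonal of the actual-versus-predicted table divided by its total. A small worked example in base R, using made-up labels rather than the KDD data:

```r
# Hypothetical actual and predicted labels for six test rows.
actual    <- c("normal", "normal", "attack", "attack", "normal", "attack")
predicted <- c("normal", "attack", "attack", "attack", "normal", "normal")

# Use the same level order for rows and columns so the diagonal
# holds the correctly classified rows.
lv <- c("attack", "normal")
cm <- table(factor(actual, levels = lv), factor(predicted, levels = lv))
print(cm)

# Four of the six rows lie on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

On heavily imbalanced data a high accuracy can hide poor detection of the rare class, so it is worth reading the off-diagonal cells of the table as well.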
Converting R models into PMML format
In this section, we show how to convert the “Classification Tree” model built in R into PMML format.
By default, R doesn’t come with a PMML library; therefore, you first have to install (and load) it using the following commands.
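A minimal sketch of the install-and-load step, assuming the pmml package is fetched from CRAN. The requireNamespace guard is an addition for illustration.

```r
# Install the pmml package only if it is missing, then attach it.
if (!requireNamespace("pmml", quietly = TRUE)) {
  install.packages("pmml")
}
library(pmml)
```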
Next, use the following two commands to convert your tree model into PMML format.
treePMML <- pmml(intruderTree); write(toString(treePMML),file = “
Set the file argument to the location where you are going to save your newly created PMML model; the later steps refer to this file as intruderTree.pmml. (If you would rather run the above commands as an R script, it is available at https://github.com/upul/intruder_detection.)
Now we have created a model in R and exported it into PMML. In the next section, we will configure WSO2 CEP to use our newly created PMML model.
In this section, we show how to configure WSO2 CEP to use the PMML model that was created in the previous sections. The configuration consists of creating an input stream, an output stream, and a new Siddhi Query. We will discuss those three items in the following sections. Note that the following sections assume that you already know how to configure WSO2 CEP and how to install the PMML extension in the WSO2 CEP server. For more details about those two topics, refer to the WSO2 CEP User Guide and the tutorial on implementing a WSO2 CEP extension for PMML models, both listed in the references.
Creating an input event stream
Input event streams are used to capture incoming events. To create an input event stream for our intruder detection model, go to the “Add Event Stream” window (Main > Event Streams > Add Event Stream). Next, complete the input boxes highlighted in Figure 4. Enter “IntruderInputStream” as the name of the input event stream and “1.0.0” as its version number. Finally, enter all feature names (except the response variable) as payload attributes. Note that those names are available in the intruderTree.pmml model file created above. Once all necessary information is entered, your input event stream creation window will look similar to what is depicted in Figure 5.
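As an alternative to reading the names out of the PMML file, the payload attribute names can be listed in R from the model formula used earlier. This is a small base R sketch; the variable names model_formula and payload_attributes are introduced here only for illustration.

```r
# The formula used to build the tree: the response on the left,
# the thirteen predictors on the right.
model_formula <- response ~ root_shell + su_attempted + num_root +
  num_file_creations + num_shells + num_access_files +
  num_outbound_cmds + is_host_login + is_guest_login + count +
  srv_count + serror_rate + srv_serror_rate

# all.vars() lists variables in order of appearance, so dropping the
# first element removes the response and keeps the predictors.
payload_attributes <- all.vars(model_formula)[-1]
print(payload_attributes)
```

Entering the attributes in this order keeps the stream definition aligned with the argument order used in the Siddhi Query later in the tutorial.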
Figure 4: Create a new event stream
Figure 5: Sample input for creating a new input event stream
Creating an output event stream
Creating an output event stream is identical to creating an input event stream. First, go to the “Define New Event Stream” window (Home > Manage > Event Processor > Event Stream) and enter the output event stream details given in Figure 6.
Figure 6: Creating new output event stream
Once you have created the input and output event streams, they will appear as shown in Figure 7.
Figure 7: Newly created input and output event streams
Now go to the “In-Flows” of the “IntruderInputStream” and the “Out-Flows” of the “IntruderOutputStream” and modify them as shown in Figure 8 and Figure 9.
Figure 8: Sample input event builder
Figure 9: Sample output event builder
Writing a Siddhi Query
We have now come to the final step of setting up WSO2 CEP to run our intruder classification model. In this section, we write a simple Siddhi Query for our machine learning model. The purpose of a Siddhi Query is to identify, process, and transform complex event occurrences. To create a new Siddhi Query, first go to the “Add Execution Plan” window (Main > Execution Plans > Add Execution Plan). Figure 10 shows the fields you need to fill in to create a new execution plan.
Figure 10: Sample execution plan
Provide sensible names for “Execution Plan Name” and “Description”. Then, select “IntruderInputStream:1.0.0” as the import stream and enter intruderIS for “As”. Similarly, select “IntruderOutputStream:1.0.0” as the export stream’s StreamId and enter intruderOS in its “Value Of” field. Finally, enter the following Siddhi Query into the query editor.
from intruderIS#transform.mlearn:getModelPrediction ("
/intruderTree.pmml", root_shell, su_attempted, num_root, num_file_creations, num_shells, num_access_files, num_outbound_cmds, is_host_login, is_guest_login, count, srv_count, serror_rate, srv_serror_rate) select response insert into intruderOS
Note that you have to change the path in front of /intruderTree.pmml in the query to the location of your PMML file.
If everything goes well, you will be able to see your model’s event flow (Home > Monitor > Event Flow) as shown in Figure 11.
Figure 11: Event flow diagram
To test your intruder detection system, open the event simulator (Home > Tools > Event Simulator). Select a testing instance from your testing dataset and enter the values of its features into the event simulator, as shown in Figure 12. Next, press the send button; the predicted output will appear in the CEP server’s console window, as shown in Figure 13. Similarly, you can test your intruder detection system with a few more testing instances and cross-check the predicted values against the actual values given in the testing dataset.
Figure 12: Event simulator
Figure 13: Sample predictions shown in CEP console
In this tutorial, we described how you could enhance the predictive capabilities of WSO2 CEP servers using machine learning. We showed how easy it is to develop and test machine learning models in the R language. Next, we converted the model into PMML format and installed it in WSO2 CEP. Finally, we tested our intruder detection model using the event simulator that comes with the CEP.
Using a similar approach, you now should be able to create, test, and run machine learning models and hence enhance the predictive capabilities of the WSO2 CEP server.
- KDD Cup 1999 Dataset, https://archive.ics.uci.edu/ml/machine-learning-databases/kddcup99-mld/kddcup99.html [Access Date: 10-15-2014]
- Sample R code and the dataset, https://github.com/upul/intruder_detection [Access Date: 10-16-2014]
- WSO2 CEP User Guide, https://docs.wso2.com/display/CEP310/User+Guide [Access Date: 10-16-2014]
- Implementing a WSO2 CEP Extension to Run Machine Learning Models Written in PMML Format, https://wso2.com/library/tutorials/2014/08/tutorial-implementing-a-wso2-cep-extension-to-run-machine-learning-models-written-in-pmml-format/ [Access Date: 10-16-2014]
- Decision tree learning, Wikipedia, http://en.wikipedia.org/wiki/Decision_tree_learning [Access Date: 10-16-2014]
- The Elements of Statistical Learning, http://statweb.stanford.edu/~tibs/ElemStatLearn/ [Access Date: 10-17-2014]