July 05, 2019

Secure Elastic Data Stream Processing on Clouds

Introduction

Hybrid clouds provide a cost-effective means of maintaining the quality of service attributes of online services. Despite these merits, hybrid clouds introduce several security risks. Key questions regarding information security include how to secure data stored across multiple cloud service providers and how to secure applications that burst to the public cloud. General network security techniques such as Virtual Private Networks (VPNs) can secure the end-to-end communication of a hybrid cloud. However, if the cloud computing platform itself has been compromised, the data is at risk even when the end-to-end communication is encrypted. Insiders (e.g. employees) of an organization are often the root cause of successful cyber attacks, and the growing threat of malicious insider attacks is costly for enterprises. For example, a recent study by Accenture found that malicious insider attacks grew by 15% in 2018, costing $1.6 million on average per organization across the 355 participants analyzed in the study. In this context, Fully Homomorphic Encryption (or simply Homomorphic Encryption) is a promising countermeasure.

What is Homomorphic Encryption?

Homomorphic encryption is a form of encryption that allows computation to be performed directly on encrypted data (i.e. ciphertexts). It produces an encrypted result that, when decrypted, matches the result of performing the same operations on the plaintext. This hides the data from the processor, since the ciphertexts never need to be decrypted in the public cloud. Several libraries implement homomorphic encryption and decryption, including HElib, CUHE, SEAL, and TFHE, among others.
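To make the property above concrete, here is a minimal sketch in Python. It does not use HElib or any of the libraries listed, and it is not fully homomorphic: it is a toy version of the Paillier scheme, which supports only addition on ciphertexts. The tiny primes and all function names are assumptions chosen for readability, and the code is not secure.

```python
import math
import secrets

# Toy parameters: tiny primes chosen for readability only. NOT secure.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1  # common simple choice of generator for Paillier
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse; this shortcut is valid because G = N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N under the toy public key (N, G)."""
    while True:
        r = secrets.randbelow(N - 1) + 1  # random r in [1, N-1]
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private values (LAM, MU)."""
    u = pow(c, LAM, N_SQ)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying two ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ


if __name__ == "__main__":
    c_sum = add_encrypted(encrypt(20), encrypt(22))
    print(decrypt(c_sum))  # the party holding the private key recovers 20 + 22
```

The party calling add_encrypted never sees 20, 22, or their sum. That is the role the public cloud plays in the rest of this post, with a fully homomorphic library in place of this toy scheme.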

What are the Benefits of Elastic Stream Processing?

Stream processors are software platforms that allow users to respond faster to incoming data streams. We apply homomorphic encryption on top of Elastic Stream Processing, a data analytics technique that provides load balancing for data stream processors. When there is excessive load on the stream processor located in a private cluster, more compute resources can be provisioned from the public cloud to maintain the agreed-upon service quality attributes. The Elastic Switching Mechanism (ESM) is an example of such a load balancer.

How Can You Implement Secure Elastic Stream Processing?

The architecture diagram shown below illustrates how a homomorphic encryption based elastic stream processor (HomoESM) can be implemented by extending the ESM. In this instance we used HElib as the fully homomorphic encryption library. Part of the input stream is encrypted with homomorphic encryption using the HElib API and sent to the public cloud. The portion of the data stream that stays within the private cloud is sent to the Complex Event Processing (CEP) engine, which processes it directly. The encrypted stream is processed by the Homomorphic CEP engine running on the public cloud. The processed results from the public cloud are decrypted and merged with the event stream output from the CEP engine.

Figure 1: System architecture for Homomorphic Encryption based on the Elastic Stream Processor
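The data flow in this architecture can be sketched as a single routing function. This is an illustration only, not the HomoESM implementation: the function name process_hybrid and every callable it accepts (send_to_public, cep_filter, encrypt, homomorphic_match, decrypt_flag) are hypothetical placeholders supplied by the caller. The sketch also assumes the public cloud returns an encrypted match flag per event.

```python
from typing import Callable, Iterable, List, TypeVar

E = TypeVar("E")  # plaintext event
C = TypeVar("C")  # encrypted event
R = TypeVar("R")  # encrypted result produced in the public cloud


def process_hybrid(
    events: Iterable[E],
    send_to_public: Callable[[E], bool],
    cep_filter: Callable[[E], bool],
    encrypt: Callable[[E], C],
    homomorphic_match: Callable[[C], R],
    decrypt_flag: Callable[[R], bool],
) -> List[E]:
    """Route each event to the private or public path and merge the results.

    All callables are hypothetical stand-ins for the components in Figure 1.
    """
    merged: List[E] = []
    for event in events:
        if send_to_public(event):
            encrypted = encrypt(event)                     # private cloud
            encrypted_flag = homomorphic_match(encrypted)  # public cloud, no decryption
            matched = decrypt_flag(encrypted_flag)         # back in the private cloud
        else:
            matched = cep_filter(event)                    # plain CEP engine
        if matched:
            merged.append(event)
    return merged
```

Passing the components in as callables keeps the routing logic independent of the encryption library, so the same flow can be exercised with simple stand-ins before a real homomorphic library is plugged in.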

We use an event stream compression technique called “Composite Event” creation, which aggregates the fields of the events in the event stream into batches. This technique allows more events to be sent to the public cloud and processed faster.
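One way to picture composite event creation is as a field-wise batching step. The sketch below is an assumption about the general shape of the idea, not the authors' code: the names to_composite_events and batch_size are hypothetical, and every event is assumed to carry the same set of fields.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, str]
CompositeEvent = Dict[str, List[str]]


def _combine(batch: List[Event]) -> CompositeEvent:
    """Collect each field of the batched events into one list per field."""
    fields = batch[0].keys()
    return {field: [event[field] for event in batch] for field in fields}


def to_composite_events(
    events: Iterable[Event], batch_size: int
) -> Iterator[CompositeEvent]:
    """Group events into batches of batch_size and emit one composite per batch."""
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield _combine(batch)
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield _combine(batch)
```

With this layout, each field of a composite event holds the values from many original events, so a single encrypt or compare step can cover a whole batch instead of one event at a time.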

How Does the Homomorphic Encryption Perform?

We conducted experiments to evaluate the benefits of homomorphic encryption using multiple stream processing applications, including Email Filter and the EDGAR log processor. This post describes only the Email Filter. We observed similar performance behavior with the other applications as well (see our paper for details).

The Email Filter selects emails sent from three specific email addresses and is based on the Enron Email dataset. The architecture of the Email Filter application is shown in Figure 2. The Data Injector injects the input stream into the application. The Publish Decision Taker module then decides whether each event should be sent to the private cloud (i.e. an organization’s local stream processor cluster) or to the public cloud (e.g. Amazon AWS). The Encryptor, HE equals operations, and Decryptor components are the modules related to homomorphic encryption. The HE Logic Filter generates the Filtered Emails Stream, which is merged at the Metrics Operator with the Filtered Emails Stream sent from the Filter operator.

Figure 2: Email Filter Application

We evaluated the performance of the Email Filter application deployed on HomoESM on three Amazon EC2 VMs. Two of the VMs acted as the private cloud and the third as the public cloud. The input workload was generated by scanning through the Enron Email dataset and injecting every email as an event into the Email Filter application using the Data Injector. The emails were sent to the application at a fixed rate. However, different emails had different payload sizes, which resulted in significant variation in the input workload.

The results of privacy-preserving elastic scaling are shown in Figure 3. The vertical lines “VM Start” and “VM Stop” indicate the points where the virtual machine (VM) in the public cloud was started and stopped, respectively. The dotted blue line illustrates the performance of the private-cloud-only deployment, while the red line indicates the performance of elastic scaling with homomorphic encryption. Using HomoESM resulted in a 10% improvement in average latency for the Email Filter application. Note that HomoESM conducts elastic switching when the predefined service level agreement (SLA) cannot be met with the resources left in the private cloud. When the server resource utilization is above a certain threshold, which is expressed as an SLA parameter, the switching function triggers the hybrid cloud execution.
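The threshold rule described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the ESM switching function itself: the class SwitchingPolicy, the field utilization_threshold, and the helper vm_transitions are hypothetical names, and utilization is assumed to be a fraction between 0 and 1 sampled at regular intervals.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SwitchingPolicy:
    # Hypothetical SLA parameter: utilization above this triggers hybrid execution.
    utilization_threshold: float

    def use_public_cloud(self, private_utilization: float) -> bool:
        """Return True when the private cloud is loaded beyond the threshold."""
        return private_utilization > self.utilization_threshold


def vm_transitions(
    policy: SwitchingPolicy, utilization_trace: Sequence[float]
) -> List[Tuple[int, str]]:
    """List the sample indices at which the public cloud VM would start or stop."""
    transitions: List[Tuple[int, str]] = []
    running = False
    for index, utilization in enumerate(utilization_trace):
        wanted = policy.use_public_cloud(utilization)
        if wanted and not running:
            transitions.append((index, "VM Start"))
        elif not wanted and running:
            transitions.append((index, "VM Stop"))
        running = wanted
    return transitions
```

A production policy would likely add smoothing or hysteresis so that short utilization spikes do not start and stop the VM repeatedly. That refinement is left out here to keep the rule visible.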

Figure 3: Average latency of elastic scaling of the Email Filter benchmark

Conclusion

This post presents a technique for securely outsourcing the processing of data streams. We apply homomorphic encryption to streaming data, which is then outsourced to external cloud service providers. Homomorphic encryption is traditionally known to be computationally expensive. However, data batching and asynchronous event publishing can be used to obtain performance improvements for elastic data stream processing with homomorphic encryption. We observed improvements in overall system performance with our homomorphic elastic scaling mechanism, including a 10% improvement in average latency for the Email Filter application. We are currently investigating the potential of applying the same technique to other data processing scenarios in hybrid cloud settings.