How We Implemented Zero Trust in Choreo

In the past, enterprise software focused on protecting network access through on-premises firewalls and VPNs, working on the assumption that everything within the network was secure. However, today, as accessing data has extended beyond on-premises locations to cloud and hybrid networks, SaaS platforms require a security model that can address a broader range of attack vectors. Zero Trust security addresses this pressing need. It operates on the following foundational principles:

Never Trust, Always Verify
Verify Explicitly
Least Privilege Access
Micro-Segmentation
Layered Security Controls and MFA
Assume Breach
End-to-End Encryption
Full User and System Visibility

However, Zero Trust isn't just an add-on feature; it requires a comprehensive architectural approach for each layer of the platform. Every component, connection, and data flow should align with one or more of the above eight principles.

In this article, we’ll delve deeper into this security model and explore the journey towards enabling and nurturing Zero Trust security within Choreo—our internal developer platform as a service. We'll analyze the necessary architectural adaptations and understand how Zero Trust is woven into Choreo to ensure a fortified and resilient security shield.

Choreo: A Bird's Eye View

In Choreo, the architecture is divided into two fundamental components: the control plane and the data plane. The control plane assumes the essential task of administering organizations, users, and projects. In addition, it governs the complete journey of application development, starting from the initial stages of creation, progressing through deployment, incorporating governance measures, and culminating in the provision of observability. The Choreo control plane caters to a variety of user personas, including CIOs, architects, developers, and DevOps/SREs/PEs, ensuring robust security across all user logins with multi-factor authentication (MFA) powered by Asgardeo and audit logs to avoid any non-repudiation situations. [1]

The data plane is where user applications are deployed, guided by the settings established in the control plane. User applications can include services, web apps, and tasks that operate on defined schedules. These applications can be written in any programming language, supporting a polyglot approach. It is important to note that all traffic pertaining to a user application runtime is confined within the boundaries of the Choreo data plane. This ensures that user data remains strictly within the data plane and never ventures beyond its limits. To understand how Choreo protects data, we have in-depth documentation [2] that dives into the nuances of our data handling process.

Choreo's architecture features two distinct types of data planes: cloud data planes and private data planes. A cloud data plane utilizes a multi-tenanted infrastructure model for deploying user applications, enabling a communal yet secure environment for application runtime. In contrast, a private data plane provides dedicated infrastructure allocated to a single organization for running their user applications. This ensures an added layer of privacy and control for organizations with specific requirements.

Figure 1: A bird's eye view of Choreo

Regardless of the data plane type, Choreo enforces a Zero Trust security model across both. This approach ensures that every application, whether it runs on the shared cloud data plane or the dedicated private data plane, benefits from the Zero Trust model. This consistency in security policy is part of our commitment to creating a safe, reliable, and robust platform for all our users.

Cell-Based Architecture for Micro-Segmentation

Choreo operates on a cell-based architecture—an approach that groups components from design, implementation, and deployment into a unified entity known as a 'cell'. These cells serve as the platform’s fundamental building blocks, each being independently deployable, manageable, and observable. A cell directly corresponds to a Choreo project, and embodies the attributes discussed in the cell-based architecture.

Within each cell, components communicate with each other using supported transports for intra-cell communication. This localized communication optimizes performance and minimizes latency, keeping the interactions secure within the cell's confines.

However, any external communication must pass through an API gateway. This critical architectural feature serves as a regulated entry point, providing APIs, events, or streams via governed network endpoints using standard network protocols. This gateway ensures that all incoming and outgoing traffic is strictly monitored and managed, thereby aligning well with 'never trust, always verify'.

Figure 2: Cell communication in telecom terms

Choreo Cell-based Architecture with Cilium and eBPF

Within the Choreo data plane, two API gateways perform vital roles, managing both internal and external traffic directed towards user applications. The first gateway facilitates external API traffic originating from the public internet to reach the user applications running within a project. The second gateway routes internal traffic emanating from other project components that, while originating from within the same organization, are external to a specific Choreo project.

Each Choreo project, composed of diverse components such as microservices, integrations, web apps, and tasks, is bundled and deployed into a distinct Kubernetes namespace. This dedicated namespace, unique for each project within a specific environment, forms a vital element of Choreo's cell-based architecture.

Cilium [4], an open source solution for the cloud native landscape, leverages eBPF, which is an advanced kernel technology, to offer secure and observable network connectivity among workloads. We leverage Cilium policies, which operate across Layer 3, 4, and 7 of the OSI networking model, to manage ingress and egress traffic within the Choreo data plane. This multilayer policy application provides greater flexibility and control over network traffic within the data plane.

By employing Cilium network policies, we ensure the security of these project namespaces. This restricts any workloads operating outside a project's namespace from accessing its internal components, in line with the established pod labeling and identity-based network security policy enforcement at the network layer 3 and 4. The only traffic allowed into the project is that which is routed through the internal and external gateways. Both gateways maintain stringent control, allowing access to the component resources solely through APIs specifically curated and published via the Choreo’s API management capabilities. These managed APIs implement fine-grained access control by adeptly using scope binding along with role-based access control (RBAC). This refined authorization mechanism ensures that specific permissions are appropriately assigned and validated, enhancing the overall security architecture. With this architecture and policy enforcement, any access to the resources within a component from outside of a project only transpires post successful authentication and authorization.

Choreo ensures that all internal network traffic is encrypted, a function we achieve by using Cilium Transparent Encryption. This approach utilizes in-kernel IPsec to provide efficient datapath encryption, which is particularly beneficial within the cloud data plane. Given the sensitive nature of data that needs transit protection, this encryption strategy is paramount. Cilium Transparent Encryption's distinctive advantage lies in its kernel-level operation. This unique feature provides it with a significant edge in performance over user-space solutions.

Figure 3: Choreo project trust boundaries

Going beyond the above mentioned policy enforcement, we are in the process of rolling out fine-grained access control policies based on the identity verification support that comes with Cilium.

A significant advantage of Choreo is its focus on application development governance. In the process of building an application, there might be a need to interact with external endpoints. However, due to corporate policies, business sensitivity, or criticalness of the data, it might be essential to restrict interactions to only approved endpoints. Choreo will ensure this level of governance through visual diagrams during project design.

When the application architecture or a developer establishes these governance requirements, Choreo effectively translates them into Cilium DNS-based network policies. These policies provide a mechanism to specify access control, while Cilium takes care of the more complex task of tracking DNS to IP mapping.

Figure 4: Control egress with DNS-based network policies

In the future, beyond simply controlling external egress, Choreo's governance framework will ensure rigid control over unauthorized access between components within a project. This is particularly pertinent when such interactions are not defined in the contract (for instance, an open API specification) or not envisaged by the architect or developer during the component design phase.

Choreo translates these design principles into Cilium L7 network policies. These policies enforce resource-level authorization along with component labels (assigned at the time of deployment to PODs) that are incorporated into network communication using eBPF. For a more detailed understanding of this process, you can refer to my previous article, "Unlocking the Power of Programmable Data Planes in Kubernetes with eBPF [5]."

Figure 5: L7 Policies for resource-level authorization

This level of control further underlines Choreo's commitment to the Zero Trust security model. This holds true even if the interaction originates from a component within a project, which might otherwise be considered as fairly trusted. This granularity in access control reinforces the security of our platform, ensuring a robust and reliable environment for our users' applications.

Mitigating the Risk of Container Escape

Choreo gives developers flexibility to build applications in various programming languages. These applications are subsequently compiled and packaged into Docker containers via the Choreo CI/CD pipeline and deployed into the runtime Kubernetes cluster. The Choreo CI/CD pipeline is designed to ensure that all built Docker images are in alignment with Choreo's security guidelines.

Choreo further augments its defenses by assigning distinct identities to each user application's container deployment within the platform. These identities undergo a continuous process of verification and validation of their interactions, which we discussed earlier. Under this model, even if an attacker manages to exploit a vulnerability and perform a container escape, their journey does not end there. They would still be required to clear subsequent authorization checks to access resources on the host or other containers. This multi-tiered verification process acts as an additional line of defense, making it considerably more challenging for potential attackers to breach our system.

Observability in the Context of Zero Trust Security

Remember, Zero Trust is not just about preventing unauthorized access, it's also about being able to identify when something goes wrong and respond appropriately. That's where security observability comes into play, providing the insights and understanding needed to keep your systems secure.

Observability at the project level in Choreo provides a dynamic service graph, which visualizes all network activities captured by Cilium leveraging the power of eBPF. This service graph serves as a crucial tool for validating all runtime network activities against the design time definitions laid out during project planning.

An unexpected network call appearing in the runtime service graph could serve as an early warning of potential malicious activities. This real-time verification capability, which ensures runtime behavior aligns with initial design principles, is a significant feature of the Zero Trust security model implemented in Choreo. Through continuous observation and real-time adjustments, Choreo provides a secure environment, promoting adherence to project specifications while effectively guarding against unauthorized or unexpected actions.

Figure 6: Choreo project service graph

Choreo upholds a comprehensive logging strategy, capturing every action at multiple levels. Whether it's applications, API gateways, Kubernetes events, access logs, ingress logs, or audit logs – all are diligently tracked and stored. These audit logs record every access attempt and its specifics (who accessed what and when).

In our quest to uphold the Zero Trust security model, we recognize the importance of comprehensive and intelligent analysis of our security data. To achieve this, a security information and event management (SIEM) system is crucial. Choreo's integration with advanced SIEM solutions like Azure Sentinel, enables us to collect, aggregate, and analyze security data from across our platform.

With SIEM, we can correlate data from different sources like application logs, API gateways, Kubernetes events, access logs, ingress level logs, and audit logs, enhancing our ability to identify patterns, detect anomalies, and respond to potential threats in real-time. This kind of in-depth analysis supports our 'never trust, always verify' approach, allowing us to continuously monitor and validate activities on our platform.

Figure 7: Private data plane security architecture (in Azure)

Figure 7 presents an overview of how we have strategically implemented various security controls across the private data plane that runs in Azure. These controls are designed to address diverse aspects of our platform's operation. We have similarly implemented security controllers for private data planes on AWS and GCP clouds.

For instance, our controls cover the runtime environment of user applications, ensuring that all running applications adhere to the strict security protocols established by the Zero Trust model. Moreover, our security controls extend to the realm of site reliability engineering (SRE) operations. They serve as steadfast guardians during maintenance activities, making sure that operational integrity is preserved even during system upgrades or troubleshooting activities.

In essence, our layered security approach, as shown in Figure 7, ensures that every facet of our platform is under constant security scrutiny.

Zero Trust and Compliance

Implementing the Zero Trust security model within Choreo significantly bolsters our SOC2 compliance posture. It is noteworthy to mention that WSO2 currently holds a SOC2 Type1 certification, and we are working towards acquiring the Type2 certification.

The core principles of the Zero Trust model, including multi layered authentication, least privilege access, encryption, and perpetual logging and monitoring, align with the stringent operational and security control requirements of a SOC2 audit.

By adhering to the Zero Trust model, we not only reinforce our platform's defense mechanisms but also improve our operational integrity, underpinned by our SOC2 compliance. This ensures that our customers’ applications are safeguarded by a platform that prioritizes security reliability and strict adherence to industry standards.

Summary

The following key takeaways summarize Choreo's application of the Zero Trust security principles and show how they are woven into the platform:

Never Trust, Always Verify: Choreo embodies this via multi-tiered security controls: authentication and authorization validate entity legitimacy, API gateways regulate access to resources. Continuous monitoring, comprehensive logging, and integration with Azure Sentinel facilitate real-time behavior tracking and anomaly detection.
Verify Explicitly: Choreo does this through its usage of API gateways for regulating external and internal traffic, ensuring every access request is authenticated and authorized.
Least Privilege Access: This is embedded in the architecture and is implemented through role-based authorization for APIs, fine-grained access control based on user-defined connectivity specifications, and strict control over access through gateways.
Micro-Segmentation: The platform uses a cell-based architecture and deploys each project in a distinct Kubernetes namespace, effectively creating micro-segments with independent deployments and isolated communication scopes.
Layered Security Controls: Choreo applies multiple layers of security controls like enforcing network-level control through Cilium network policies, and implementing MFA powered by Asgardeo
Assume Breach: Choreo operates under the assumption of potential breaches and emphasizes the necessity for real-time network observability, comprehensive logging, and anomaly detection using Azure Sentinel. This allows the platform to promptly identify and respond to potential threats.
End-to-End Encryption: The platform ensures end-to-end encryption with Cilium Transparent Encryption, providing efficient datapath encryption.
Full User and System Visibility: Choreo provides a comprehensive logging strategy, capturing activities at multiple levels, including application activities, API gateways, Kubernetes events, access logs, ingress logs, and audit logs. It also employs observability at the project level to visualize all network activities, reinforcing the 'never trust, always verify' principle.