
 
2025/02/25
 

Navigating the Complexity of Internal Developer Platforms: A Guide for Technical Decision Makers

  • Lakmal Warusawithana
  • Vice President & Distinguished Engineer - WSO2 LLC

A growing number of enterprises face a pressing need to rethink their product development and go-to-market approaches to align with digital-native products. This shift means that every company must become a software company. However, the reality of software development in the cloud-native era is far from simple.


Figure 1: The reality of software manufacturing in the cloud native era

This is where enterprises require a platform that undertakes all tasks, from writing business code to running it at scale in production.

Embracing the Internal Developer Platform (IDP)

According to the Platform Engineering Maturity Model defined by the Cloud Native Computing Foundation (CNCF), there are various types and levels of platforms, each addressing different maturity levels in platform engineering and focusing on specific tasks required to transition code into production. Among these, level 4 maturity in the CNCF model, which aligns with modern definitions of an internal developer platform (IDP), stands out as the ultimate solution to help enterprises maximize their productivity. As the productivity advantages of these platforms increase, so too does the complexity of building and maintaining them. In our own 18-plus years of experience working with enterprises, we have observed that nearly 60 percent of the digital transformation budget is consumed by projects requiring teams of 100 or more members for at least three years.


Figure 2: Comprehensive functionality of an IDP

An internal developer platform is responsible for a broad range of capabilities as noted in the diagram above. However, a common oversight in the development of IDPs is the focus solely on the continuous integration/continuous deployment (CI/CD) and runtime aspects, often at the expense of the software design and developer experience. This neglect can lead to difficulties when these systems are expected to integrate seamlessly into the full software development lifecycle. An effective IDP should not only enhance operational tasks but also provide tools and architectural support to enable the implementation of scalable, agile software design principles. By addressing both operational efficiency and design agility, an IDP can truly support the entire spectrum of software development, from conception through production.


Figure 3: Reference architecture for an IDP

Figure 3 illustrates a reference architecture for an IDP, where software design begins with the well-known domain-driven design (DDD) methodology. Cell-based architecture, which is the foundation for this reference architecture, provides a good blueprint for implementing DDD within a cloud native environment: domains or subdomains are organized into network-bounded cells that are managed through well-defined gateways. This structure enables smaller, agile teams, often the size of two-pizza teams, to perform frequent releases.

The reference architecture incorporates several key capabilities:

  • A CI/CD infrastructure helps avoid operational bottlenecks.
  • Built-in resilience techniques and policies within the platform enhance the robustness of architectures such as microservices.
  • Comprehensive observability provides essential debugging capabilities. Technologies like service mesh will be instrumental in advancing these efforts.
  • Adherence to zero-trust principles ensures necessary isolation and secures data for all user applications.
  • A robust service discovery system fosters collaboration among multiple teams while ensuring governance.
  • Support for an API-first development approach streamlines integration and functionality across services.

At the same time, an important strategy to consider is avoiding lock-in with a specific cloud vendor, aiming instead for a cloud-vendor-agnostic solution whenever possible. This approach enables organizations to reap the benefits of multi-cloud deployments utilizing different cloud providers. Moreover, it is crucial for maintaining business continuity, especially since there have been instances where a cloud provider’s widespread outage has impacted deployments, even those supported across multiple regions.

A Reference Implementation

A significant challenge is choosing the appropriate cloud native tools that align with the internal developer platform’s needs. The landscape of cloud native technology is complex, requiring platform engineers to invest considerable time in evaluating the essential toolset and developing proficiency in these tools and technologies. Figure 4 describes a reference implementation that leverages more than 15 cloud native tools and technologies, many of which are CNCF projects.


Figure 4: Reference implementation architecture for an IDP

The IDP enhances the developer experience by managing business application code in a Git repository. Using Argo Workflows and Google buildpacks, it builds container images for various programming languages. The images are then uploaded to the Harbor registry and scanned for vulnerabilities with tools like Trivy, and the resulting Kubernetes artifacts are deployed to a Kubernetes cluster following best practices.
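The build-to-deploy flow described above can be sketched as an ordered pipeline with a security gate. This is an illustration only: the `Artifact` type, the stage functions, and the `max_critical` threshold are hypothetical stand-ins and do not correspond to the actual Argo, buildpacks, Harbor, or Trivy APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical record of a container image moving through the pipeline."""
    repo: str
    commit: str
    image: str = ""
    critical_vulns: int = 0
    history: list = field(default_factory=list)


def build(artifact: Artifact) -> Artifact:
    # Stand-in for a buildpack build started by a workflow engine.
    artifact.image = f"{artifact.repo}:{artifact.commit[:7]}"
    artifact.history.append("build")
    return artifact


def push(artifact: Artifact) -> Artifact:
    # Stand-in for uploading the image to a container registry.
    artifact.history.append("push")
    return artifact


def scan(artifact: Artifact, max_critical: int = 0) -> Artifact:
    # Stand-in for a vulnerability scan; the release stops if the gate fails.
    artifact.history.append("scan")
    if artifact.critical_vulns > max_critical:
        raise RuntimeError(f"{artifact.image}: scan gate failed")
    return artifact


def deploy(artifact: Artifact) -> Artifact:
    # Stand-in for applying the generated Kubernetes artifacts.
    artifact.history.append("deploy")
    return artifact


def run_pipeline(artifact: Artifact) -> Artifact:
    # Stages run strictly in order; any exception aborts the remaining stages.
    for stage in (build, push, scan, deploy):
        artifact = stage(artifact)
    return artifact
```

The point of the sketch is the ordering: an image that fails the scan never reaches the deploy stage, which is the behavior a platform team would typically want to enforce centrally rather than leave to each application team.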

A cell, akin to a Kubernetes namespace, is secured with Cilium network policies. Cilium acts as both a Container Network Interface (CNI) plugin and a service mesh, using eBPF technology to enhance network security and observability, restrict unauthorized traffic, and allow only specific gateway or internal traffic. Additionally, all network traffic is encrypted with WireGuard, optimized by eBPF for efficient routing.
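The traffic rule for a cell (internal traffic is allowed, everything else must enter through the gateway) can be modeled in a few lines. This is a conceptual sketch, not Cilium's policy language: the `Workload` type and the `is_allowed` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload identity: a name, its cell, and a gateway flag."""
    name: str
    cell: str
    is_gateway: bool = False


def is_allowed(source: Workload, target: Workload) -> bool:
    """Return True if traffic from source to target should be permitted."""
    # Traffic between workloads in the same cell is allowed.
    if source.cell == target.cell:
        return True
    # Cross-cell traffic is allowed only when it targets the cell's gateway.
    return target.is_gateway
```

For example, with an `orders` cell and a `billing` cell, a billing service could call the orders gateway but could not reach the orders database directly.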

Externally accessible services are secured by an Envoy-powered API gateway that authenticates and authorizes traffic, adhering to the zero-trust principle of ‘never trust, always verify.’ For effective zero-trust isolation, microsegmentation, which is inherent to the cell architecture, plays a crucial role.
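As a rough illustration of "never trust, always verify," the sketch below authenticates every request by checking a token signature and then authorizes it against the route it targets. It is not how Envoy is configured: the toy token format, the `SECRET` key, and the `REQUIRED_SCOPE` table are assumptions made up for this example.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # Assumption: a shared signing key, for illustration only.

# Hypothetical route-to-scope mapping for one cell gateway.
REQUIRED_SCOPE = {"/orders": "orders:read"}


def _signature(subject: str, scope: str) -> str:
    payload = f"{subject}|{scope}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def sign(subject: str, scope: str) -> str:
    """Issue a toy token of the form subject|scope|signature."""
    return f"{subject}|{scope}|{_signature(subject, scope)}"


def authorize(path: str, token: str) -> int:
    """Return an HTTP-style status code; no request is trusted by default."""
    parts = token.split("|")
    if len(parts) != 3:
        return 401  # Malformed or missing token.
    subject, scope, sig = parts
    expected = _signature(subject, scope)
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return 401  # Authentication failed.
    if REQUIRED_SCOPE.get(path) != scope:
        return 403  # Authenticated, but not authorized for this route.
    return 200
```

Routes that are not listed in the table are denied, so the default is rejection rather than access. A production gateway would normally rely on a standard token format and an identity provider instead of a hand-rolled scheme like this one.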

The cell gateway underpins API-first development and aids in service discovery and governance, aligning with DDD and microservices principles. It enhances collaboration among autonomous teams.

Cilium’s eBPF-powered Hubble and Envoy collect comprehensive network metrics, which Prometheus processes in-cluster for efficient troubleshooting. Fluent Bit, deployed as a Kubernetes DaemonSet, gathers container logs and sends them to OpenSearch for enhanced querying and troubleshooting. Together, Cilium and Envoy enforce resilience strategies like automatic retries and support deployment methods such as canary and blue/green deployments.
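In the reference implementation, retries are applied by the platform's networking layer rather than in application code, but the underlying logic is easy to show. The following is a minimal sketch of automatic retries with exponential backoff; the function name, its parameters, and the choice of `ConnectionError` as the retryable failure are assumptions for illustration.

```python
import time


def call_with_retries(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # Give up after the final attempt.
            # The delay doubles after each failed attempt: base, 2x base, 4x base, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the backoff schedule can be checked in a test without actually waiting. Handling retries at the platform level means individual services do not each have to reimplement logic like this.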

Build vs. Buy

As discussed, building an IDP is a resource-intensive endeavor, often requiring three or more years and a dedicated team of more than 100 members. Securing platform engineers with the appropriate skill set presents another substantial hurdle. Yet, for organizations equipped with ample time, financial resources, and technical expertise, creating a bespoke IDP can greatly enhance productivity in digital product delivery because it is tailored to their specific needs.

Organizations lacking sufficient team size, skills, time, or budget might find that building an IDP diverts focus from core business objectives, risking failure in both platform development and broader business goals. Buying an IDP or subscribing to an IDP-as-a-service can boost productivity with minimal investment and a smaller engineering team. However, there may be challenges, such as needing to modify existing processes or discard prior software to fully benefit from the SaaS solution. Thus, it is crucial to consider the extensibility of the acquired IDP service.

Ultimately, whether to build or buy an IDP should be determined by an organization’s available resources, strategic imperatives, and the immediacy of its digital transformation needs.

Try Choreo

Choreo is an advanced internal developer platform, aligning seamlessly with platform engineering principles to boost productivity, scalability, and security in software development.

Its fully managed environment eliminates the burden of maintenance, updates, and scaling, enabling platform engineers to focus on optimizing developer workflows. By ensuring infrastructure reliability, security, and performance, Choreo empowers teams to drive innovation and strategic projects without the complexities of platform management. Sign up today and try out Choreo for free.
