8 Feb, 2023 | 3 min read

Architecture in the Age of APIs

  • Srinath Perera
  • Chief Architect - WSO2 Inc.


This is an extended version of articles 1 and 2 published in TNS.


We’re entering an age where many organizations are trying to increase productivity by transforming themselves into digital organizations. Digital organizations are built on top of digital architectures that unlock agility across the enterprise. Agility, the ability to respond quickly to changes and new needs, enables most of the other advantages, such as shorter time to market, better customer experiences, and long-term competitive advantage. For example, agility shortens time to market and often secures a first-mover advantage. It also enables organizations to learn from, respond to, and adapt to changes, evolving their offerings into what end users want, which can significantly reduce the risk of failure. Furthermore, the ability to adapt faster than your competitors is a significant competitive advantage in itself, and it can be used to counter most of an organization's weaknesses by evolving out of them. Hence, the ability to adapt can often help an organization win in business despite an inferior initial product or a late entrance to the market. This is also called Digital Darwinism.

It’s not enough to just do some things fast; the quality of decisions is even more critical. Techniques such as growth hacking and growth engineering let us systematically evaluate user behaviors and improve the system to meet user expectations. However, even techniques such as growth hacking, which take an experimental approach to making better decisions, are only possible if we can try things fast. Thus, agility is a precondition there too.

What limits the agility of organizations? Dependencies limit agility. If developers must wait for each other, it slows momentum, delays decisions, increases context-switching costs, and saps motivation.

One critical type is dependencies between developers in a team.

To reduce such dependencies, we try to define subdomains (e.g., bounded contexts in domain-driven design [DDD]) and expose the capabilities of the subdomains as APIs. APIs let people outside the team that builds a service use it, while encapsulating implementation details. The term API was first used for operating system and library interfaces but is now often used for networked services. Systems can only reuse libraries and software components within matching environments (e.g., the same programming language, no dependency conflicts, etc.), while networked APIs enable reuse at a different level, where any system can use them without worrying about the programming language, configuration, setup, or infrastructure for running the code. Hence, networked APIs are much more reusable than libraries or software components.

While APIs help, their implementations can depend on each other (e.g., shared libraries, datasets, etc.), which creates dependencies. Microservices architecture enables APIs to evolve in a loosely coupled manner where they can add changes, deploy, and operate while depending only on other APIs. 

Applications (mobile apps, web apps, user APIs, and end-user integrations) combine these internal and external APIs to deliver business value. However, if those applications often require changes in the underlying APIs they depend on, it kills agility. So, we must design those core APIs (domain services) to be flexible enough to accommodate future requirements when possible. We want core APIs and applications to evolve independently.

An API economy, by enabling the reuse of APIs across organizations, also enables applications to move fast, thus enhancing agility. 

To understand this in detail, let us look at the typical architecture of a digital business. 

Sample Architecture of a Digital Business 

The following sample architecture depicts how to arrange internal capabilities as APIs and how to create a digital business by combining and creating new and existing APIs. Although this is not the only possible architecture, it is representative of common techniques and ideas. 

The above architecture breaks the code into two layers: a core API (capability) layer and an experience layer (application layer). The core API layer represents the capabilities of the organization as APIs (aka core APIs). 

The experience layer is what end-user applications use directly. It contains websites, mobile apps, APIs exposed to end-user applications, and helper services such as backend for frontend (BFF) services. They reuse core APIs and external SaaS APIs from the API economy. The experience layer creates value by combining existing APIs into new ones. APIs in this layer may be consumed by end users or by web or mobile apps. This layer may also include event-driven integrations (logic that connects systems, data, and APIs together), which are triggered by events occurring in the world or in the system.

In a well-designed architecture, the core API layer evolves slowly and is built by experienced developers employing sophisticated techniques. In contrast, the experience layer evolves fast in response to user requests. The goal of the experience layer is to compose APIs from the core API layer and public SaaS APIs to deliver dynamic user experiences. This layer requires close collaboration with domain experts and product managers. Accelerating go-to-market and the ability to change while learning from users (e.g., growth hacking) are critical.

Why does the experience layer evolve faster than the core API layer? When we need to make a change or add a new experience (e.g., a UI), some changes can be handled by reusing APIs without changing them, while others will need API changes. If an API change is needed, the resulting coordination will often add weeks, if not months, to the change, killing agility. A good design aims to keep APIs stable with minimum changes and to reduce the need for applications to wait for core API changes. So, the core API layer will change more slowly than the experience layer. The better the API design, the more slowly the core API layer will change.

Services running in the experience and core API layers need several helper technologies. First, they must support user identity and access control, which is provided by customer identity and access management (CIAM) technologies. Second, they need API management (APIM) to expose their services as APIs, handle subscriptions, throttling, and other quality of service (QoS) characteristics, and provide support for developers who will use those APIs (typically via a developer portal and sometimes through a marketplace as well). It is possible to build or assemble both CIAM and APIM using libraries and code, but the attention, time, and money spent building them are often wasted, taking you away from your core business goals.

It is not enough to just get the system to production; we need to keep it running and support it. To achieve this goal, we need DevOps and troubleshooting support.

DevOps includes a code repository, a CI/CD pipeline to build and deploy changes, and multiple environments (typically dev, staging, and production) to test and gradually roll out the code. Some setups support canary and blue-green deployments, which allow gradual rollout and immediate fallback when a problem occurs.

Troubleshooting is a set of tools that help the support team isolate and fix any problems. At a minimum, this must include observability tools, log management, and support to collect data from a running system. 

How APIs Enable Agility

Let’s revisit how APIs enable agility in greater detail. 

As discussed, APIs document capabilities and encapsulate the details, enabling different teams to work together without understanding each other's internal workings. Microservices enable the corresponding teams to work in a loosely coupled manner, reducing the need to wait for each other. Furthermore, APIs hide legacy and existing systems, enabling teams to work without understanding them in detail.

API marketplaces can help discover internal and external APIs, increasing reuse and enabling teams to get more done faster. API marketplaces can also manage credentials, removing the time needed for personal negotiations before using an API, thus saving time. 

Well-documented public APIs enable customers and partners to innovate too. Well-designed APIs unleash the creativity of customers and partners without the organization having to be in the loop. This increases self-service, which can reduce customer support requests and feature requests.

As discussed under digital business architecture, APIs coupled with integrations can be recomposed to rapidly support new user experiences or improve existing ones, enabling the organization to get ideas and features to market faster, thus increasing agility.

Furthermore, coupled with low-code development, APIs enable teams with basic IT expertise to build their applications and integrations themselves without having to wait for a central IT team, removing a key challenge in most businesses.

How to Build an API-Centric Digital Architecture? 

Let’s explore how we can implement such an architecture and which tools we can choose.

We need to decide where we can run the system, how we are going to implement the services, and how we are going to expose, manage, and govern relevant services as APIs. 

We can identify two types of services based on their capabilities: business logic services and data services. Data services are built with wizards or using a programming language, such as Java, Go, or Ballerina. Based on the features required, you may choose GraphQL, OData, or plain JSON over HTTP. Often, wizards or a low-code/no-code experience let users explore the database using SQL or other queries and map the results into messages. Most business logic services modify a database, retrieve data, and run some business logic before responding. They’re often built with languages such as Java (with Spring Boot), Go (e.g., with go-kit), and Ballerina, which are strongly typed, have solid concurrency models, and are fast. A notable exception is an API that exposes a machine learning model, for which users may choose the Python Flask framework or TensorFlow Serving.

Experienced developers often write the services in the core API layer. Those services can run on on-premises hardware, in the cloud, or on a serverless platform. The serverless option can be cost-prohibitive for heavy loads. Services in the core API layer often need to connect to existing databases, systems, or other services in the same layer. They communicate over HTTP, gRPC, and sometimes messaging protocols such as AMQP or Kafka.

You may choose to expose some of these services as APIs via an API management solution. The experience layer can directly use services in the core API layer as is or as APIs.

The experience layer includes APIs, services (e.g., BFF services or end-user APIs), integrations, and also web apps or mobile apps. These applications and services compose core API layer APIs, services, or SaaS APIs. Since the experience layer often focuses on the composition of APIs, it’s mainly built on a connector ecosystem and data mapping capabilities: connectors provide easy-to-use clients to talk to core APIs, services, and SaaS APIs, and data mapping lets us translate between different message formats. A typical composition API will receive a message directly or through an event, invoke several APIs, and translate data between those APIs using data mapping. The experience layer is often implemented using an iPaaS in the cloud, an on-premises integration tool, a programming language like Ballerina or Python, or code running on a serverless platform. Most of these choices have a connector ecosystem and support data mapping. Low-code technology for writing integrations (server-side code) is a good choice for the experience layer as it enables rapid development, involvement of domain experts, and a wider workforce.
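
As a rough illustration of such a composition, the sketch below receives an order message, invokes two APIs, and maps data between their message formats. The functions `get_customer` and `get_shipping_quote`, the field names, and the message shapes are all hypothetical stand-ins for connector calls; a real experience-layer service would call core or SaaS APIs over the network.

```python
from dataclasses import dataclass


# Hypothetical connector stubs; in a real system these would call
# a core API and a SaaS API over the network.
def get_customer(customer_id: str) -> dict:
    return {"id": customer_id, "fullName": "Ada Lovelace", "postCode": "N1 9GU"}


def get_shipping_quote(postal_code: str, weight_kg: float) -> dict:
    return {"postal_code": postal_code, "price_cents": 500 + int(weight_kg * 100)}


@dataclass
class OrderSummary:
    customer_name: str
    shipping_cost: float  # in currency units, not cents


def compose_order_summary(order: dict) -> OrderSummary:
    """Invoke two APIs and map between their message formats."""
    customer = get_customer(order["customerId"])
    # Data mapping: the customer API says "postCode", while the
    # shipping API expects "postal_code".
    quote = get_shipping_quote(
        postal_code=customer["postCode"], weight_kg=order["weightKg"]
    )
    return OrderSummary(
        customer_name=customer["fullName"],
        shipping_cost=quote["price_cents"] / 100,
    )
```

Most of the code is renaming and reshaping fields rather than business logic, which is why connector and data mapping tooling carries so much weight in this layer.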

Most services in the experience layer are typically exposed as APIs. If you choose an iPaaS or a cloud platform (e.g., Azure Functions) for deployments, each provides APIM support, observability, log management, and sometimes CIAM support. Typically, solutions provided by non-cloud providers tend to be deeper in their features, but you should choose based on your requirements. Common APIM solutions include WSO2 API Manager, Apigee, and MuleSoft’s API Manager.

Most public APIs, mobile applications, or websites provided by the experience layer must support end-user management, authentication, and authorization, which we should solve with a CIAM solution. You can choose between on-premises and SaaS CIAM solutions (e.g., WSO2 Asgardeo, Okta, or Auth0).

Instead of building from scratch or building on top of a cloud provider at the IaaS level, you can choose to build on top of an internal developer platform like Choreo that integrates the above aspects. However, such a platform has to take an opinionated approach to design decisions, reducing your choices and flexibility in return for simplicity and more agility.

Digging Deep, Understanding Inherent Challenges

APIs and their widespread reuse and composition pose several architectural challenges.

The first is handling API limits. Most APIs impose usage limits on the user, such as a limit on the number of requests per month and per minute. A single third-party API can be used by many parts of the system. To handle subscription limits, the system therefore needs to track all API calls and raise alerts when the limit is about to be reached. Increasing the limit often needs human involvement, so the system needs to raise alerts well in advance. Furthermore, the system must store API usage data persistently to preserve it across service restarts and failures. Moreover, if the same API is used by multiple applications, collecting those counts and making decisions needs careful design.
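
A minimal sketch of such tracking, for a single process and a single API, might look like the following. The class name `QuotaTracker`, the JSON file used for persistence, and the 80% alert threshold are assumptions made for illustration. Resetting the count each month and sharing counts across applications (which would need a shared store instead of a local file) are left out.

```python
import json
import os
import tempfile
import threading


class QuotaTracker:
    """Counts calls to one third-party API against a monthly limit.

    The count is written to a JSON file so it survives restarts.
    """

    def __init__(self, path: str, monthly_limit: int, alert_ratio: float = 0.8):
        self.path = path
        self.monthly_limit = monthly_limit
        self.alert_ratio = alert_ratio
        self._lock = threading.Lock()
        self.used = self._load()

    def _load(self) -> int:
        try:
            with open(self.path) as f:
                return int(json.load(f)["used"])
        except FileNotFoundError:
            return 0

    def _save(self) -> None:
        # Write to a temp file and rename it, so a crash cannot leave
        # a half-written file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"used": self.used}, f)
        os.replace(tmp, self.path)

    def record_call(self) -> bool:
        """Record one call; return True if an alert should be raised."""
        with self._lock:
            self.used += 1
            self._save()
            return self.used >= self.alert_ratio * self.monthly_limit
```

Writing the file on every call is simple but slow; a production version would more likely batch the writes or keep the counter in a database.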

Handling rate limits is more complicated. If left to developers, they will invariably add sleep statements, which solve the problem in the short term but lead to complicated issues in the long run when the timing changes. A better approach is to use a concurrent data structure that limits rates. Even then, if the same API is used by multiple applications, controlling rates is much more complicated. One option is to assign each application a portion of the rate limit, but the downside is that some capacity will be wasted: while some applications are waiting for capacity, others might be idling. The most practical solution is to send all calls through an outgoing proxy that handles all limits.
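
One common choice for a concurrent data structure that limits rates is a token bucket; the article does not prescribe a specific one, so treat the sketch below as one possible option. The class name, its parameters, and the injectable `clock` (added so the behavior can be tested without sleeping) are assumptions.

```python
import threading
import time


class TokenBucket:
    """Thread-safe token bucket: at most `rate` calls per second on
    average, with bursts of up to `capacity` calls."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        with self._lock:
            now = self._clock()
            # Refill in proportion to the time elapsed since the last call.
            self._tokens = min(
                self.capacity, self._tokens + (now - self._last) * self.rate
            )
            self._last = now
            if self._tokens >= 1:
                self._tokens -= 1
                return True
            return False
```

A caller that gets `False` can queue the request or fail fast instead of sleeping for a hard-coded interval. An outgoing proxy could hold one such bucket per third-party API so that all applications share a single limit.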

Apps that use external APIs will almost always run into this challenge. Even internal APIs will have the same challenge if many applications use them. If one API is used by one application, there is little point in making that an API. It may be a good idea to try to provide a general solution handling subscription and rate limits. 

The second significant challenge is handling high latencies and tail latencies. Given a series of service calls, the tail latencies are those of the few calls that take the longest to finish. If tail latencies are high, some of the requests will take too long or time out. If API calls happen over the internet, tail latencies get even worse. When we build apps that combine multiple services, each service adds latency, and the risk of timeouts increases significantly.
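
A back-of-the-envelope calculation shows how quickly this risk grows. Assume, purely for illustration, that each service call independently has a 1% chance of being slow. A request that needs 10 such calls then has about a 9.6% chance of hitting at least one slow call, and a request that needs 100 calls has about a 63% chance.

```python
def p_any_slow(p_slow: float, n_calls: int) -> float:
    """Chance that at least one of n independent calls is slow."""
    return 1 - (1 - p_slow) ** n_calls


for n in (1, 10, 100):
    print(n, round(p_any_slow(0.01, n), 3))
```

Real calls are rarely fully independent, so the numbers are indicative only, but the trend explains why compositions feel tail latency much more than single calls do.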

Tail latency has been widely discussed, and we will not repeat that discussion here. However, it is worth exploring this area if you plan to run APIs under high-load conditions (see [1], [2], [3], [4], and [5] for more information).

Why is that a challenge? If the APIs we expose do not provide SLA guarantees (e.g., a 99th percentile latency under 700 ms), it is impossible for downstream apps that use our APIs to provide any guarantees. Unless everyone can provide and stick to reasonable guarantees, the whole API economy will come crashing down. Newer API specifications, such as the Australian open banking specification, already define latency limits.
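
To make such a guarantee concrete, the sketch below checks a list of measured latencies against a percentile target. It uses the nearest-rank definition of a percentile, which is one of several in common use; the function names and the default target (99th percentile under 700 ms, taken from the example above) are illustrative.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at
    least pct% of all samples are at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_sla(latencies_ms: list[float], pct: float = 99,
              limit_ms: float = 700) -> bool:
    """True if the pct-th percentile latency is under limit_ms."""
    return percentile(latencies_ms, pct) < limit_ms
```

Note that a single very slow request out of 100 does not break a 99th percentile target, but two of them do.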

There are several potential solutions. If the use case allows it, the best option is to make tasks asynchronous. If you are calling multiple services, the request inevitably takes a long time, and it is often better to adjust expectations by promising to return the results when they are ready rather than forcing end users to wait for the request and run the risk of timeouts.
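
One way to sketch the asynchronous option is a service that accepts work, hands back a job ID immediately, and lets the caller poll for the result. The class `JobService` and the shape of the status messages are assumptions; in an HTTP API the `submit` step would typically answer with 202 Accepted, and jobs would be stored somewhere more durable than process memory.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor


class JobService:
    """Accepts work, returns a job ID at once, and lets callers poll."""

    def __init__(self, max_workers: int = 4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: dict[str, Future] = {}

    def submit(self, func, *args) -> str:
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id  # e.g., returned with HTTP 202 Accepted

    def status(self, job_id: str) -> dict:
        future = self._jobs[job_id]
        if not future.done():
            return {"state": "pending"}
        if future.exception() is not None:
            return {"state": "failed", "error": str(future.exception())}
        return {"state": "done", "result": future.result()}
```

The caller never holds a connection open while the slow work runs, so a slow downstream service shows up as a longer "pending" period rather than a timeout.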

When service calls do not have side effects (e.g., search), there is a second option: latency hedging, where we start a second call when the wait time exceeds the 80th percentile and respond as soon as one of them has returned. This can help control the long tail.
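
A minimal sketch of latency hedging with threads follows. The function name `hedged_call` is hypothetical, and `hedge_after_s` is assumed to be supplied by the caller (for example, the 80th percentile latency measured elsewhere). The slower call is not cancelled here; it simply finishes in the background and its result is discarded.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

_pool = ThreadPoolExecutor(max_workers=8)


def hedged_call(func, hedge_after_s: float):
    """Call func(); if it has not returned within hedge_after_s
    seconds, start a second identical call and return the result of
    whichever call finishes first.

    Only safe when func has no side effects (e.g., a search).
    """
    first = _pool.submit(func)
    done, _ = wait([first], timeout=hedge_after_s)
    if done:
        return first.result()
    # The first call is slow: launch the hedge and race the two.
    second = _pool.submit(func)
    done, _ = wait([first, second], return_when=FIRST_COMPLETED)
    return next(iter(done)).result()
```

Hedging trades extra load for lower tail latency: with an 80th percentile threshold, roughly one call in five triggers a second request.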

The third option is to try and complete as much work as possible in parallel by initiating the service calls without waiting for their responses. This is not always possible because some service calls might depend on the results of earlier service calls. Furthermore, the code that calls multiple services in parallel, collects the results, and combines them is much more complex than the code that calls services one after the other. Hence the third option increases the burden on the programmer. 
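
The sketch below illustrates this with `asyncio`: two independent calls run concurrently, while a third call that depends on an earlier result has to wait. The three `fetch_*` functions are hypothetical service clients, and `asyncio.sleep` stands in for network time.

```python
import asyncio


# Hypothetical service clients; asyncio.sleep stands in for network time.
async def fetch_profile(user_id: str) -> dict:
    await asyncio.sleep(0.1)
    return {"user": user_id}


async def fetch_orders(user_id: str) -> list:
    await asyncio.sleep(0.1)
    return ["order-1", "order-2"]


async def fetch_recommendations(profile: dict) -> list:
    await asyncio.sleep(0.1)
    return [f"item-for-{profile['user']}"]


async def build_dashboard(user_id: str) -> dict:
    # Independent calls start together and overlap in time.
    profile, orders = await asyncio.gather(
        fetch_profile(user_id), fetch_orders(user_id)
    )
    # This call depends on the profile, so it has to wait for it.
    recommendations = await fetch_recommendations(profile)
    return {
        "profile": profile,
        "orders": orders,
        "recommendations": recommendations,
    }
```

Running `asyncio.run(build_dashboard("u1"))` takes roughly two sleep intervals instead of three, because only the dependent call is serialized. Error handling and partial failures, which are left out here, are where most of the added complexity lies.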

When a timely response is needed, you are at the mercy of your dependent APIs. Unless caching is possible, an application can’t work faster than any of its dependent services. When the load increases, if the dependent endpoint can’t scale while keeping the response times within the SLA, we will experience higher latencies. If the dependent API can be kept within the SLA, we can get more capacity by paying more for a higher level of service or by buying multiple subscriptions. When that is possible, keeping within the latency becomes a capacity planning problem, where we must keep enough capacity to manage the risk of potential latency problems. 

One other option is to have multiple API options for the same function. For example, if you want to send an SMS or email, there are multiple options. This is not yet true for many other services, but as the API economy matures, there may be multiple competing options for many APIs. When multiple options are available, the application can send more traffic to the API that responds faster, giving it more business.
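
One way to send more traffic to the faster option is to weight each provider by the inverse of its recent average latency. The sketch below does this with an exponentially weighted moving average; the class name, the provider names in the usage note, and the smoothing factor are assumptions, and the weighting scheme is only one of many possible policies.

```python
import random


class LatencyAwareRouter:
    """Picks among equivalent providers, favoring the faster ones."""

    def __init__(self, providers: list[str], alpha: float = 0.2, rng=None):
        self.alpha = alpha  # weight of the newest sample in the average
        self.avg_ms = {name: None for name in providers}
        self._rng = rng or random.Random()

    def record(self, provider: str, latency_ms: float) -> None:
        """Fold one observed latency into the provider's average."""
        old = self.avg_ms[provider]
        self.avg_ms[provider] = (
            latency_ms if old is None
            else (1 - self.alpha) * old + self.alpha * latency_ms
        )

    def weights(self) -> dict[str, float]:
        # Weight is inversely proportional to average latency. Providers
        # with no data yet get the best known average so they are tried.
        known = [v for v in self.avg_ms.values() if v is not None]
        default = min(known) if known else 1.0
        inverse = {
            name: 1.0 / max(v if v is not None else default, 0.001)
            for name, v in self.avg_ms.items()
        }
        total = sum(inverse.values())
        return {name: w / total for name, w in inverse.items()}

    def choose(self) -> str:
        w = self.weights()
        return self._rng.choices(list(w), weights=list(w.values()))[0]
```

With two hypothetical providers averaging 100 ms and 300 ms, the faster one receives about 75% of the traffic, while the slower one still gets enough calls for its average to be updated if it recovers.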

If our API has one client, then things are simple. We can let them use the API as far as our system allows. However, if we are supporting multiple clients, we need to try to reduce the possibility of one client slowing down others. Indeed, this is the same reason other APIs impose rate limits. We should also define rate limits in our API’s SLA. When a client sends too many requests too fast, we should reject their requests using a status code such as HTTP 429 (Too Many Requests) or 503 (Service Unavailable). Doing this communicates to the client that it must slow down. This process is called backpressure: we communicate to upstream clients that the service is overloaded, and the message is eventually passed on to the end user.
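
A minimal sketch of per-client limiting follows. It uses a fixed-window counter and answers with HTTP 429 (Too Many Requests) plus a `Retry-After` header, a common choice for per-client limits. The class name, the window length, and the injectable `clock` are assumptions, and the code is not thread-safe as written.

```python
import math
import time


class PerClientLimiter:
    """Fixed-window limiter: each client gets `limit` requests per window."""

    def __init__(self, limit: int, window_s: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self._clock = clock
        self._windows = {}  # client_id -> [window_start, request_count]

    def check(self, client_id: str) -> tuple[int, dict]:
        """Return (HTTP status, headers) for one incoming request."""
        now = self._clock()
        window = self._windows.get(client_id)
        if window is None or now - window[0] >= self.window_s:
            # First request from this client, or the old window expired.
            window = self._windows[client_id] = [now, 0]
        if window[1] >= self.limit:
            # Tell the client how long to back off before retrying.
            retry_after = max(1, math.ceil(window[0] + self.window_s - now))
            return 429, {"Retry-After": str(retry_after)}
        window[1] += 1
        return 200, {}
```

Because counts are kept per client, one client exceeding its limit is rejected without affecting the others.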

If we are overloaded without any single user sending requests too fast, we must scale up. If we can’t scale up, we need to reject some requests. It is important to note that rejecting requests in this case makes our system unavailable, while rejecting requests in the earlier case, where one client goes over the limits in its SLA, does not count as unavailable time.

Another source of latency is the cold start time: the time it takes for a container to boot up and start serving requests. One simple solution is to keep one replica running at all times, and for high-traffic APIs, this is very much acceptable. However, if you have many low-traffic APIs, this could be expensive. In such cases, you can predict the traffic and warm up the container in advance (using heuristics, AI, or both). Another option is to optimize the startup time of the servers to allow for fast bootup.

Latency, scale, and high availability (HA) are closely linked. Even a well-tuned system needs to scale to keep running within an acceptable latency. If our APIs need to reject valid requests due to load, the user will feel that the API is unavailable.

Another challenge we will face is how to manage transactions across multiple APIs. If we can run all the code in a single runtime (e.g., a JVM), we can commit the work as one transaction. For example, monolithic applications of the pre-microservices era could handle most transactions directly with the database. However, as we break the logic across multiple services (and hence multiple runtimes), we cannot carry a single database transaction across multiple service invocations without doing additional work. One solution has been programming language-specific transaction implementations provided by an application server (e.g., Java transactions). Another is using WS-AtomicTransaction if your platform supports it. Additionally, you may use a workflow system (e.g., Apache ODE or Camunda) that has support for transactions. You can also use queues and combine database transactions and queue system transactions into a single transaction through a transaction manager like Atomikos. This topic has been discussed in detail in the context of microservices, and we will not repeat those discussions here. Please refer to [6], [7], and [8] for more details.

Finally, with API-based architectures, troubleshooting is harder. It is important to have enough tracing and logs to find out whether an error is happening on our side of the system or on the side of a third-party API. We also need clear data that we can share if we need help from the API provider to isolate and fix the problem.


A digital business enables agility, which enables organizations to go to market and respond to changes faster. For example, when we develop a product, we work with a limited understanding of the end-user needs. Reducing that risk is hard and expensive (e.g., market research, beta users, expert input). Instead, an agile organization can start with a reasonable approximation, learn from users, and evolve the system into what users really need. Just like with species, the ability to evolve is a key competitive advantage. 

APIs are key enablers of agility. We discussed a typical architecture that enables a digital business, how such an architecture can be realized, and several new challenges that arise from the widespread use of APIs.

Choreo is a platform-as-a-service (PaaS) that supports the needs of an API-centric digital architecture, such as creating services, exposing them as APIs, reusing those services, and supporting security, observability, and troubleshooting. With Choreo, we have tried to address many of the challenges discussed in this article. Furthermore, we aim to provide a consistent and integrated experience to users, enabling them to get their apps from idea to production in hours or days, not weeks or months. If you are looking for a platform for building your digital architecture, visit Choreo today at https://wso2.com/choreo/


  1. https://accelazh.github.io/storage/Tail-Latency-Study 
  2. https://medium.com/star-gazers/budgeting-randomness-163a4dbe77f4 
  3. https://www.usenix.org/system/files/conference/atc18/atc18-li-zhao.pdf and https://www.usenix.org/sites/default/files/conference/protected-files/srecon19apac_slides_plenz.pdf
  4. https://www.weave.works/blog/a-tale-of-tail-latencies 
  5. https://www.section.io/blog/preventing-long-tail-latency/
  6. https://developers.redhat.com/blog/2018/10/01/patterns-for-distributed-transactions-within-a-microservices-architecture
  7. https://www.baeldung.com/transactions-across-microservices and https://medium.com/javarevisited/managing-transactions-spanning-across-microservices-ccfd7c8a6e42
  8. https://microservices.io/patterns/data/saga.html

I would like to thank Frank Layman, Eric Newcomer, and others for their thoughtful feedback, which significantly shaped the article.