How to Build a Production-Grade API — Kubernetes, Serverless, or iPaaS?
By Srinath Perera, 14 Dec, 2021
Whether you are building a backend for a mobile app or website, creating a software-as-a-service (SaaS) product, or enabling enterprise integrations, you encounter application programming interfaces (APIs). APIs are not a new thing but, just like container technologies, they have become vital for successful digital initiatives over the past decade. Not only do APIs enable teams to connect microservices to each other and mobile and Web apps, they are also key to delivering digital experiences that are resilient, scalable, agile, and “always on.”
Two key architecture best practices have made APIs central to many modern architectures.
Firstly, APIs are the new Dynamic Link Libraries (DLLs). For years, we have accepted that we should reuse non-differentiating capabilities, and not drain our energy reinventing the wheel. After components and libraries, APIs have taken reuse to the next level through the rise of the API economy. This enables organizations to run on top of a much smaller footprint, drawing power from APIs like Stripe, SendGrid, Salesforce, Zendesk, Workday, HubSpot, and Twilio, via pay-as-you-go models and with the power to scale as needed.
Secondly, as a part of API-first design, enterprises now need to internally represent key capabilities as APIs. This enables developers to rapidly recompose them to create value through digital experiences, which, in turn, increases agility and separates concerns among different APIs. Today's low- and pro-code API integration tools have democratized adoption by enabling developers of all skill levels to contribute to software development and delivery.
If you are building products or systems, it is likely that you will create an API sooner rather than later. If you are setting up a startup, the chances of building these are even higher. Also, given their wide adoption, most code will use APIs at some point.
APIs can come in many forms. The following are a few use cases that show them in action.
What does it take to build and get an API into production and achieve success? If you are building one, the following is a typical architecture flow.
While developing APIs, it is likely that you will use other APIs as well. In which case, you will need to discover those APIs; obtain security tokens; download, install, and understand their clients; and manipulate data between your programming language and JSON.
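As a rough illustration of the last step, moving data between your programming language and JSON, the following sketch converts a JSON payload into a typed Python object and back. The `DeliveryOrder` type and its field names are assumptions made for this example, not part of any real API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DeliveryOrder:
    """Hypothetical internal type; the field names are assumptions."""
    order_id: str
    recipient_phone: str
    amount_cents: int


def order_from_json(payload: str) -> DeliveryOrder:
    """Parse a JSON body into a typed object; a missing field raises KeyError."""
    data = json.loads(payload)
    return DeliveryOrder(
        order_id=str(data["order_id"]),
        recipient_phone=str(data["recipient_phone"]),
        amount_cents=int(data["amount_cents"]),
    )


def order_to_json(order: DeliveryOrder) -> str:
    """Serialize the typed object back into a JSON request body."""
    return json.dumps(asdict(order), sort_keys=True)
```

Keeping the conversion in one place means a change in an upstream API's payload shape only has to be handled once.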
Development is not a one-time effort, and you will have to tweak and update your APIs. To do this, you need to have a continuous integration setup, which will leverage tools like Jenkins or Flux, to build, test, and deploy whenever you change your code.
Next, you will need to decide on your deployment setup (i.e., whether it is the company’s own hardware, infrastructure as a service [IaaS], serverless, or integration platform as a service [iPaaS]). You will need to focus on capabilities like high availability and autoscaling.
You will need to securely expose your API to the public, enabling users to register and subscribe to your APIs. You will also want to provide self-service to users as much as possible, e.g., password resets, updates to profiles, and changes to subscriptions. Security is a key requirement and teams now need to work with SSL certificates, security tokens, OAuth SSO configurations, and permissions. Often, setting these up involves an API manager, an API gateway, and an identity server.
When things go wrong or fail in production, you will also need capabilities to troubleshoot and fix these as soon as possible. At a minimum, you will need logs that are in one place and in searchable and correlatable form. You will also need telemetry from your machines and applications — and everything should be collected, indexed, and ready in case you need them.
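One common way to make logs searchable and correlatable is to emit one JSON object per line and tag every line of a request with the same ID. The sketch below does this with Python's standard `logging` module. The logger name `delivery-api` and the `correlation_id` field are assumptions chosen for illustration.

```python
import json
import logging
import uuid
from typing import Optional


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object so a log store can index it."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical field, attached by callers through `extra=`.
            "correlation_id": getattr(record, "correlation_id", None),
        })


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("delivery-api")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def handle_request(logger: logging.Logger, correlation_id: Optional[str] = None) -> str:
    """Log each step of one request under the same correlation ID."""
    cid = correlation_id or str(uuid.uuid4())
    logger.info("request received", extra={"correlation_id": cid})
    logger.info("request completed", extra={"correlation_id": cid})
    return cid
```

Searching the collected logs for one correlation ID then returns every step of that request, even when the lines come from different services.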
Next, you will need metering and billing so that you can charge users. You will want to know whether your APIs are in use, by whom, when, and how often. You will also need insights into your APIs and their KPIs, such as retention, conversion, and activation rates, so that you can nurture your users and improve your product.
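To make the metering idea concrete, here is a minimal in-memory sketch that counts calls per subscriber and endpoint and derives a charge from the total. The pricing model (a free allowance followed by a flat price per call) is purely an assumption, and a real system would persist the counts rather than hold them in memory.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UsageMeter:
    """Count calls per (subscriber, endpoint) so usage can be reported and billed."""
    calls: Counter = field(default_factory=Counter)

    def record(self, subscriber: str, endpoint: str) -> None:
        self.calls[(subscriber, endpoint)] += 1

    def total_for(self, subscriber: str) -> int:
        return sum(n for (who, _), n in self.calls.items() if who == subscriber)


def monthly_charge_cents(total_calls: int, free_calls: int, cents_per_call: int) -> int:
    """Hypothetical pricing: a free allowance, then a flat price per call."""
    return max(0, total_calls - free_calls) * cents_per_call
```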
Even when we get all these done, the system will not stay constant. We will need to roll in updates, improvements, and tweaks. We should also be prepared for bugs to pop up and we should fix these and get the fixes into production as soon as possible. However, our system is in production, and it is running and serving customers. So, how do we get the fixes over without breaking anything? How do we make sure everything is OK? In case we miss something, how can we fix it?
At a minimum, we will need a staging deployment, which is identical to production, where we push new changes first, test them, and make sure everything works. We also need the ability to revert to earlier versions, in case we miss something or if something fails in production.
To get an API to production and to keep it there, we need to do all this, do it right, and keep it going with precision.
Having seen a glimpse of the journey, if you are still undeterred, you have three choices to get this system up and running.
Let’s consider each of these approaches.
In this example, we are looking at how we continue to develop, build, test, and manage our API beyond a minimum viable product (MVP) and then evolve our API beyond production. In this scenario, we will combine a few existing APIs to build a delivery product API that can be used by specialized delivery partners for a major supermarket.
We have decided that while we do not want to manage a virtual machine-like infrastructure, we would like to have some control over the underlying infrastructure. Kubernetes has become the de facto container orchestration and management platform owing to rapid growth and adoption.
Following our evaluations, we have gathered that we need to have a good understanding of how Kubernetes works. Secondly, we also need to have the right skills and experience to operate Kubernetes as our infrastructure environment. Our best option would be to go with a managed Kubernetes environment, where we do not have to have extensive skills and knowledge around managing it.
Early in the development process, we decided that it is much easier and more efficient to use a framework for developing our API. Therefore, we decided to use FastAPI, which is a Python framework for building APIs. Furthermore, we needed to use a couple of external APIs, such as SMS from Twilio, Payment from Stripe, etc. We also verified that these API products have full support for Python; therefore, choosing Python as the product language makes sense.
While we did not have to learn too much about how the underlying Kubernetes infrastructure works, we had to learn a bit about development on Kubernetes as deploying an application to Kubernetes is declarative in nature. This was a new learning experience as most of us did not have any experience with Kubernetes before.
Firstly, we need to containerize the API we built with Python so that it can be deployed and run in Kubernetes. Secondly, we need to ensure that we follow Kubernetes’ declarative artifacts for environment configuration, secrets, and connectivity to other services that are coded in YAML and are recognized by the Kubernetes API. While getting the API to build as a container image was not hard, it took a while to get our head around learning how to keep configurations in Kubernetes and what best practices to use.
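On the application side, one way to consume declaratively managed configuration is to read it from environment variables, since Kubernetes can expose ConfigMap and Secret values to a container that way. The sketch below shows this under stated assumptions: the variable names and the `Settings` fields are hypothetical and not taken from the project described here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings for the API; field names are assumptions."""
    sms_api_base_url: str
    payment_api_key: str
    request_timeout_seconds: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment and fail fast if it is incomplete."""
    required = ("SMS_API_BASE_URL", "PAYMENT_API_KEY")
    missing = [name for name in required if name not in env]
    if missing:
        # Failing at startup surfaces a bad manifest before any request is served.
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return Settings(
        sms_api_base_url=env["SMS_API_BASE_URL"],
        payment_api_key=env["PAYMENT_API_KEY"],
        request_timeout_seconds=float(env.get("REQUEST_TIMEOUT_SECONDS", "5")),
    )
```

Passing the environment in as a parameter keeps the function testable on a laptop without a cluster.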
During testing, we needed to test external APIs, such as Twilio and Stripe, and the good news was that they provided sandbox API endpoints that we could use for test purposes. Unit testing was mostly done on our developer laptops. We only used the Kubernetes environment for integration testing, user acceptance testing, and end-to-end performance testing.
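For the laptop-level unit tests, one approach is to inject the external client as a parameter so that a test can replace it with a stand-in and never touch the network. The sketch below is an illustration under assumptions: `notify_delivery_partner` and the `send_sms` callable are hypothetical names, not the actual Twilio client interface.

```python
from typing import Callable
from unittest.mock import Mock


def notify_delivery_partner(
    send_sms: Callable[[str, str], bool], phone: str, order_id: str
) -> str:
    """Business logic under test; `send_sms` is an injected, hypothetical client call."""
    delivered = send_sms(phone, f"Order {order_id} is ready for pickup")
    return "notified" if delivered else "retry_later"


def test_notify_uses_sms_client() -> None:
    """Unit test that replaces the SMS client with a mock."""
    fake_sms = Mock(return_value=True)
    assert notify_delivery_partner(fake_sms, "+15550100", "A1") == "notified"
    fake_sms.assert_called_once_with("+15550100", "Order A1 is ready for pickup")


def test_notify_reports_failure() -> None:
    """A failed send should be surfaced so the caller can retry."""
    fake_sms = Mock(return_value=False)
    assert notify_delivery_partner(fake_sms, "+15550100", "A1") == "retry_later"
```

The same function can then be wired to the provider's sandbox endpoint for integration tests.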
Where possible, we used automated testing built into our CI/CD tooling process as described below.
For testing the Delivery Product API, we decided we would use Postman, as it comes with sound documentation and examples of how to test APIs.
While we used kubectl, which is a command line interface for Kubernetes APIs, for initial deployments, it would not scale as we build, test, and deploy more frequently. Therefore, we decided to use Flux as our Continuous Integration (CI) and Continuous Deployment (CD) tool.
After reading the solution’s documentation, we were able to set up Flux in our Kubernetes environment. While the documentation was great, once again, without prior knowledge, it took us a while to get it all set up and in operation.
Once Flux was set up and configured the way we needed, it was easier for us to frequently build, test, and deploy into the Kubernetes environment. It was much easier and a better practice than using kubectl to deploy.
We discovered that managed Kubernetes infrastructure does not offer the automated monitoring, logging, and alerting that we need to observe how our Delivery Product API is performing in production. It is important to note that without observability we would not be able to react to any incidents and issues that may come up when others are consuming our API. Having logs, monitoring metrics, and setting thresholds helped us to take measures proactively, before issues impact consumers.
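The idea of alerting on a threshold can be sketched as a sliding window over recent responses. This is only an illustration of the concept, not how any particular monitoring product works. Treating 5xx responses as errors and the chosen window size are assumptions.

```python
from collections import deque


class ErrorRateMonitor:
    """Track the most recent responses and flag when errors exceed a threshold."""

    def __init__(self, window_size: int, max_error_rate: float) -> None:
        # A bounded deque drops the oldest outcome once the window is full.
        self.outcomes: deque = deque(maxlen=window_size)
        self.max_error_rate = max_error_rate

    def record(self, status_code: int) -> None:
        # Counting only 5xx responses as errors is an assumption.
        self.outcomes.append(status_code >= 500)

    def error_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def should_alert(self) -> bool:
        return self.error_rate() > self.max_error_rate
```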
Setting up all these takes time and effort; however, to speed up installing and managing software, we decided to use some of the managed offerings for logging, monitoring, and alerting.
Everyone knows it is important to have a secure environment to protect APIs and applications. While the managed Kubernetes environment offered some security, we still had to work on setting up security for managing the containers within the environment. These included, but were not limited to, securing container images prior to deployment, run-time security, network security and policies, etc.
In addition to the Kubernetes and container security, we also had to set up and manage security for our APIs such as authentication, authorization, denial of service attacks, etc. Therefore, to protect and provide a secure API, we had to set up an API gateway within the Kubernetes environment, configure it, and manage it ourselves. This task alone was quite challenging as we not only had to manage a secure API but also manage another supporting tool within the Kubernetes environment just to provide security for our API.
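One of the protections a gateway typically provides against abusive traffic is rate limiting. As a rough sketch of the idea, and not of any specific gateway's implementation, the token bucket below allows short bursts up to a capacity and then admits requests only as fast as tokens refill. The caller passes in the time so the behaviour is easy to test.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `refill_per_second` tokens per second."""

    def __init__(self, capacity: int, refill_per_second: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True and spend one token if the request may proceed."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice there would be one bucket per API key or client, with `now` supplied by a monotonic clock such as `time.monotonic()`.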
Just like any other product, API products must also evolve. In evolving our API, we needed tooling, insights, and other things that help us to monetize our API as a product and grow our business. We also needed to incorporate new attributes, e.g., possibly extending and integrating with other third-party APIs to help our consumers gain more benefits from using our API. All these things meant that we needed to continuously make improvements and evolve our API.
In terms of identifying where and how our API is performing from a business perspective, we needed statistics more than what is produced by the basic observability metrics. In other words, we needed API insights. We also needed a mechanism to monetize our API product. Therefore, we needed to incorporate a mechanism to capture usage and other factors to bill our consumer partners for the use of our API.
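To show how business-level insights differ from infrastructure metrics, the sketch below computes two simple rates from sets of user IDs. The definitions used here are assumptions made for illustration: activation is the share of registered users who made at least one call, and retention is the share of last period's active users who are active again this period. Teams often define these KPIs differently.

```python
from typing import Set


def activation_rate(registered: Set[str], made_first_call: Set[str]) -> float:
    """Share of registered users who have made at least one API call."""
    if not registered:
        return 0.0
    return len(registered & made_first_call) / len(registered)


def retention_rate(active_last_period: Set[str], active_this_period: Set[str]) -> float:
    """Share of last period's active users who are active again this period."""
    if not active_last_period:
        return 0.0
    return len(active_last_period & active_this_period) / len(active_last_period)
```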
The following figure depicts the Kubernetes environment’s architecture.
As discussed above, we already saw the challenges we faced in getting our API production-ready. Selecting a managed Kubernetes platform was the easiest of all. Thereafter, we had to deploy and manage many products and tools just to expose our API securely. When evolving our API, we faced the following challenges:
Kubernetes provides greater flexibility compared to some of the other infrastructure we have used. However, it comes with a steep learning curve. While you can almost get away with using a fully managed Kubernetes service, such as GCP’s GKE Autopilot, implementing and maintaining the other products and tooling is an overhead that can slow things down when you expose your API as a product.
In this example, we are going to look at the use of AWS Lambda functions or serverless infrastructure to deploy and manage our API for the same use case as above. Why serverless? Because it enables us to focus more on building our API and less on managing infrastructure and related components.
In this example too, we decided to use a standard framework to build our API in order to reduce the time required to build and get our API production-ready. As with the previous example, we decided to use the Python framework FastAPI to build our API.
While AWS offers rich documentation, AWS Lambda was new to us and, therefore, we had to use external learning material and a few tutorials to understand how to create, build, and deploy services using the solution.
We also decided to keep our code repo in AWS itself by utilizing AWS CodeCommit. This was not difficult as most of the development team was somewhat aware of how to use Git repos.
To set up these tools and environments, we used the AWS Console, which meant we did not have to learn CloudFormation, the Cloud Development Kit (CDK), or a third-party offering like Pulumi, each of which would have tied up our resources. While this was a simpler approach, later in the document you will find that it was not scalable as every step required a manual task.
While there are ways to build and test Lambda functions locally, we decided to use a development environment within AWS to make it simpler for our developers. However, we had to set up deployment tooling, which we will discuss later in this document, and set up an API gateway to call our API externally for testing purposes.
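For a sense of what a Lambda-backed endpoint looks like, and how it can be exercised without deploying, here is a minimal sketch written as a plain handler function rather than with FastAPI. It assumes the event and response shapes of API Gateway's Lambda proxy integration. The `order_id` field and the response payload are hypothetical.

```python
import json
from typing import Any, Dict


def _response(status: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build a proxy-integration style response with a JSON string body."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point: validate the request body and acknowledge the order."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "request body must be valid JSON"})
    # `order_id` is a hypothetical field used only for this illustration.
    if not isinstance(body, dict) or not body.get("order_id"):
        return _response(400, {"error": "order_id is required"})
    return _response(200, {"order_id": body["order_id"], "status": "accepted"})
```

Because the handler is an ordinary function, a local test can call it directly, for example `handler({"body": '{"order_id": "A1"}'}, None)`, before any API gateway is involved.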
Initially, on a few occasions, we deployed the Lambda code manually; however, we soon realized it was not scalable owing to the frequency of our build and deployment. Therefore, we decided to go with AWS CodeDeploy, which is another managed service provided by AWS.
With the help of AWS documentation, internal tutorials, and some external material, we were able to set it up without much hassle.
Observability was a challenge because AWS does not really offer a fully-fledged observability platform. Instead, developers must rely on multiple AWS services for end-to-end observability. We ended up selecting and setting up the bare minimum set of products offered, such as AWS CloudWatch, CloudTrail, and X-Ray.
We were able to set up and run these products after reading their documentation, tutorials, and external material. However, getting to know each of them and learning how to use them effectively took time and effort.
While we tried simplifying identity and access management (IAM) for all the AWS products and services used (keeping basic roles and policies), it was not a simple task.
Over the years, AWS IAM has become quite complex and cumbersome to use. Of course, they would like to ensure that our environments, products, and services are highly secure. But it is not a straightforward task for a small startup.
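To give a sense of what keeping roles and policies basic can mean, the sketch below builds a small least-privilege policy document that lets a single function write to its own log group and nothing else. It is an illustration, not the policy used in this project. The region, account ID, and function name are placeholders, and the log group path assumes the default `/aws/lambda/<function name>` naming.

```python
import json
from typing import Any, Dict


def lambda_logging_policy(region: str, account_id: str, function_name: str) -> Dict[str, Any]:
    """Return an IAM policy document scoped to one function's log group."""
    log_group_arn = (
        f"arn:aws:logs:{region}:{account_id}:log-group:/aws/lambda/{function_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                # The trailing ":*" covers the log streams inside the group.
                "Resource": f"{log_group_arn}:*",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own region, account, and function.
    policy = lambda_logging_policy("us-east-1", "123456789012", "delivery-api")
    print(json.dumps(policy, indent=2))
```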
Getting the API to production, offering the API to customers as a product, and evolving our API as needed required us to handle many details.
Unlike in the Kubernetes scenario, where we had to deploy and manage all other required tooling and products, we could consume the serverless offerings from AWS. As noted above, CodeCommit, CodeDeploy, Lambda, IAM, API Gateway, CloudWatch, CloudTrail, and X-Ray were some of the products we had to get to know and set up.
So what were the key challenges?
The following figure shows the overall architecture on top of serverless.
A serverless approach is less flexible and more opinionated. However, not having to manage some of the complex products, tools, and infrastructure was one of the biggest benefits. Unfortunately, while an AWS Lambda-based serverless approach may work very well on AWS, it may not be easily portable to another cloud provider. Like the Kubernetes approach, it does not give you tooling and support to easily manage your API in production. These are things you will need to consider when evolving and growing your API as a product.
An integration platform as a service (iPaaS) provides a lot out of the box, but it comes with reduced control. You need to accept an opinionated view of the API lifecycle in return for improved agility and speed to market. The iPaaS landscape is still evolving, and you should verify an offering's features and capabilities against your use cases before choosing one.
An iPaaS typically supports a low-code environment, where you can select connectors from a marketplace, add them to a canvas, and wire them up, hiding all details about access tokens and API clients. Some connectors will support OAuth-based registration and token acquisition, while others require us to obtain a developer key and token manually.
Most iPaaS offerings support visual data mapping for data manipulation and wiring data from one connector to another, making the experience visual, intuitive, and fast.
A strong solution will ideally support AI-based mapping suggestions, where mapping is done for us, and we can edit as needed before accepting it.
Most would also provide AI support in the form of code suggestions and performance forecasts, which can speed development.
The platform would handle the build, tests, and deployment out of the box, at the click of a button, with some platforms enabling you to customize each of these steps. They would often also support publishing your API securely and managing its lifecycle, throttling, billing, etc. We also want to manage subscriptions by registering users, billing them, and giving them different levels of access based on their subscriptions. Subscription management and a developer portal typically come as part of API management support.
iPaaS platforms support “dev” and “production” environments and the ability to revert to an earlier version, which are required for most production use cases.
The level of troubleshooting will significantly differ according to the iPaaS vendor. However, troubleshooting support is even more important as an iPaaS does not give us direct access to the underlying infrastructure. Therefore, we need to explore our options in detail.
Troubleshooting support should include searchable access to logs, support for telemetry, and debugging. Also, access to profiling views, such as flame graphs, can save us a lot of time. A detailed discussion can be found at https://www.infoq.com/articles/observability-tools-future/.
Furthermore, most iPaaS products provide some level of insight to help with go-to-market efforts or provide integration with other analytics systems (e.g., integrating with customer journey analytics software).
The platform will let us bring in our own Identity Provider (IDP) to manage user accounts and their passwords and make APIs we publish work with the IDP.
iPaaS offerings provide features such as running multiple versions, reverting the application to older versions, and even canary deployment out of the box.
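To illustrate what a canary deployment does under the hood, the sketch below routes a fixed percentage of users to a new version by hashing their ID, so a given user consistently sees the same version. This shows the general technique only, not how any specific iPaaS implements it. The version labels `canary` and `stable` are assumptions.

```python
import hashlib


def route_version(user_id: str, canary_percent: int) -> str:
    """Deterministically send roughly `canary_percent` of users to the canary."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in the range 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"
```

Setting `canary_percent` to 0 sends all traffic back to the stable version, which is effectively a rollback, while raising it gradually to 100 completes the rollout.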
In summary, an iPaaS would not give the same level of flexibility as building your system from scratch; however, it handles most of the complexity and improves agility and delivery times. That said, the space is still evolving, and features vary by vendor. The following are some key capabilities to look for.
APIs have become a key part of modern architectures, and as a developer, chances are that you will build one soon. APIs are often exposed to the outside world, so getting an API to production and keeping it running requires managing a number of things. As prospective API developers, we have three implementation choices: build on top of Kubernetes, build on top of serverless, or build on top of an iPaaS.
Among the three, Kubernetes requires the most work, as we will be setting up a complex deployment. A managed Kubernetes deployment will help, but we still have a significant number of tasks.
Serverless, while hiding some details, still leaves a lot of details open. Furthermore, it quickly becomes expensive if the API must handle sizable traffic.
An iPaaS addresses most requirements, while significantly reducing time to market. However, if you are choosing an iPaaS, you need to verify that the offering meets all your requirements for successfully productizing an API — e.g., AI-assisted development, support for a marketplace, low-code functionality, troubleshooting, user journey analytics, the ability to work in different environments, and reverting to a prior version.
Conversely, an iPaaS provides less flexibility, forcing us to follow an opinionated model developed by its designers, while Kubernetes provides us with more flexibility. As developers, we need to balance these two aspects while considering timelines, skills, costs, and preferences.