Introducing the Bijira AI Gateway: Next-Gen AI-Driven API Management

The API ecosystem is rapidly expanding into the world of AI. Enterprises are increasingly integrating generative AI services like OpenAI, Claude, and AWS Bedrock into their workflows, but face challenges with secure, governed, and scalable integrations.

That’s why Bijira, WSO2’s AI-native API management SaaS platform, introduces AI Gateway support: a purpose-built solution to create, expose, and manage AI service integrations as first-class APIs. AI Gateway support is also available today in WSO2 API Manager.

With the AI Gateway, developers can securely and efficiently expose LLM-powered tools and workflows to their consumers, while applying enterprise-grade governance.

What is the AI Gateway?

The AI Gateway is a specialized gateway in Bijira, designed for seamless integration with leading AI platforms like OpenAI, Azure OpenAI, Anthropic Claude, Mistral, and AWS Bedrock. It allows developers to create native APIs for these services directly from the Bijira Console, with advanced features like security, token-based rate limiting, semantic caching, and guardrails policies for content safety all out of the box.

Native AI API creation

Through Bijira’s intuitive Console, you can now provision and manage AI service APIs natively.

  • Select your preferred AI provider (OpenAI, Azure OpenAI, Claude, Mistral, AWS Bedrock).
  • Configure connection details, authentication, and invocation parameters.
  • Instantly create and expose a secure, governed API for your chosen LLM service.

Token-based rate limiting

AI services often incur costs on a per-token basis, making usage control critical. Bijira’s AI Gateway introduces token-based rate limiting that can be applied at the API level, allowing you to:

  • Set quotas based on LLM token counts instead of just request counts.
  • Prevent overuse and control costs.
  • Enforce fair usage policies for AI-powered APIs.

Guardrails

Content moderation and safety are paramount when integrating AI. Bijira’s AI Gateway comes with the following guardrails.

  • Basic guardrails
    • Regex guardrail - This policy provides an input validation mechanism that uses regular expressions (regex) to define what kind of text patterns are allowed or blocked.
    • URL guardrail - This provides the capability to perform URL validity checks on incoming or outgoing JSON payloads. It enforces content safety by validating embedded URLs for accessibility or DNS resolution.
    • Word count and sentence count guardrail - This provides the capability to perform count-based validation on incoming or outgoing JSON payloads.
    • Content length guardrail - This provides the capability to perform content-byte-length validation on incoming or outgoing JSON payloads.
  • Advanced guardrails
    • PII Masking - This provides the capability to detect and obscure sensitive personal data like names, emails, phone numbers, SSNs, or addresses in the input or output of an AI-powered API.
  • Third-party integrations
    • Azure Content Safety Content Moderation - This provides the capability to integrate the Azure Content Safety service to filter out harmful content in request bodies and AI-generated responses.
    • AWS Bedrock Content Safety - This provides the capability to integrate with AWS Bedrock Guardrails to enable real-time content safety validation and PII protection for AI applications.
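As a concrete illustration of the regex guardrail idea, the sketch below validates input text against an allow-pattern and a set of block-patterns. The function name and patterns are hypothetical examples, not Bijira’s policy configuration:

```python
import re

# Hypothetical patterns: block prompt-injection attempts, and allow only
# printable ASCII plus common whitespace.
BLOCKED = [re.compile(r"(?i)ignore (all )?previous instructions")]
ALLOWED = re.compile(r"^[\x20-\x7E\s]*$")

def regex_guardrail(text: str) -> bool:
    """Return True if the text passes: matches the allow-pattern and no block-pattern."""
    if not ALLOWED.fullmatch(text):
        return False
    return not any(p.search(text) for p in BLOCKED)

print(regex_guardrail("Summarize this article."))               # True
print(regex_guardrail("Ignore previous instructions and ..."))  # False
```

The same pattern-matching check can be applied on responses as well as requests, which is why the policy descriptions above mention both incoming and outgoing payloads.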

Semantic cache

Semantic caching stores previous prompt–response pairs and uses similarity matching to serve responses for semantically similar queries, not just exact text matches. To optimize both latency and cost, Bijira provides semantic caching for AI responses to:

  • Detect semantically similar queries.
  • Reuse cached responses when appropriate.
  • Reduce repetitive token consumption and speed up response times.
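The mechanism can be sketched as follows: embed each prompt, store the embedding with its response, and serve the stored response when a new query’s embedding is similar enough. A toy illustration (the vectors are hand-made stand-ins for real embeddings, and all names are hypothetical):

```python
from math import sqrt

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

class SemanticCache:
    """Toy semantic cache: stores (embedding, response) pairs and serves a
    cached response when a new query's embedding is similar enough."""

    def __init__(self, min_similarity: float = 0.9):
        self.min_similarity = min_similarity
        self.entries = []  # list of (embedding, response)

    def put(self, embedding, response):
        self.entries.append((embedding, response))

    def get(self, embedding):
        best = max(self.entries,
                   key=lambda e: cosine_similarity(e[0], embedding),
                   default=None)
        if best and cosine_similarity(best[0], embedding) >= self.min_similarity:
            return best[1]
        return None  # cache miss: forward the prompt to the LLM

# Toy 3-dimensional "embeddings"; a real gateway calls an embedding model.
cache = SemanticCache(min_similarity=0.95)
cache.put([1.0, 0.0, 0.1], "Sunny, 25°C in New York today.")
print(cache.get([0.99, 0.02, 0.09]))  # near-identical vector -> cached response
print(cache.get([0.0, 1.0, 0.0]))     # dissimilar vector -> None (cache miss)
```

A production gateway stores these vectors in a vector database and uses approximate nearest-neighbor search rather than the linear scan shown here.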

From theory to practice

Let’s walk through a scenario that shows how the AI Gateway works in Bijira. You set up an AI API in Bijira, configure it with your preferred LLM provider (e.g., OpenAI), and expose a single, secure API endpoint that internal applications can call. This allows your teams to leverage AI capabilities seamlessly, while you maintain control over authentication, rate limits, caching, and guardrails.

Create an AI API from Bijira

1. Let’s start by creating a new AI API through the Bijira Console. Select API Proxy → Third Party APIs (Egress) → AI APIs.

2. Once you click AI APIs, the list of supported LLM providers is displayed. In this scenario, we are using OpenAI.

3. Provide the desired name and version, and create the API.

Once you create an AI API, you’ll need to set the backend key configuration. This ensures the gateway authenticates with the AI service on behalf of clients. Go to Develop → Policy and click the endpoint configuration to set the backend security keys.

4. In the endpoint configuration section, provide the OpenAI API key you received when you subscribed to the OpenAI API. Set the header name to Authorization and prefix the key value with Bearer, as shown in the screenshot. The header name and key format can differ based on the LLM provider.

After you create the API with the endpoint configurations, you can configure rate limiting based on tokens to control usage and costs effectively. 
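Under the hood, this configuration amounts to the gateway attaching the provider’s auth header to each upstream call on your behalf. A minimal sketch of what that header looks like for OpenAI (the function name is illustrative and the key value is a placeholder):

```python
def build_openai_auth_header(api_key: str) -> dict:
    """Build the upstream auth header used for OpenAI:
    header name 'Authorization', value prefixed with 'Bearer '."""
    return {"Authorization": f"Bearer {api_key}"}

headers = build_openai_auth_header("sk-...")  # placeholder key, not a real one
print(headers["Authorization"])  # Bearer sk-...
```

Because the gateway injects this header, client applications never see the provider key; they authenticate to the gateway with their own credentials instead.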

Configure token-based rate limiting

LLMs and AI services charge and operate based on tokens consumed, not just request counts. Token-based rate limiting lets you control costs, prevent overuse, and enforce fair quotas on clients by limiting the total tokens they can consume over time rather than just the number of API calls, which aligns enforcement with how AI usage is billed and measured.

1. Go to Develop → Policy and click Add API Level Policies.

2. Select Token Based Rate Limiting Policy and configure the following.

  • Max Prompt Token Count - Limit on the number of tokens allowed in the prompt (input) per request.
  • Max Completion Token Count - Limit on the number of tokens allowed in the completion (output) per request.
  • Max Total Token Count - Combined limit of prompt and completion tokens per request.
  • Time Unit - The time window over which usage is counted (per minute, per hour, or per day).
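Conceptually, each request is then checked against these limits, roughly as in the sketch below (the function and the limit values are illustrative, not the gateway’s implementation):

```python
def within_token_limits(prompt_tokens: int, completion_tokens: int,
                        max_prompt: int, max_completion: int, max_total: int) -> bool:
    """Check a single request against the three per-request limits from the policy."""
    return (prompt_tokens <= max_prompt
            and completion_tokens <= max_completion
            and prompt_tokens + completion_tokens <= max_total)

# Example: 800 prompt + 300 completion tokens against a 1000/500/1000 policy.
print(within_token_limits(800, 300, max_prompt=1000, max_completion=500, max_total=1000))
# False: the 1,100-token total exceeds Max Total Token Count even though each part fits.
```

The total limit is evaluated independently of the per-part limits, so a request can fail on the combined count alone.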

After you’ve configured token-based rate limiting to control usage and costs, the next step is to configure guardrail policies to ensure the safety and reliability of AI outputs.

Configure guardrail policies

AI systems, especially LLMs, can generate inappropriate, unsafe, biased, or incorrect content if left unchecked. Guardrails act as safety controls to enforce rules, filter harmful or invalid output, protect sensitive data, and align AI behavior with organizational policies and compliance requirements — ensuring trustworthy and responsible use of AI in your applications.

1. Go to Develop → Policy and click Add Resource Level Policies.

2. You can select any of the guardrail policies and configure the required fields to add it. For this use case, we’ll configure the Azure Content Safety guardrail.

  • Guardrail Name - A unique, descriptive name to identify this guardrail configuration (e.g., azure-content-safety-check).
  • Azure Content Safety Endpoint - The URL endpoint of the Azure Content Safety service that will evaluate the AI output for harmful or unsafe content.
  • Azure Content Safety Key - The API key (or access token) used to authenticate requests to the Azure Content Safety service.

The other fields are optional.
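To illustrate how such a guardrail can act on moderation results, the sketch below blocks a payload when any category’s severity crosses a threshold. The response shape and the threshold value are simplified illustrations; consult the Azure Content Safety documentation for the exact schema:

```python
# Simplified sketch: decide whether to block a payload based on per-category
# severity scores returned by a content-moderation service such as Azure
# Content Safety. The response shape and threshold here are illustrative.
SEVERITY_THRESHOLD = 2  # block anything at or above this severity

def should_block(categories_analysis: list) -> bool:
    """Return True if any moderation category meets or exceeds the threshold."""
    return any(c["severity"] >= SEVERITY_THRESHOLD for c in categories_analysis)

moderation_result = [
    {"category": "Hate", "severity": 0},
    {"category": "Violence", "severity": 4},
]
print(should_block(moderation_result))  # True: Violence severity 4 >= 2
```

The gateway applies this kind of check to both request bodies and AI-generated responses, rejecting the call before any flagged content reaches the client.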

Configure semantic caching

AI responses, especially from LLMs, can be expensive and slow, even for similar or identical prompts. A semantic cache stores previous prompt–response pairs and retrieves them when a semantically similar query comes in, instead of calling the LLM again.

For example, in an internal client application one user asks, “Show me today’s weather forecast for New York”, and later another asks, “What’s the weather like in NYC today?”. The gateway detects the semantic similarity, fetches the cached response, and replies instantly. This makes the app feel faster and more responsive while reducing backend LLM calls.

1. Go to Develop → Policy, click Add Resource Level Policies, and select Semantic Caching from the Policy menu.

  • Embedding Provider - The AI provider whose embedding model APIs are used (only Mistral and Azure OpenAI are supported right now).
  • Header Name - If the embedding model requires authentication via an Authorization, api-key, or other header, specify its name here (Mistral: Authorization; Azure OpenAI: api-key).
  • API Key - API key for the chosen embedding provider.
  • Embedding Model Name - The embedding model to be used for generating embeddings.
  • Embedding Upstream URL - Endpoint URL for embedding generation.
  • Dimensions - The dimensionality of the vectors generated by the selected embedding model.
  • Threshold - The similarity threshold for accepting semantic search results. Note: a value closer to zero indicates higher semantic similarity, while higher values represent weaker matches.
  • Vector Store - The vector database driver to use (Redis).
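The threshold therefore behaves like a distance: a cached entry is served only when the computed distance between embeddings falls at or below it. A tiny sketch using cosine distance (the function names are illustrative):

```python
from math import sqrt

def cosine_distance(a, b):
    """Cosine distance: 0 means identical direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def cache_hit(query_vec, cached_vec, threshold: float) -> bool:
    """Serve from cache only when the distance is at or below the threshold."""
    return cosine_distance(query_vec, cached_vec) <= threshold

print(cache_hit([1.0, 0.0], [0.99, 0.01], threshold=0.05))  # True: nearly identical
print(cache_hit([1.0, 0.0], [0.0, 1.0], threshold=0.05))    # False: orthogonal vectors
```

Tuning the threshold is a precision/recall trade-off: a tighter (smaller) value avoids serving stale or mismatched answers, while a looser value increases the cache hit rate.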

Why choose Bijira AI Gateway?

As enterprises race to adopt AI in their workflows, the challenges of managing secure, scalable, and governed AI integrations are becoming more pressing. Bijira’s AI Gateway is designed to help you embrace the power of LLMs and generative AI without compromising on governance, cost control, or user safety.

With out-of-the-box support for major AI providers, token-aware rate limiting, robust guardrails, semantic caching, and enterprise-grade security, the AI Gateway empowers your developers to deliver innovative AI-powered experiences with confidence.

Whether you’re building internal tools, customer-facing apps, or next-gen SaaS platforms, Bijira AI Gateway gives you the foundation to innovate responsibly and efficiently, making AI truly enterprise-ready.

Get started

Ready to see how Bijira helps to elevate your API integration experience through AI Gateway? Head over to our documentation to get started.