LLM Provider and Proxy - AI Policies for privacy and cost control¶

Sample source: wso2/api-platform-samples/llm-cost-control-and-privacy-control
git clone https://github.com/wso2/api-platform.git
cd api-platform/samples/llm-cost-control-and-privacy-control

Overview¶

This guide is for platform engineers and API developers who want to add resilience, cost control, and data-safety policies to AI-powered applications — without changing application code.

When your application routes LLM traffic through the WSO2 AI Gateway, you can attach policies that handle failover, caching, and PII protection at the gateway layer. This sample sets up a local Docker stack that demonstrates all three policies working together against an OpenAI-compatible endpoint, giving you a reproducible environment to understand and test each behavior before deploying to production.

What You Will Learn¶

By working through this sample you will understand how to:

Enable semantic caching — serve repeated or semantically similar questions from a Redis cache, reducing latency and API cost
Apply PII masking — strip sensitive identifiers (emails, phone numbers) from request payloads before they reach the upstream LLM provider

Scenarios Covered¶

Scenario 1 — Semantic Cache¶

Problem: Identical or near-identical questions (paraphrased, reworded) are sent to the LLM repeatedly, incurring cost and latency on every call.

What this scenario does: The gateway generates an embedding of each incoming prompt using Mistral and stores the LLM response in Redis. If a subsequent request is ≥ 85% semantically similar to a cached prompt, the cached response is returned immediately — no LLM call is made.

Scenario 2 — PII Masking¶

Problem: Users sometimes include personal data (email addresses, phone numbers) in prompts. Sending this data to a third-party LLM provider may violate privacy policies or compliance requirements.

What this scenario does: Before the request leaves the gateway, a regex-based redaction policy replaces detected email addresses and phone numbers with masked placeholders. The upstream model never sees the original values.

Expected Results¶

After running the test scripts you should observe the following for each scenario.

Scenario 1 — Semantic Cache¶

The same question is sent twice. On the second request, the gateway should return a cached response. The test detects a cache hit via:

A HIT value in any X-Cache* response header, or
The second response arriving ≥ 3× faster than the first (LLM baseline is typically > 500 ms)

[PASS] Semantic cache: cache hit detected (header match / response time)

If embedding_provider_api_key is not set in additional-config.toml, cache lookups silently fall through to OpenAI and the test will warn rather than fail.

Scenario 2 — PII Masking¶

A prompt containing a unique email address and phone number is sent, asking the model to repeat them verbatim. Because the gateway redacts the values before forwarding the request, neither the original email nor the phone number should appear in the response.

[PASS] PII masking: original values not found in response

Prerequisites¶

Tool	Purpose
Docker + Docker Compose	Runs the gateway stack
`wget`	Downloads the gateway distribution
`unzip`	Extracts the distribution
`python3` + `pyyaml`	Used by setup scripts to merge YAML/TOML files
`curl`	Calls the gateway management API and proxy endpoint
`jq`	Used by `test.sh` to parse API responses (`brew install jq`)

Required Configuration¶

1. OpenAI API Key¶

The setup script injects the key into the LLM provider at deploy time. Provide it via:

# Option A — environment variable (recommended)
export OPENAI_API_KEY="sk-..."

# Option B — script argument
./setup.sh sk-...

# Option C — interactive prompt (key is hidden)
./setup.sh

The key is never written to disk; it is substituted into the provider payload at runtime and discarded.

2. Mistral API Key (required for Scenario 2)¶

Open additional-config.toml and fill in your Mistral key:

embedding_provider_api_key = "your-mistral-api-key"

Without this key the gateway starts successfully, but Scenario 2 (semantic cache) will not produce cache hits.

Files¶

llm-provider.yaml       LLM provider definition (OpenAI upstream, access control)
llm-proxy.yaml          LLM proxy definition (three policies wired to /chat/completions)
redis-service.yaml      Redis Stack service, merged into docker-compose at setup time
additional-config.toml  Embedding + vector DB config, appended to gateway config.toml
setup.sh                Automated setup (download → configure → start → deploy)
teardown.sh             Automated teardown (delete resources → stop stack)
test-semantic-cache.sh  Verifies semantic cache hits via Redis + Mistral embeddings
test-pii-masking.sh     Verifies email/phone redaction before requests reach OpenAI

Setup¶

./setup.sh

The script performs these steps in order:

Downloads wso2apip-ai-gateway-1.1.0.zip
Extracts the distribution
Appends additional-config.toml into configs/config.toml
Merges the Redis service into docker-compose.yaml
Starts the full Docker Compose stack
Waits for the gateway to become healthy (polls up to 150 s)
Deploys the LLM provider
Deploys the LLM proxy

All steps are idempotent — re-running the script on an already-configured environment is safe.

Endpoints After Setup¶

Endpoint	URL
Gateway proxy (HTTP)	`http://localhost:8080/openai-proxy`
Gateway health	`http://localhost:9094/health`
Management API	`http://localhost:9090/api/management/v0.9`

Running the Tests¶

Each policy has its own script so you can run them independently. All scripts require jq and call the gateway proxy directly — no API key is needed at test time (the gateway uses its stored credentials).

# Scenario 1 — semantic cache
./test-semantic-cache.sh

# Scenario 2 — PII masking
./test-pii-masking.sh

Customising the PII test¶

test-pii-masking.sh prompts for an email and phone number, or you can pass them via environment variables to run non-interactively:

TEST_EMAIL="[email protected]" TEST_PHONE="+15551234567" ./test-pii-masking.sh

Teardown¶

# Stop the stack and delete deployed resources
./teardown.sh

# Also remove the extracted directory and downloaded zip
./teardown.sh --clean

Troubleshooting¶

Symptom	Likely cause
`setup.sh` fails at health check	Docker images are still pulling — wait and retry
Scenario 1: no cache hit detected	`embedding_provider_api_key` is empty in `additional-config.toml`, or Redis is not reachable
Scenario 2: original values appear in response	PII regex did not match — verify the regex patterns in `llm-proxy.yaml`
HTTP 401 on management API	Basic auth header mismatch; default credentials are `admin:admin`