LLM Provider and Proxy - AI Policies for privacy and cost control
Sample source: wso2/api-platform-samples/llm-cost-control-and-privacy-control
Overview
This guide is for platform engineers and API developers who want to add resilience, cost control, and data-safety policies to AI-powered applications — without changing application code.
When your application routes LLM traffic through the WSO2 AI Gateway, you can attach policies that handle failover, caching, and PII protection at the gateway layer. This sample sets up a local Docker stack that demonstrates all three policies working together against an OpenAI-compatible endpoint, giving you a reproducible environment to understand and test each behavior before deploying to production.
What You Will Learn
By working through this sample you will understand how to:
- Enable semantic caching — serve repeated or semantically similar questions from a Redis cache, reducing latency and API cost
- Apply PII masking — strip sensitive identifiers (emails, phone numbers) from request payloads before they reach the upstream LLM provider
Scenarios Covered
Scenario 1: Semantic Cache
Problem: Identical or near-identical questions (paraphrased, reworded) are sent to the LLM repeatedly, incurring cost and latency on every call.
What this scenario does: The gateway generates an embedding of each incoming prompt using Mistral and stores the LLM response in Redis. If a subsequent request is ≥ 85% semantically similar to a cached prompt, the cached response is returned immediately — no LLM call is made.
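The cache decision described above can be sketched as a similarity check against a threshold. This is an illustration under stated assumptions, not the gateway's implementation: it assumes cosine similarity between embedding vectors, reads "85%" as a threshold of 0.85, and uses short comma-separated toy vectors in place of real Mistral embeddings. The function names cosine_similarity and cache_decision are hypothetical.

```bash
# Hypothetical helpers illustrating a semantic-cache decision.
# Assumptions: cosine similarity is the metric and 0.85 is the threshold;
# real embeddings have hundreds of dimensions, not three.

# cosine_similarity "x1,x2,..." "y1,y2,..." -> prints a value in [-1, 1]
cosine_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); m = split(b, y, ",")
    if (n != m) exit 1                      # vectors must have equal length
    dot = 0; nx = 0; ny = 0
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]; nx += x[i] * x[i]; ny += y[i] * y[i]
    }
    if (nx == 0 || ny == 0) { print "0.0000"; exit }   # undefined for zero vectors
    printf "%.4f\n", dot / (sqrt(nx) * sqrt(ny))
  }'
}

# cache_decision SIMILARITY [THRESHOLD] -> prints HIT or MISS
cache_decision() {
  awk -v s="$1" -v t="${2:-0.85}" 'BEGIN {
    if (s + 0 >= t + 0) print "HIT"; else print "MISS"
  }'
}
```

For example, cosine_similarity 0.9,0.1,0.3 0.8,0.2,0.35 prints a value close to 1, which cache_decision classifies as HIT; an unrelated prompt would typically score lower and fall through to the LLM.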
Scenario 2: PII Masking
Problem: Users sometimes include personal data (email addresses, phone numbers) in prompts. Sending this data to a third-party LLM provider may violate privacy policies or compliance requirements.
What this scenario does: Before the request leaves the gateway, a regex-based redaction policy replaces detected email addresses and phone numbers with masked placeholders. The upstream model never sees the original values.
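To make the redaction step concrete, here is a rough shell equivalent using sed. The EMAIL_RE and PHONE_RE patterns and the [EMAIL]/[PHONE] placeholders are assumptions for illustration; the patterns the gateway actually applies are the ones configured in llm-proxy.yaml. The phone pattern only covers contiguous digits with an optional leading +, so a format such as +1-555-123-4567 would slip through.

```bash
# Hypothetical sketch of regex-based PII redaction. These patterns are
# deliberately simple and are not the gateway's configured patterns.
EMAIL_RE='[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
# Optional leading +, then 7 to 15 contiguous digits
PHONE_RE='\+?[0-9]{7,15}'

# redact_pii TEXT -> prints TEXT with emails and phone numbers masked
redact_pii() {
  # Emails are replaced first so digits inside an address are not
  # masked separately as a phone number.
  printf '%s\n' "$1" | sed -E \
    -e "s/${EMAIL_RE}/[EMAIL]/g" \
    -e "s/${PHONE_RE}/[PHONE]/g"
}
```

For example, redact_pii 'Email jane.doe@example.com or call +15551234567' prints "Email [EMAIL] or call [PHONE]".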
Expected Results
After running the test scripts you should observe the following for each scenario.
Scenario 1: Semantic Cache
The same question is sent twice. On the second request, the gateway should return a cached response. The test detects a cache hit via:
- A HIT value in any X-Cache* response header, or
- The second response arriving at least 3× faster than the first (the LLM baseline is typically > 500 ms)
If embedding_provider_api_key is not set in additional-config.toml, cache lookups silently fall through to OpenAI and the test warns rather than fails.
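The two cache-hit signals can be expressed as a small check. This is a hypothetical helper, not code from test-semantic-cache.sh: is_cache_hit takes the raw headers of the second response plus both latencies in whole milliseconds, matches header names case-insensitively (HTTP/2 responses use lowercase names), and applies the 3× rule.

```bash
# Hypothetical helper mirroring the two cache-hit signals.
# is_cache_hit HEADERS FIRST_MS SECOND_MS
#   HEADERS    raw response headers of the second request
#   FIRST_MS   latency of the first request, integer milliseconds
#   SECOND_MS  latency of the second request, integer milliseconds
is_cache_hit() {
  local headers="$1" first_ms="$2" second_ms="$3"
  # Signal 1: a header whose name starts with X-Cache has a value containing HIT
  if printf '%s\n' "$headers" | grep -Eiq '^x-cache[^:]*:.*hit'; then
    return 0
  fi
  # Signal 2: the second response was at least 3x faster than the first
  [ $((second_ms * 3)) -le "$first_ms" ]
}
```

To collect the inputs, curl -D - prints the response headers, and -w '%{time_total}' reports the total time in seconds, which you would convert to milliseconds before passing it in.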
Scenario 2: PII Masking
A prompt containing a unique email address and phone number is sent, asking the model to repeat them verbatim. Because the gateway redacts the values before forwarding the request, neither the original email nor the phone number should appear in the response.
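The pass condition for this scenario amounts to a literal substring check. A minimal sketch, with pii_leaked as a hypothetical name; grep -F is used so that characters such as + and . in the values are matched literally rather than as regex operators.

```bash
# Hypothetical check for the expected result: the response must not
# contain the original email or phone number verbatim.
# pii_leaked RESPONSE EMAIL PHONE -> exit 0 if either value appears
pii_leaked() {
  local response="$1" email="$2" phone="$3"
  # -F: fixed strings; -e twice: match either value
  printf '%s\n' "$response" | grep -qF -e "$email" -e "$phone"
}
```

Because the check is literal, a model that reformats the number (for example as +1 555 123 4567) would not be flagged; normalising digits before comparing is a stricter option.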
Prerequisites
| Tool | Purpose |
|---|---|
| Docker + Docker Compose | Runs the gateway stack |
| wget | Downloads the gateway distribution |
| unzip | Extracts the distribution |
| python3 + pyyaml | Used by the setup scripts to merge YAML/TOML files |
| curl | Calls the gateway management API and proxy endpoint |
| jq | Used by the test scripts to parse API responses (e.g. brew install jq on macOS) |
Required Configuration
1. OpenAI API Key
The setup script injects the key into the LLM provider at deploy time. Provide it via:
# Option A — environment variable (recommended)
export OPENAI_API_KEY="sk-..."
# Option B — script argument
./setup.sh sk-...
# Option C — interactive prompt (key is hidden)
./setup.sh
The key is never written to disk; it is substituted into the provider payload at runtime and discarded.
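The three options could be resolved along the following lines. This sketch is hypothetical and not taken from setup.sh: resolve_openai_key prefers an explicit argument over OPENAI_API_KEY and only prompts when neither is set, which may not match the script's real precedence. The prompt uses bash's read -s so the key is not echoed, and the key is only ever held in a variable.

```bash
# Hypothetical sketch of resolving the OpenAI key; setup.sh's actual
# logic and precedence may differ.
# resolve_openai_key [KEY] -> prints the key on stdout
resolve_openai_key() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"                  # Option B: script argument
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    printf '%s\n' "$OPENAI_API_KEY"     # Option A: environment variable
  else
    local key
    # Option C: interactive prompt (bash); -s hides input, prompt goes to stderr
    read -rsp "OpenAI API key: " key
    echo >&2
    printf '%s\n' "$key"
  fi
}
```

A caller would capture the result with key=$(resolve_openai_key "$1") and pass it straight into the provider payload without writing it to a file.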
2. Mistral API Key (required for Scenario 1)
Open additional-config.toml and set embedding_provider_api_key to your Mistral API key.
Without this key the gateway starts successfully, but Scenario 1 (semantic cache) will not produce cache hits.
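Because a missing key does not stop the gateway from starting, a preflight check can surface the problem early. The following hypothetical helper is a line-based grep rather than a TOML parser, and it assumes the key is written on a single line in the form embedding_provider_api_key = "value".

```bash
# Hypothetical preflight: succeed only if embedding_provider_api_key is
# set to a non-empty double-quoted string. Not a TOML parser.
# embedding_key_configured [FILE] -> exit 0 if the key looks configured
embedding_key_configured() {
  local file="${1:-additional-config.toml}"
  grep -Eq '^[[:space:]]*embedding_provider_api_key[[:space:]]*=[[:space:]]*"[^"]+"' "$file"
}
```

For example: embedding_key_configured || echo "warning: Scenario 1 will not produce cache hits" >&2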
Files
| File | Description |
|---|---|
| llm-provider.yaml | LLM provider definition (OpenAI upstream, access control) |
| llm-proxy.yaml | LLM proxy definition (three policies wired to /chat/completions) |
| redis-service.yaml | Redis Stack service, merged into docker-compose.yaml at setup time |
| additional-config.toml | Embedding and vector DB config, appended to the gateway's config.toml |
| setup.sh | Automated setup (download → configure → start → deploy) |
| teardown.sh | Automated teardown (delete resources → stop stack) |
| test-semantic-cache.sh | Verifies semantic cache hits via Redis and Mistral embeddings |
| test-pii-masking.sh | Verifies email/phone redaction before requests reach OpenAI |
Setup
Run ./setup.sh, supplying the OpenAI key as described above. The script performs these steps in order:
- Downloads wso2apip-ai-gateway-1.1.0.zip
- Extracts the distribution
- Appends additional-config.toml to configs/config.toml
- Merges the Redis service into docker-compose.yaml
- Starts the full Docker Compose stack
- Waits for the gateway to become healthy (polls up to 150 s)
- Deploys the LLM provider
- Deploys the LLM proxy
All steps are idempotent — re-running the script on an already-configured environment is safe.
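Two of these steps, the bounded health-check poll and the idempotent config append, can be sketched with plain shell. Both helpers are hypothetical illustrations of the pattern, not excerpts from setup.sh; append_once assumes the appended file begins with a distinctive first line (such as a comment) that it can use as a marker.

```bash
# Hypothetical helpers illustrating two setup steps (not from setup.sh).

# wait_until TIMEOUT_S INTERVAL_S CMD... -> retry CMD until it succeeds,
# giving up once TIMEOUT_S seconds of sleeping have accumulated
# (time spent running CMD itself is not counted)
wait_until() {
  local timeout="$1" interval="$2" waited=0
  shift 2
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# append_once SOURCE TARGET -> append SOURCE to TARGET unless SOURCE's
# first line already appears as a whole line in TARGET
append_once() {
  local src="$1" dst="$2" marker
  marker=$(head -n 1 "$src")
  if ! grep -qxF -- "$marker" "$dst" 2>/dev/null; then
    cat "$src" >> "$dst"
  fi
}
```

For example, wait_until 150 5 curl -fsS http://localhost:9094/health mirrors the 150 s poll, and append_once additional-config.toml "$GATEWAY_DIR/configs/config.toml" appends the config only once, where GATEWAY_DIR is a placeholder for the extracted distribution directory.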
Endpoints After Setup
| Endpoint | URL |
|---|---|
| Gateway proxy (HTTP) | http://localhost:8080/openai-proxy |
| Gateway health | http://localhost:9094/health |
| Management API | http://localhost:9090/api/management/v0.9 |
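Once the stack is up, the proxy can be exercised with an OpenAI-style chat request. This sketch assumes the proxy exposes /chat/completions under its base URL (the path the policies are wired to) and uses gpt-4o-mini purely as a placeholder model name; substitute whatever model your provider configuration allows. The request body is built with jq so the prompt is JSON-escaped safely.

```bash
# Hypothetical request helpers. Assumptions: /chat/completions under the
# proxy base URL, and "gpt-4o-mini" as a placeholder model name.
PROXY_URL=${PROXY_URL:-http://localhost:8080/openai-proxy}

# proxy_chat PROMPT -> prints the raw JSON response
proxy_chat() {
  local prompt="$1"
  curl -sS "$PROXY_URL/chat/completions" \
    -H 'Content-Type: application/json' \
    -d "$(jq -n --arg p "$prompt" \
          '{model: "gpt-4o-mini", messages: [{role: "user", content: $p}]}')"
}

# gateway_healthy -> exit 0 if the health endpoint answers with a 2xx status
gateway_healthy() {
  curl -fsS http://localhost:9094/health >/dev/null
}
```

For example, proxy_chat "What is semantic caching?" | jq -r '.choices[0].message.content' prints the reply text, assuming the upstream returns the standard OpenAI response shape.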
Running the Tests
Each policy has its own script so you can run them independently. All scripts require jq and call the gateway proxy directly — no API key is needed at test time (the gateway uses its stored credentials).
# Scenario 1 — semantic cache
./test-semantic-cache.sh
# Scenario 2 — PII masking
./test-pii-masking.sh
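If you want a single pass/fail summary, a small wrapper can run both scripts in sequence. run_tests is a hypothetical helper and relies on the assumption that each test script exits non-zero on failure; as noted above, the semantic cache test warns rather than fails when the embedding key is missing.

```bash
# Hypothetical wrapper: run each command, report PASS/FAIL by exit
# status, print a summary, and return non-zero if anything failed.
run_tests() {
  local failed=0 script
  for script in "$@"; do
    if "$script"; then
      echo "PASS $script"
    else
      echo "FAIL $script"
      failed=$((failed + 1))
    fi
  done
  echo "$failed of $# scripts failed"
  [ "$failed" -eq 0 ]
}
```

For example: run_tests ./test-semantic-cache.sh ./test-pii-masking.sh. Set TEST_EMAIL and TEST_PHONE first so the PII test does not stop for input.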
Customising the PII test
test-pii-masking.sh prompts for an email and phone number, or you can pass them via environment variables to run non-interactively:
TEST_EMAIL="jane.doe@example.com" TEST_PHONE="+15551234567" ./test-pii-masking.sh
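Scenario 2 relies on values that are unique to each run, so generating them fits the non-interactive mode well. make_test_pii is a hypothetical helper: it derives both values from the current Unix timestamp (date +%s), uses the example.com domain reserved for documentation by RFC 2606, and keeps the +1555 prefix of the sample phone number.

```bash
# Hypothetical helper: export a throwaway TEST_EMAIL and TEST_PHONE
# derived from the current Unix timestamp.
make_test_pii() {
  local stamp
  stamp=$(date +%s)
  TEST_EMAIL="pii-test-${stamp}@example.com"
  # +1555 followed by the last 7 digits of the timestamp
  TEST_PHONE="+1555$(printf '%s' "$stamp" | tail -c 7)"
  export TEST_EMAIL TEST_PHONE
}
```

For example: make_test_pii && ./test-pii-masking.sh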
Teardown
# Stop the stack and delete deployed resources
./teardown.sh
# Also remove the extracted directory and downloaded zip
./teardown.sh --clean
Troubleshooting
| Symptom | Likely cause |
|---|---|
| setup.sh fails at health check | Docker images are still pulling; wait and retry |
| Scenario 1: no cache hit detected | embedding_provider_api_key is empty in additional-config.toml, or Redis is not reachable |
| Scenario 2: original values appear in response | PII regex did not match — verify the regex patterns in llm-proxy.yaml |
| HTTP 401 on management API | Basic auth header mismatch; default credentials are admin:admin |
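For the PII row, it can help to try a candidate pattern against sample input before editing llm-proxy.yaml. regex_matches is a hypothetical helper built on grep -E; grep uses POSIX extended regular expressions, and the gateway may use a different regex dialect, so a local match is a hint rather than proof.

```bash
# Hypothetical helper: test a candidate ERE pattern against sample text.
# regex_matches PATTERN SAMPLE -> exit 0 if PATTERN matches SAMPLE
regex_matches() {
  local pattern="$1" sample="$2"
  printf '%s\n' "$sample" | grep -Eq -- "$pattern"
}
```

For instance, a pattern anchored as ^[0-9]{10}$ does not match +15551234567 because of the leading + and the eleven digits, which is exactly the kind of mismatch that lets a value reach the model unmasked.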