Token-based rate limiting

AI services are often billed per token, so controlling token usage is critical. API Platform’s AI Gateway provides token-based rate limiting, which you can apply at the API level.

Configure a Token-Based Rate Limiting Policy

  1. In the left navigation menu, click Develop, then select Policy.

    Policy page showing API Proxy Contract routes and Service Contract with Azure OpenAI endpoint

  2. Click Add API Level Policy → Request flow → Attached mediation policy, then select Token Based Rate Limiting from the Mediation Policy List.

    Mediation Policy List panel showing Token Based Rate Limiting option

  3. Enter the rate limit values (Max Prompt Token Count, Max Completion Token Count, Max Total Token Count, and Time Limit), then click Save.

    Configure Token Based Rate Limiting Policy dialog with Max Prompt Token Count, Max Completion Token Count, Max Total Token Count, and Time Limit fields
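
To make the four fields in the dialog concrete, the sketch below models how such a policy could behave. It is an illustration only, not the gateway's implementation: the class names, the field names, and the fixed-window counting strategy are all assumptions, and the gateway may count or reset tokens differently.

```python
from dataclasses import dataclass
from typing import Optional
import time


@dataclass
class TokenLimits:
    # Hypothetical names mirroring the dialog fields; not the gateway's schema.
    max_prompt_tokens: int
    max_completion_tokens: int
    max_total_tokens: int
    time_limit_seconds: float


@dataclass
class TokenWindow:
    """Fixed-window token counter (an assumed strategy, for illustration)."""
    limits: TokenLimits
    window_start: Optional[float] = None
    prompt_used: int = 0
    completion_used: int = 0
    total_used: int = 0

    def _roll(self, now: float) -> None:
        # Start a fresh window on first use or once the time limit has elapsed.
        if (self.window_start is None
                or now - self.window_start >= self.limits.time_limit_seconds):
            self.window_start = now
            self.prompt_used = 0
            self.completion_used = 0
            self.total_used = 0

    def allow_request(self, now: Optional[float] = None) -> bool:
        # A request is allowed only while every counter is below its limit.
        now = time.monotonic() if now is None else now
        self._roll(now)
        return (self.prompt_used < self.limits.max_prompt_tokens
                and self.completion_used < self.limits.max_completion_tokens
                and self.total_used < self.limits.max_total_tokens)

    def record_usage(self, prompt_tokens: int, completion_tokens: int,
                     now: Optional[float] = None) -> None:
        # Token counts are known only after the AI service responds,
        # so usage is recorded after the call rather than before it.
        now = time.monotonic() if now is None else now
        self._roll(now)
        self.prompt_used += prompt_tokens
        self.completion_used += completion_tokens
        self.total_used += prompt_tokens + completion_tokens


# Example: 100 prompt, 50 completion, 120 total tokens per 60 seconds.
window = TokenWindow(TokenLimits(100, 50, 120, 60.0))
if window.allow_request():
    # ... forward the request to the AI service, then record what it used.
    window.record_usage(prompt_tokens=80, completion_tokens=30)
```

In this model a request that starts under the limits can push a counter past its maximum, because usage is only known after the response. Later requests are then rejected until the time limit elapses and the counters reset.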