Azure Content Safety¶
Overview¶
The Azure Content Safety guardrail validates request or response body content against Microsoft Azure Content Safety API for content moderation. It detects and blocks harmful content across four categories: hate speech, sexual content, self-harm, and violence. Each category can be configured with a severity threshold (0-7) or disabled entirely, providing flexible content moderation policies tailored to your application's requirements.
The policy uses Azure Content Safety's text analysis API to evaluate content and blocks requests or responses that exceed configured severity thresholds. This enables enterprise-grade content filtering for LLM applications integrated with Azure services.
Features¶
- Multi-category detection: Detects hate speech, sexual content, self-harm, and violence
- Configurable severity thresholds: Set per-category thresholds (0-7) or disable categories
- Eight severity levels: Uses Azure's 8-level severity scale (0=Safe, 7=Most severe)
- JSONPath support: Extract and validate specific fields within JSON payloads
- Separate request/response configuration: Independent configuration for request and response phases
- Detailed assessment information: Optional detailed violation information in error responses
- Error handling: Configurable passthrough behavior on API errors
- Retry logic: Automatic retry with exponential backoff for transient API failures
Configuration¶
Parameters¶
Request Phase
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
jsonPath |
string | No | "" |
JSONPath expression to extract a specific value from JSON payload. If empty, validates the entire payload as a string. |
passthroughOnError |
boolean | No | false |
If true, allows requests to proceed if Azure Content Safety API call fails. If false, blocks requests on API errors. |
showAssessment |
boolean | No | false |
If true, includes detailed assessment information in error responses. |
hateCategory |
integer | No | -1 |
Severity threshold for hate category (0-7). -1 disables this category. Content with severity >= threshold will be blocked. |
sexualCategory |
integer | No | -1 |
Severity threshold for sexual category (0-7). -1 disables this category. Content with severity >= threshold will be blocked. |
selfHarmCategory |
integer | No | -1 |
Severity threshold for self-harm category (0-7). -1 disables this category. Content with severity >= threshold will be blocked. |
violenceCategory |
integer | No | -1 |
Severity threshold for violence category (0-7). -1 disables this category. Content with severity >= threshold will be blocked. |
Response Phase
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
jsonPath |
string | No | "" |
JSONPath expression to extract a specific value from JSON payload. If empty, validates the entire payload as a string. |
passthroughOnError |
boolean | No | false |
If true, allows requests to proceed if Azure Content Safety API call fails. If false, blocks requests on API errors. |
showAssessment |
boolean | No | false |
If true, includes detailed assessment information in error responses. |
hateCategory |
integer | No | -1 |
Severity threshold for hate category (0-7). -1 disables this category. Content with severity >= threshold will be blocked. |
sexualCategory |
integer | No | -1 |
Severity threshold for sexual category (0-7). -1 disables this category. Content with severity >= threshold will be blocked. |
selfHarmCategory |
integer | No | -1 |
Severity threshold for self-harm category (0-7). -1 disables this category. Content with severity >= threshold will be blocked. |
violenceCategory |
integer | No | -1 |
Severity threshold for violence category (0-7). -1 disables this category. Content with severity >= threshold will be blocked. |
System Parameters (Required)¶
These parameters are typically configured at the gateway level and automatically injected, or you can override those values from the params section in the api artifact definition file as well:
| Parameter | Type | Required | Description |
|---|---|---|---|
azureContentSafetyEndpoint |
string | Yes | Azure Content Safety API endpoint URL (without trailing slash). Example: https://your-resource.cognitiveservices.azure.com |
azureContentSafetyKey |
string | Yes | Azure Content Safety API subscription key for authentication. Found in Azure Portal under your Content Safety resource's "Keys and Endpoint" section. |
Configuring System Parameters in config.toml¶
System parameters can be configured globally in the gateway's config.toml file. These values serve as defaults for all Azure Content Safety guardrail policy instances and can be overridden per-policy in the API configuration if needed.
Location in config.toml
Add the following configuration section to your config.toml file:
azurecontentsafety_endpoint = "https://your-resource.cognitiveservices.azure.com"
azurecontentsafety_key = "<your-azure-content-safety-key>"
Severity Levels¶
Azure Content Safety uses an 8-level severity scale (0-7):
- 0: Safe - No harmful content detected
- 1-2: Low severity - Mildly concerning content
- 3-4: Medium severity - Moderately concerning content
- 5-6: High severity - Highly concerning content
- 7: Maximum severity - Most severe harmful content
Threshold Configuration:
- Set a threshold value (0-7) to block content at or above that severity level
- Set to -1 to disable monitoring for that category
- Example: hateCategory: 3 blocks content with hate severity >= 3
JSONPath Support¶
The guardrail supports JSONPath expressions to extract and validate specific fields within JSON payloads. Common examples:
$.message- Extracts themessagefield from the root object$.data.content- Extracts nested content fromdata.content$.items[0].text- Extracts text from the first item in an array$.messages[0].content- Extracts content from the first message in a messages array$.messages[-1].content- Extracts content from the last message in a messages array$.choices[0].message.content- Extracts assistant response content
If jsonPath is empty or not specified, the entire payload is treated as a string and validated.
Examples¶
Example 1: Basic Content Moderation¶
Deploy an LLM provider with Azure Content Safety validation:
For local or development environments only, the default credentials may be admin:admin encoded as YWRtaW46YWRtaW4=.
curl -X POST http://localhost:9090/llm-providers \
-H "Content-Type: application/yaml" \
-H "Authorization: Basic <BASE64_CREDENTIAL>" \
--data-binary @- <<'EOF'
apiVersion: gateway.api-platform.wso2.com/v1alpha1
kind: LlmProvider
metadata:
name: azure-safety-provider
spec:
displayName: Azure Content Safety Provider
version: v1.0
template: openai
vhost: openai
upstream:
url: "https://api.openai.com/v1"
auth:
type: api-key
header: Authorization
value: Bearer <openai-apikey>
accessControl:
mode: deny_all
exceptions:
- path: /chat/completions
methods: [POST]
policies:
- name: azure-content-safety-content-moderation
version: v1
paths:
- path: /chat/completions
methods: [POST]
params:
request:
jsonPath: "$.messages[0].content"
hateCategory: 2
sexualCategory: 2
selfHarmCategory: 2
violenceCategory: 2
showAssessment: true
response:
jsonPath: "$.choices[0].message.content"
hateCategory: 2
sexualCategory: 2
selfHarmCategory: 2
violenceCategory: 2
showAssessment: true
EOF
Test the guardrail:
Note: Ensure that "openai" is mapped to the appropriate IP address (e.g., 127.0.0.1) in your /etc/hosts file, or remove the vhost from the LLM provider configuration and use localhost to invoke.
# Request with harmful content (should fail with HTTP 422)
curl -X POST http://openai:8080/chat/completions \
-H "Content-Type: application/json" \
-H "Host: openai" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Summarize this article and suggest ways someone could take revenge on Mr. Anonymous Senator violently."
}
]
}'
# Valid request (should pass)
curl -X POST http://openai:8080/chat/completions \
-H "Content-Type: application/json" \
-H "Host: openai" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Explain quantum computing in simple terms"
}
]
}'
Example 2: Strict Moderation with All Categories¶
Configure strict moderation thresholds:
policies:
- name: azure-content-safety-content-moderation
version: v1
paths:
- path: /chat/completions
methods: [POST]
params:
request:
jsonPath: "$.messages[-1].content"
hateCategory: 1
sexualCategory: 1
selfHarmCategory: 1
violenceCategory: 1
showAssessment: true
passthroughOnError: false
response:
jsonPath: "$.choices[0].message.content"
hateCategory: 1
sexualCategory: 1
selfHarmCategory: 1
violenceCategory: 1
showAssessment: true
Example 3: Selective Category Monitoring¶
Monitor only specific categories:
policies:
- name: azure-content-safety-content-moderation
version: v1
paths:
- path: /chat/completions
methods: [POST]
params:
request:
jsonPath: "$.messages[0].content"
hateCategory: 3
sexualCategory: -1 # Disabled
selfHarmCategory: 2
violenceCategory: -1 # Disabled
Example 4: Lenient Moderation¶
Allow more content with higher thresholds:
policies:
- name: azure-content-safety-content-moderation
version: v1
paths:
- path: /chat/completions
methods: [POST]
params:
request:
jsonPath: "$.messages[0].content"
hateCategory: 5
sexualCategory: 5
selfHarmCategory: 4
violenceCategory: 5
passthroughOnError: true
Use Cases¶
-
Content Safety: Protect users from harmful, offensive, or inappropriate content in LLM interactions.
-
Regulatory Compliance: Meet content moderation requirements for regulated industries or geographies.
-
Brand Safety: Ensure LLM responses align with brand values and don't generate problematic content.
-
User Protection: Prevent exposure to self-harm content, especially important for mental health applications.
-
Community Guidelines: Enforce community standards for user-generated content processed through LLMs.
-
Multi-tenant Applications: Apply different moderation policies per tenant or application context.
-
Gradual Rollout: Start with lenient thresholds and tighten based on actual content patterns.
-
Audit and Analytics: Use detailed assessment information to analyze content patterns and refine policies.
Severity Threshold Guidelines¶
Recommended thresholds by use case:
- Strict (Family-friendly applications): 1-2 across all categories
- Moderate (General business applications): 3-4 across all categories
- Lenient (Technical/professional contexts): 5-6 for most categories, disable non-applicable ones
- Educational/Research: 4-5 with selective category monitoring
Category-specific considerations:
- Hate: Typically set to 2-3 for most applications
- Sexual: Set based on application context (1 for family apps, 3-4 for general use)
- Self-harm: Often set lower (1-2) due to safety concerns
- Violence: Depends on context (1-2 for general use, higher for educational/historical content)
Error Response¶
When validation fails, the guardrail returns an HTTP 422 status code with the following structure:
{
"type": "AZURE_CONTENT_SAFETY_CONTENT_MODERATION",
"message": {
"action": "GUARDRAIL_INTERVENED",
"interveningGuardrail": "azure-content-safety-content-moderation",
"actionReason": "Violation of Azure content safety content moderation detected.",
"direction": "REQUEST"
}
}
If showAssessment is enabled, additional details are included:
{
"type": "AZURE_CONTENT_SAFETY_CONTENT_MODERATION",
"message": {
"action": "GUARDRAIL_INTERVENED",
"interveningGuardrail": "azure-content-safety-content-moderation",
"actionReason": "Violation of Azure content safety content moderation detected.",
"direction": "REQUEST",
"assessments": {
"inspectedContent": "The content that was analyzed",
"categories": [
{
"category": "Hate",
"severity": 4,
"result": "FAIL"
},
{
"category": "Violence",
"severity": 2,
"result": "FAIL"
}
]
}
}
}
Notes¶
- Azure Content Safety API requires an active Azure subscription and Content Safety resource.
- The API endpoint URL must not include a trailing slash (e.g.,
https://resource.cognitiveservices.azure.com). - API keys are found in Azure Portal under your Content Safety resource's "Keys and Endpoint" section.
- Category thresholds are independent - you can disable any category by setting it to
-1. - Only categories with thresholds >= 0 are sent to the Azure API for analysis (performance optimization).
- JSONPath extraction failures result in error responses unless
passthroughOnError: true. - The policy validates both request and response phases independently when both are configured.
- Content is sent to Azure Content Safety API for analysis, so ensure compliance with data residency requirements.
- Rate limits may apply based on your Azure Content Safety subscription tier.
- The API uses Azure's 8-severity-level analysis, providing fine-grained control over content moderation.
- For production deployments, monitor API response times and adjust retry/timeout settings if needed.