Viewing Metrics in Grafana
Once you've started the gateway with the metrics profile, follow these steps to view component metrics:
Step 1: Access Grafana
Open your browser and navigate to: http://localhost:3000
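If you want to confirm from a script that Grafana is reachable before opening the browser, Grafana's HTTP API exposes a /api/health endpoint that reports whether its database is ready. The sketch below assumes Grafana listens on the same http://localhost:3000 address; the helper names is_healthy and check_grafana are hypothetical and not part of the gateway.

```python
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # same address as in Step 1


def is_healthy(health: dict) -> bool:
    """Interpret the JSON body returned by Grafana's /api/health endpoint."""
    return health.get("database") == "ok"


def check_grafana(base_url: str = GRAFANA_URL, timeout: float = 5.0) -> bool:
    """Return True if Grafana answers /api/health and reports its database as ok.

    urlopen raises URLError if nothing is listening, and HTTPError if Grafana
    answers with an error status (for example 503 while its database is failing).
    """
    with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
        return is_healthy(json.load(resp))
```

Calling check_grafana() right after starting the gateway may fail with a connection error if Grafana has not finished starting; retrying after a few seconds is usually enough.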
Step 2: Log in to Grafana
- Username: admin
- Password: admin
Note: You'll be prompted to change the password on first login.
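The same credentials also work for Grafana's HTTP API through HTTP basic authentication. A minimal sketch, assuming the default admin/admin login (pass your new password once you have changed it): basic_auth_header builds the Authorization header value, and current_user asks Grafana's /api/user endpoint which account the credentials belong to. Both function names are hypothetical.

```python
import base64
import json
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP basic-auth header value from a username and password."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def current_user(base_url: str = "http://localhost:3000",
                 user: str = "admin", password: str = "admin") -> dict:
    """Return the JSON profile Grafana reports for the authenticated user."""
    request = urllib.request.Request(
        f"{base_url}/api/user",
        headers={"Authorization": basic_auth_header(user, password)},
    )
    # A 401 response (raised as HTTPError) means the credentials were rejected.
    with urllib.request.urlopen(request, timeout=5.0) as resp:
        return json.load(resp)
```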
Step 3: Navigate to Dashboards
- Click on the hamburger menu (☰) in the top-left corner
- Navigate to Dashboards → Browse
- You'll see several pre-built dashboards:
  - Infrastructure Overview: High-level view of all components
  - Gateway Controller: Detailed gateway-controller metrics
  - Policy Engine: Detailed policy-engine metrics
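To check that the pre-built dashboards were provisioned without clicking through the UI, you can query Grafana's /api/search endpoint with type=dash-db, which returns dashboards and leaves out folders. In this sketch the expected titles are copied from the list above and may not match the provisioned titles exactly; auth_header is a basic-auth header value for your Grafana user, and all helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Titles as listed in this guide; the provisioned titles may differ slightly.
EXPECTED_DASHBOARDS = {"Infrastructure Overview", "Gateway Controller", "Policy Engine"}


def dashboard_titles(search_results: list) -> set:
    """Keep only dashboards (type "dash-db"), dropping folders, and return their titles."""
    return {item["title"] for item in search_results if item.get("type") == "dash-db"}


def missing_dashboards(search_results: list, expected: set = EXPECTED_DASHBOARDS) -> list:
    """List expected dashboard titles that the search did not return."""
    return sorted(expected - dashboard_titles(search_results))


def search_dashboards(base_url: str, auth_header: str) -> list:
    """Call Grafana's /api/search endpoint, restricted to dashboards."""
    query = urllib.parse.urlencode({"type": "dash-db"})
    request = urllib.request.Request(
        f"{base_url}/api/search?{query}", headers={"Authorization": auth_header}
    )
    with urllib.request.urlopen(request, timeout=5.0) as resp:
        return json.load(resp)
```

An empty list from missing_dashboards(search_dashboards(...)) means all three dashboards are present.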
Step 4: View Infrastructure Overview
The Infrastructure Overview dashboard brings metrics from all components together in the following sections:
Gateway Controller Section
- API Operations: Total operations and operation rate
- Deployment Latency: End-to-end deployment time
- xDS Clients: Number of connected Envoy routers
- Database Operations: Database operation metrics
- HTTP Requests: REST API request metrics
Policy Engine Section
- Request Processing: Total requests and request rate
- Policy Executions: Policy execution metrics
- Active Streams: Current ext_proc streams
- Errors: Error rate and types
System Resources
- Memory Usage: Heap and system memory usage across components
- Goroutines: Number of goroutines in the Go runtime
- Uptime: Component availability
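The dashboard panels are driven by PromQL queries, and you can run such queries yourself through the Prometheus HTTP API (/api/v1/query). A sketch, assuming Prometheus is reachable at http://localhost:9090 (its default port; this guide does not say where the metrics profile exposes it). The helper names are hypothetical, and gateway_controller_api_operations_total is used in the usage note only because it is the one metric name this page spells out.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: Prometheus default port


def instant_query(expr: str, base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> dict:
    """Evaluate a PromQL expression at the current time via /api/v1/query."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def vector_values(response: dict) -> list:
    """Flatten an instant-vector response into (labels, value) pairs."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "query failed"))
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample is [unix_timestamp, "value-as-string"].
    return [(series["metric"], float(series["value"][1])) for series in data["result"]]
```

For example, vector_values(instant_query('sum by (status) (rate(gateway_controller_api_operations_total[5m]))')) returns one (labels, operations-per-second) pair per status value.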
Step 5: View Gateway Controller Dashboard
The Gateway Controller dashboard provides detailed metrics:
API Management
- API Operations Total: Counter for all API operations with labels for:
  - operation: create, update, delete, get
  - status: success, failure
  - api_type: REST, GraphQL, etc.
- APIs Total: Gauge showing deployed APIs by type and status
- Deployment Latency Seconds: Histogram of deployment times
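Histograms such as Deployment Latency Seconds are exported as cumulative buckets, and Grafana panels usually turn them into percentiles with PromQL's histogram_quantile(), typically as histogram_quantile(0.95, sum by (le) (rate(<name>_bucket[5m]))), where <name> stands for the exported metric name, which this page does not list. The sketch below reproduces the linear interpolation in Python so you can see how a p95 is estimated from bucket counts. It is simplified: it assumes positive bucket bounds (true for durations) and at least one finite bucket plus the +Inf bucket, and it skips the consistency fixes Prometheus applies to non-monotonic buckets.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate quantile q (0 < q < 1) from cumulative (upper_bound, count) buckets.

    Follows the linear interpolation of PromQL's histogram_quantile(): find the
    bucket containing the q-th observation and assume observations are spread
    evenly inside it.
    """
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    buckets = sorted(buckets, key=lambda b: b[0])
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket and a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # the first bucket starts at 0 for durations
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile lies above the highest finite bound; that bound
                # is the best estimate available.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return buckets[-2][0]  # not reached: the +Inf bucket always holds all observations
```

With the made-up buckets [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)], histogram_quantile(0.95, buckets) returns 0.75: the 95th of 100 observations falls halfway through the 0.5 to 1.0 second bucket.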
xDS Metrics
- xDS Clients Connected: Gauge of connected Envoy instances
- Snapshot Generation Duration: Time to generate configuration snapshots
- xDS Stream Requests: Counter for xDS requests by type
- Snapshot Size: Size of generated configuration snapshots
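Counters such as xDS Stream Requests only ever increase, except when a component restarts and the counter starts again from zero, which is why dashboards graph rate() or increase() rather than the raw value. A simplified sketch of the reset handling, assuming (timestamp, value) samples from a single series: any decrease is treated as a reset. Unlike Prometheus's rate(), it does not extrapolate to the edges of the query window, so its results can differ slightly.

```python
def counter_increase(samples: list) -> float:
    """Total increase of a counter across (timestamp, value) samples.

    A drop in value is treated as a counter reset (e.g. a process restart):
    the counter restarted from zero, so the new value is all fresh increase.
    """
    total = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        total += current - previous if current >= previous else current
    return total


def per_second_rate(samples: list) -> float:
    """Average per-second increase between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed if elapsed > 0 else 0.0
```

For the samples [(0, 10), (15, 25), (30, 5), (45, 20)] the increase is 15 + 5 + 15 = 35 requests: the drop from 25 to 5 counts as a restart followed by 5 new requests.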
Database Metrics
- Database Operations Total: Counter for database operations
- Database Operation Duration: Histogram of operation times
- Database Size Bytes: Current database size
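Prometheus histograms such as Database Operation Duration also export running _sum and _count series. Dividing their increases over a window gives the mean duration per operation, which is what the PromQL pattern rate(<name>_sum[5m]) / rate(<name>_count[5m]) computes; <name> stands for the metric's exported name, which this page does not list. A small sketch, assuming two scrapes of the same series with no counter reset between them; the helper name is hypothetical.

```python
from typing import Optional


def mean_duration(sum_start: float, sum_end: float,
                  count_start: float, count_end: float) -> Optional[float]:
    """Mean observed duration between two scrapes of a histogram's _sum/_count.

    Returns None when no operations were observed in the window, where the
    PromQL division would be 0/0 = NaN. Assumes no counter reset in between.
    """
    observations = count_end - count_start
    if observations <= 0:
        return None
    return (sum_end - sum_start) / observations
```

For example, if _sum grew from 10.0 to 12.5 seconds while _count grew from 100 to 150, the mean is 2.5 / 50 = 0.05 seconds (50 ms) per operation.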
HTTP API Metrics
- HTTP Requests Total: Counter for REST API requests
- HTTP Request Duration: Histogram of API response times
- Concurrent Requests: Current concurrent API requests
Step 6: View Policy Engine Dashboard
The Policy Engine dashboard provides detailed metrics:
Request Processing
- Requests Total: Counter for all processed requests with labels:
  - phase: request, response
  - route: route name
  - api_name: API identifier
  - api_version: API version
- Request Duration Seconds: Histogram of request processing times
- Request Errors Total: Counter for errors by type
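The labels on Requests Total let a panel break traffic down per API, which in PromQL is a sum by (api_name) aggregation over the counter's rate. The sketch below performs the same grouping in Python over (labels, value) pairs, for example per-series rates returned by a Prometheus instant query. The helper name and the sample series are invented for illustration.

```python
from collections import defaultdict


def sum_by(series: list, label: str) -> dict:
    """Group (labels, value) pairs by one label and add up the values.

    Like PromQL's sum by (label), series without the label are grouped
    under the empty string.
    """
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)
```

Because a request may be counted once per phase (request and response), filter on phase before summing if you want a request count rather than a count of processing phases.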
Policy Execution
- Policy Executions Total: Counter for policy executions with labels:
  - policy_name: Name of executed policy
  - policy_version: Policy version
  - api: API identifier
  - route: Route name
  - status: success, failure, skip
- Policy Duration Seconds: Histogram of policy execution times
- Policies Per Chain: Gauge of current policy chain lengths
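The status label on Policy Executions Total makes it possible to spot a misbehaving policy. The sketch below computes a failure ratio per policy_name from (labels, value) pairs such as per-series rates. It is one possible definition chosen for illustration: skipped executions are left out of the denominator, so the ratio describes only executions that actually ran. The helper name, policy names, and values are invented.

```python
from collections import defaultdict


def failure_ratio_by_policy(series: list) -> dict:
    """Failure ratio per policy_name, ignoring executions with status "skip".

    Policies with no success or failure samples are omitted rather than
    reported as 0/0.
    """
    failures = defaultdict(float)
    executed = defaultdict(float)
    for labels, value in series:
        status = labels.get("status")
        if status not in ("success", "failure"):
            continue  # "skip" (or any unexpected status) does not count as a run
        name = labels.get("policy_name", "")
        executed[name] += value
        if status == "failure":
            failures[name] += value
    return {name: failures[name] / total for name, total in executed.items() if total > 0}
```

Counting skipped executions in the denominator instead would make a policy that is often skipped look healthier than it is when it does run.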
Streaming
- Active Streams: Current ext_proc streams (gauge)
- xDS Updates Total: Counter for configuration updates
- Body Bytes Processed: Counter of body bytes processed
System Resources
- Memory Usage: Memory consumption metrics
- Goroutines: Current number of goroutines
- gRPC Connections: Active gRPC connections
Step 7: Create Custom Dashboards
You can create custom dashboards in Grafana:
- Click + → Dashboard
- Click + Add visualization
- Select Prometheus as the data source
- Write PromQL queries to fetch metrics
- Configure visualization (graphs, tables, gauges, etc.)
- Save the dashboard
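The same steps can be scripted with Grafana's dashboard API, which accepts a dashboard JSON model at POST /api/dashboards/db. A minimal sketch, assuming basic auth with the admin account and assuming that panels without an explicit datasource field use Grafana's default data source (set Prometheus as the default, or add a datasource entry to each panel). The helper names and the layout of two time-series panels per row are illustrative choices, not something the gateway provides.

```python
import base64
import json
import urllib.request


def dashboard_payload(title: str, panels: list) -> dict:
    """Build a body for POST /api/dashboards/db from (panel_title, promql) pairs."""
    return {
        "dashboard": {
            "id": None,   # None asks Grafana to create a new dashboard
            "uid": None,  # let Grafana generate the uid
            "title": title,
            "panels": [
                {
                    "id": index + 1,
                    "type": "timeseries",
                    "title": panel_title,
                    # Two 12-column-wide panels per row on Grafana's 24-column grid.
                    "gridPos": {"h": 8, "w": 12, "x": (index % 2) * 12, "y": (index // 2) * 8},
                    "targets": [{"refId": "A", "expr": expr}],
                }
                for index, (panel_title, expr) in enumerate(panels)
            ],
        },
        "overwrite": False,  # fail instead of replacing a dashboard with the same title
    }


def create_dashboard(payload: dict, base_url: str = "http://localhost:3000",
                     user: str = "admin", password: str = "admin") -> dict:
    """POST the payload to Grafana and return its JSON response (uid, url, ...)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        f"{base_url}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10.0) as resp:
        return json.load(resp)
```

For example, dashboard_payload("API health", [("Operation rate", "sum(rate(gateway_controller_api_operations_total[5m]))")]) produces a one-panel dashboard that create_dashboard can upload.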
Step 8: Set Up Alerts
Create alerts to be notified of issues:
- Navigate to Alerting → Alert rules
- Click + New alert rule
- Define the alert condition using PromQL
- Add a severity label (for example critical, warning, or info)
- Configure notifications (email, Slack, PagerDuty, etc.)
- Save the alert rule
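Example alert condition for a high API operation failure ratio:
(
  sum(rate(gateway_controller_api_operations_total{status="failure"}[5m]))
  /
  sum(rate(gateway_controller_api_operations_total[5m]))
) > 0.1
This condition is true when more than 10% of gateway-controller API operations failed over the last 5 minutes. Both sides are wrapped in sum() so that the status label is aggregated away before dividing: without it, PromQL matches each failure series only with the identical failure series in the denominator, so the ratio would be 1 for every series with any failures.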
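To sanity-check the threshold before saving the rule, you can evaluate the same ratio by hand. The sketch below takes per-status operation rates (for example the values of sum by (status) (rate(gateway_controller_api_operations_total[5m]))) and applies the 10% rule. Returning None when there is no traffic mirrors PromQL, where 0/0 yields NaN and a comparison with NaN is false, so the alert does not fire. The helper names and sample rates are invented.

```python
from typing import Optional


def failure_ratio(rates_by_status: dict) -> Optional[float]:
    """Share of operations with status "failure"; None when there was no traffic."""
    total = sum(rates_by_status.values())
    if total == 0:
        return None  # PromQL would compute 0/0 = NaN here
    return rates_by_status.get("failure", 0.0) / total


def alert_fires(rates_by_status: dict, threshold: float = 0.1) -> bool:
    """True when the failure ratio is strictly above the threshold, as with '> 0.1'."""
    ratio = failure_ratio(rates_by_status)
    return ratio is not None and ratio > threshold
```

With 9 successful and 1 failed operation per second the ratio is exactly 0.1, which does not fire because the comparison is strict; at 0.8 and 0.2 per second it does.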