Monitoring
The platform uses a Prometheus (metrics), Loki (logs), and Jaeger (tracing) stack for monitoring. All data is visualized in Grafana.
Monitoring Architecture
The monitoring infrastructure has four components: data collection, visualization, alerting, and source systems.
Metrics
| Category | Metric | Description |
|---|---|---|
| API | request_count | Total request count |
| API | response_time | Response time |
| Agent | credential_issued | Issued credentials |
| Agent | proof_verified | Verified proofs |
| System | cpu_usage | CPU usage |
| System | memory_usage | Memory usage |
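
As an illustration of how the counter-style metrics in this table might be exposed to Prometheus, here is a minimal sketch using only the Python standard library. The `CounterRegistry` class is hypothetical and is not part of the platform. The metric names and descriptions are taken from the table, and treating these three metrics as counters is an assumption. A real service would more likely use an official Prometheus client library.

```python
from collections import defaultdict

# Metric names and help strings are copied from the metrics table.
# Treating them as Prometheus counters is an assumption of this sketch.
COUNTERS = {
    "request_count": "Total request count",
    "credential_issued": "Issued credentials",
    "proof_verified": "Verified proofs",
}


class CounterRegistry:
    """Hypothetical in-memory registry for monotonically increasing counters."""

    def __init__(self):
        self._values = defaultdict(int)

    def inc(self, name, amount=1):
        """Increase a known counter by a non-negative amount."""
        if name not in COUNTERS:
            raise KeyError(f"unknown counter: {name}")
        if amount < 0:
            raise ValueError("counters can only increase")
        self._values[name] += amount

    def render(self):
        """Render all counters in the Prometheus text exposition format."""
        lines = []
        for name, help_text in COUNTERS.items():
            lines.append(f"# HELP {name} {help_text}")
            lines.append(f"# TYPE {name} counter")
            lines.append(f"{name} {self._values[name]}")
        return "\n".join(lines) + "\n"
```

Calling `inc("request_count")` in a request handler and serving `render()` from a scrape endpoint would be one way to wire this up. The endpoint itself is left out because the source does not describe it.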
Dashboards
| Dashboard | Displayed Metrics |
|---|---|
| Platform Overview | Total active connections, daily credential count, proof verification rate, system health status |
| API Performance | Requests/second (RPS), average response time, P95/P99 latency, error rate (%) |
| Agent Activity | Credential issuance count, proof verification count, active connection count, mediator queue depth |
| Infrastructure | CPU/Memory usage, pod states, disk I/O, network traffic |
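
The API Performance figures (P95/P99 latency and error rate) come down to simple arithmetic over request samples. The sketch below shows that arithmetic using the nearest-rank percentile definition. The function names are hypothetical, and in a Prometheus deployment these values are more commonly derived from histogram buckets at query time than computed in application code.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing to avoid floating-point drift in the rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def error_rate(total, errors):
    """Error rate as a percentage of all requests (0.0 when there is no traffic)."""
    return 0.0 if total == 0 else 100.0 * errors / total
```

For example, with latency samples of 1 ms through 100 ms, `percentile(samples, 95)` selects the 95th sample in sorted order.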
Dashboard Access
All dashboards are accessible via Grafana at https://grafana.example.com.
Alert Rules
| Alert | Condition | Severity |
|---|---|---|
| HighErrorRate | error_rate > 5% | Critical |
| HighLatency | p99 > 500ms | Warning |
| PodCrashLoop | restart_count > 3 | Critical |
| DiskSpaceLow | disk_usage > 85% | Warning |
| AgentUnhealthy | health_check fail | Critical |
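
To make the conditions in this table concrete, the following sketch evaluates them against a single snapshot of values. The thresholds and severities are copied from the table. The snapshot keys (`error_rate_pct`, `p99_ms`, and so on) and the `firing_alerts` function are hypothetical names chosen for illustration. In practice these rules would normally live in the alerting system's own rule configuration, not in application code.

```python
# Thresholds and severities are copied from the alert rules table.
# The snapshot keys are hypothetical names chosen for this sketch.
RULES = [
    ("HighErrorRate", "Critical", lambda s: s["error_rate_pct"] > 5),
    ("HighLatency", "Warning", lambda s: s["p99_ms"] > 500),
    ("PodCrashLoop", "Critical", lambda s: s["restart_count"] > 3),
    ("DiskSpaceLow", "Warning", lambda s: s["disk_usage_pct"] > 85),
    ("AgentUnhealthy", "Critical", lambda s: not s["health_check_ok"]),
]


def firing_alerts(snapshot):
    """Return (alert, severity) pairs whose condition holds for the snapshot."""
    return [(name, severity) for name, severity, cond in RULES if cond(snapshot)]
```

Note that every comparison is strict, matching the table: a restart count of exactly 3 does not trigger PodCrashLoop.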
Log Structure
| Field | Description |
|---|---|
| timestamp | ISO 8601 time |
| level | Log level |
| service | Service name |
| trace_id | Distributed trace ID |
| message | Log message |
| metadata | Additional info |
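
A log record with these fields might be produced as in the sketch below. It assumes each entry is emitted as a single JSON object per line, which the source does not state. The `format_log` helper and its parameters are hypothetical.

```python
import json
from datetime import datetime, timezone


def format_log(level, service, trace_id, message, metadata=None, now=None):
    """Build one log line containing the six fields from the log structure table."""
    # `now` is injectable so the output is reproducible in tests.
    now = now or datetime.now(timezone.utc)
    record = {
        "timestamp": now.isoformat(),  # ISO 8601, with UTC offset
        "level": level,
        "service": service,
        "trace_id": trace_id,
        "message": message,
        "metadata": metadata or {},
    }
    return json.dumps(record)
```

Including `trace_id` in every record is what allows a log line to be matched with the corresponding distributed trace.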