
Monitoring

The platform is monitored with a stack of Prometheus (metrics), Loki (logs), and Jaeger (tracing). All data is visualized in Grafana.

Monitoring Architecture

The monitoring infrastructure has four components: source systems, data collection, visualization, and alerting.

Metrics

| Category | Metric | Description |
|---|---|---|
| API | request_count | Total request count |
| API | response_time | Response time |
| Agent | credential_issued | Issued credentials |
| Agent | proof_verified | Verified proofs |
| System | cpu_usage | CPU usage |
| System | memory_usage | Memory usage |
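
As a rough illustration of how metrics like these reach Prometheus, the sketch below renders two of them in the Prometheus text exposition format using only the standard library. The `Metric` class and `render` function are hypothetical helpers, and the counter/gauge types are assumptions, since the table does not state metric types. A real service would normally use a Prometheus client library instead.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str        # metric name from the table above
    kind: str        # "counter" or "gauge" (assumed; the table does not state types)
    help_text: str   # description from the table above
    value: float = 0.0


def render(metrics):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for m in metrics:
        lines.append(f"# HELP {m.name} {m.help_text}")
        lines.append(f"# TYPE {m.name} {m.kind}")
        lines.append(f"{m.name} {m.value:g}")
    return "\n".join(lines) + "\n"


request_count = Metric("request_count", "counter", "Total request count")
cpu_usage = Metric("cpu_usage", "gauge", "CPU usage")

request_count.value += 1   # counters only ever increase
cpu_usage.value = 0.42     # gauges can move in either direction

print(render([request_count, cpu_usage]), end="")
```

Prometheus scrapes text of this shape over HTTP from each target; the sketch only builds the payload and does not serve it.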

Dashboards

| Dashboard | Displayed Metrics |
|---|---|
| Platform Overview | Total active connections, daily credential count, proof verification rate, system health status |
| API Performance | Requests/second (RPS), average response time, P95/P99 latency, error rate (%) |
| Agent Activity | Credential issuance count, proof verification count, active connection count, mediator queue depth |
| Infrastructure | CPU/Memory usage, pod states, disk I/O, network traffic |
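
To make the API Performance figures concrete, here is a minimal sketch of the arithmetic behind RPS, average response time, P95/P99 latency, and error rate. It assumes raw per-request samples, a nearest-rank percentile, and that an "error" means an HTTP status of 500 or above; none of these choices come from the platform itself. In Prometheus, percentiles are usually estimated from histogram buckets (for example with `histogram_quantile`) rather than from raw samples.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples, window_seconds):
    """samples: list of (latency_ms, http_status) pairs observed in the window."""
    latencies = [latency for latency, _ in samples]
    errors = sum(1 for _, status in samples if status >= 500)  # assumed error definition
    return {
        "rps": len(samples) / window_seconds,
        "avg_ms": sum(latencies) / len(latencies),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate_pct": 100 * errors / len(samples),
    }


# Hypothetical data: 20 requests over 10 seconds, one of them a slow 503.
samples = [(100 + i, 200) for i in range(19)] + [(900, 503)]
stats = summarize(samples, window_seconds=10)
print(stats)
```

With this sample data the single slow request leaves P95 at 118 ms but pushes P99 to 900 ms, which is why tail percentiles are shown alongside the average.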

Dashboard Access

All dashboards are accessible via Grafana at https://grafana.example.com.

Alert Rules

| Alert | Condition | Severity |
|---|---|---|
| HighErrorRate | error_rate > 5% | Critical |
| HighLatency | p99 > 500ms | Warning |
| PodCrashLoop | restart_count > 3 | Critical |
| DiskSpaceLow | disk_usage > 85% | Warning |
| AgentUnhealthy | health_check fail | Critical |
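
The conditions in the table can be read as simple threshold checks. The sketch below evaluates them against one metrics snapshot. The `AlertRule` class, the `firing` helper, the snapshot keys, and the units (percent for `error_rate` and `disk_usage`, milliseconds for `p99`, a boolean for `health_check`) are all assumptions made for illustration. In the actual stack these rules would most likely live in Prometheus alerting configuration, not application code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class AlertRule:
    name: str
    severity: str
    fires: Callable[[Dict[str, Any]], bool]  # True when the condition holds


# Thresholds copied from the table above; key names and units are assumed.
RULES = [
    AlertRule("HighErrorRate", "Critical", lambda m: m["error_rate"] > 5),
    AlertRule("HighLatency", "Warning", lambda m: m["p99"] > 500),
    AlertRule("PodCrashLoop", "Critical", lambda m: m["restart_count"] > 3),
    AlertRule("DiskSpaceLow", "Warning", lambda m: m["disk_usage"] > 85),
    AlertRule("AgentUnhealthy", "Critical", lambda m: not m["health_check"]),
]


def firing(snapshot):
    """Return (alert name, severity) for every rule whose condition holds."""
    return [(rule.name, rule.severity) for rule in RULES if rule.fires(snapshot)]


# Hypothetical snapshot: slow tail latency and a nearly full disk.
snapshot = {
    "error_rate": 2.0,
    "p99": 640.0,
    "restart_count": 0,
    "disk_usage": 91.0,
    "health_check": True,
}
print(firing(snapshot))
```

For this snapshot only the two Warning alerts fire. All comparisons are strict, so an error rate of exactly 5% does not trigger HighErrorRate.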

Log Structure

| Field | Description |
|---|---|
| timestamp | ISO8601 time |
| level | Log level |
| service | Service name |
| trace_id | Distributed trace ID |
| message | Log message |
| metadata | Additional info |
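
One way a service could emit logs in this structure is a JSON formatter on top of Python's standard `logging` module, sketched below. The `JsonFormatter` class, the service name `agent`, the logger name, and the sample trace ID and metadata are hypothetical. How `trace_id` is actually propagated from Jaeger is not covered here; the sketch simply reads it from the `extra` argument.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each record as one JSON object with the fields from the table above."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        entry = {
            # record.created is a Unix timestamp; render it as ISO 8601 in UTC
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            # trace_id and metadata arrive via logging's `extra` argument, if given
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
            "metadata": getattr(record, "metadata", {}),
        }
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter(service="agent"))
logger = logging.getLogger("monitoring-example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "credential issued",
    extra={"trace_id": "4bf92f3577b34da6", "metadata": {"connection_id": "conn-1"}},
)
```

Emitting one JSON object per line keeps the fields easy to parse on the Loki side, and a shared `trace_id` lets a log line be matched to its trace.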