Monitoring
HX-SDP exposes Prometheus metrics and structured logs for integration with standard observability stacks.
Prometheus metrics
Both HX-Gate and HX-Engine expose a /metrics endpoint (enabled by default):
# Gate metrics
curl http://localhost:8080/metrics
# Engine metrics
curl http://localhost:8000/metrics
Key metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| `hx_gate_requests_total` | counter | method, path, status | Total requests handled by gate |
| `hx_gate_request_duration_seconds` | histogram | method, path | Request latency |
| `hx_gate_cus_total` | counter | tenant_id, operation | CUs consumed per tenant |
| `hx_gate_rate_limit_rejections_total` | counter | tenant_id | Requests rejected by rate limiter |
| `hx_engine_compression_ratio` | histogram | domain, verdict | Compression ratio distribution |
| `hx_engine_put_duration_seconds` | histogram | domain | PUT operation latency |
| `hx_engine_query_duration_seconds` | histogram | metric | Query operation latency |
| `hx_engine_active_keys` | gauge | namespace | Number of active keys per namespace |
| `hx_engine_storage_bytes` | gauge | namespace | Total TT-core bytes per namespace |
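For a quick check without a full Prometheus server, the text returned by /metrics can be parsed directly. The following is a minimal sketch, assuming the endpoints emit the standard Prometheus text exposition format; the sample payload, its values, and the `/v1/query` path are hypothetical, and the parser only handles simple sample lines.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, then a numeric value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line in exposition text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def sum_by_label(text, metric, label):
    """Sum the samples of one metric, grouped by the value of one label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical payload; real output comes from http://localhost:8080/metrics.
SAMPLE = """\
# HELP hx_gate_requests_total Total requests handled by gate
# TYPE hx_gate_requests_total counter
hx_gate_requests_total{method="POST",path="/v1/put",status="200"} 120
hx_gate_requests_total{method="POST",path="/v1/put",status="500"} 3
hx_gate_requests_total{method="POST",path="/v1/query",status="200"} 77
hx_gate_cus_total{tenant_id="acme-corp",operation="put"} 120.0
"""

print(sum_by_label(SAMPLE, "hx_gate_requests_total", "status"))
```

To run this against a live gate, the `SAMPLE` string can be replaced with the body fetched from the /metrics endpoint, for example via `urllib.request.urlopen`.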
Prometheus scrape config
scrape_configs:
  - job_name: hx-gate
    static_configs:
      - targets: ["gate:8080"]
    metrics_path: /metrics
    scrape_interval: 15s
  - job_name: hx-engine
    static_configs:
      - targets: ["engine:8000"]
    metrics_path: /metrics
    scrape_interval: 15s
Audit log
The gate writes a JSONL audit log of every operation:
Location: HX_GATE_AUDIT_LOG_PATH (default: /var/log/hx-gate/audit.jsonl)
Format (one JSON object per line; pretty-printed here for readability):
{
  "ts": "2026-01-15T10:30:42.123Z",
  "tenant_id": "acme-corp",
  "method": "POST",
  "path": "/v1/put",
  "namespace": "production",
  "status": 200,
  "cus": 1.0,
  "latency_ms": 42.3,
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
Tail the audit log
# Live stream
docker exec hx-gate tail -f /var/log/hx-gate/audit.jsonl | jq .
# Filter by tenant
docker exec hx-gate cat /var/log/hx-gate/audit.jsonl | \
  jq 'select(.tenant_id == "acme-corp")'
# Errors only
docker exec hx-gate cat /var/log/hx-gate/audit.jsonl | \
  jq 'select(.status >= 400)'
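When a one-off filter is not enough, the same JSONL records can be aggregated in a short script. The sketch below assumes only the field names shown in the audit log format above; the sample records, including the `globex` tenant and the 429 and 500 statuses, are hypothetical.

```python
import json
from collections import defaultdict


def read_audit(lines):
    """Parse JSONL audit lines, skipping blank and malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written line


def summarize(records):
    """Return (CUs consumed per tenant, count of records with status >= 400)."""
    cus = defaultdict(float)
    errors = 0
    for rec in records:
        cus[rec.get("tenant_id", "unknown")] += rec.get("cus", 0.0)
        if rec.get("status", 0) >= 400:
            errors += 1
    return dict(cus), errors


# Hypothetical records in the documented shape (fields trimmed for brevity).
SAMPLE_LINES = [
    '{"tenant_id": "acme-corp", "path": "/v1/put", "status": 200, "cus": 1.0}',
    '{"tenant_id": "acme-corp", "path": "/v1/put", "status": 500, "cus": 1.0}',
    '{"tenant_id": "globex", "path": "/v1/put", "status": 429, "cus": 0.5}',
    'not json',
]

per_tenant, error_count = summarize(read_audit(SAMPLE_LINES))
print(per_tenant, error_count)
```

For a real log, pass an open file handle instead of `SAMPLE_LINES`, for example `summarize(read_audit(open(path)))` with `path` pointing at a copy of audit.jsonl.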
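Log rotation
The audit log grows without bound, so configure log rotation. The example below keeps 30 daily compressed archives; copytruncate lets the gate keep writing to the same open file while it is rotated:
# /etc/logrotate.d/hx-gate
/var/log/hx-gate/audit.jsonl {
    daily
    rotate 30
    compress
    missingok
    notifempty
    copytruncate
}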
Structured application logs
Both services emit structured JSON logs to stdout:
Set log level via environment:
Grafana dashboard
Import these panels for a complete HX-SDP dashboard:
Request volume
P99 latency
CU consumption rate (per tenant)
Error rate
Storage growth
Compression efficiency
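The queries behind the six panels are not reproduced on this page. As a hedged starting point, the sketch below pairs each panel title with a PromQL expression built from the metric names in the Key metrics table. The expressions themselves are assumptions, not the shipped dashboard: they assume histograms expose the conventional `_bucket` series, that `status` carries the HTTP status code, and that "error rate" means the share of 5xx responses. The panel dicts are a skeleton; a real Grafana import will likely also need layout and datasource fields.

```python
import json

# Panel title -> assumed PromQL expression (illustrative, not the official queries).
PANEL_QUERIES = {
    "Request volume":
        'sum by (status) (rate(hx_gate_requests_total[5m]))',
    "P99 latency":
        'histogram_quantile(0.99, sum by (le) '
        '(rate(hx_gate_request_duration_seconds_bucket[5m])))',
    "CU consumption rate (per tenant)":
        'sum by (tenant_id) (rate(hx_gate_cus_total[5m]))',
    "Error rate":
        'sum(rate(hx_gate_requests_total{status=~"5.."}[5m])) '
        '/ sum(rate(hx_gate_requests_total[5m]))',
    "Storage growth":
        'sum by (namespace) (hx_engine_storage_bytes)',
    "Compression efficiency":
        'histogram_quantile(0.5, sum by (le, domain) '
        '(rate(hx_engine_compression_ratio_bucket[5m])))',
}


def build_panels(queries):
    """Turn the title -> expression map into minimal time-series panel dicts."""
    return [
        {
            "id": index,
            "title": title,
            "type": "timeseries",
            "targets": [{"refId": "A", "expr": expr}],
        }
        for index, (title, expr) in enumerate(queries.items(), start=1)
    ]


dashboard = {"title": "HX-SDP", "panels": build_panels(PANEL_QUERIES)}
print(json.dumps(dashboard, indent=2))
```

Keeping the expressions in one map makes it easy to adjust the rate window or grouping labels before pasting a query into a panel editor.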
Alerts
Recommended alert rules:
| Alert | Condition | Severity |
|---|---|---|
| High error rate | 5xx rate > 1% for 5 min | Critical |
| Engine unreachable | up{job="hx-engine"} == 0 for 2 min | Critical |
| CU quota approaching | tenant CU usage > 80% of quota | Warning |
| Rate limit storms | rate limit rejections > 100/min | Warning |
| Storage growing fast | storage_bytes increase > 1 GB/hr | Info |
| High P99 latency | P99 > 5s for 5 min | Warning |
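The conditions in the table translate fairly directly into Prometheus alerting rules. The sketch below renders a rule file from a small list of tuples. The alert names, the group name, and the PromQL expressions are assumptions derived from the table and the metric names above, and the histogram is assumed to expose the conventional `_bucket` series. The CU quota alert is left out because no quota metric is listed on this page.

```python
import json

# (alert name, assumed PromQL expression, "for" duration or None, severity)
RULES = [
    ("HXHighErrorRate",
     'sum(rate(hx_gate_requests_total{status=~"5.."}[5m])) '
     '/ sum(rate(hx_gate_requests_total[5m])) > 0.01',
     "5m", "critical"),
    ("HXEngineUnreachable", 'up{job="hx-engine"} == 0', "2m", "critical"),
    ("HXRateLimitStorm",
     'sum(increase(hx_gate_rate_limit_rejections_total[1m])) > 100',
     None, "warning"),
    ("HXStorageGrowingFast",
     'sum(delta(hx_engine_storage_bytes[1h])) > 1e9',
     None, "info"),
    ("HXHighP99Latency",
     'histogram_quantile(0.99, sum by (le) '
     '(rate(hx_gate_request_duration_seconds_bucket[5m]))) > 5',
     "5m", "warning"),
]


def render_rules(rules, group="hx-sdp"):
    """Render the rules as Prometheus rule-file YAML text."""
    lines = ["groups:", f"  - name: {group}", "    rules:"]
    for name, expr, hold, severity in rules:
        lines.append(f"      - alert: {name}")
        # json.dumps produces a double-quoted string, which is also valid YAML.
        lines.append(f"        expr: {json.dumps(expr)}")
        if hold is not None:
            lines.append(f"        for: {hold}")
        lines.append("        labels:")
        lines.append(f"          severity: {severity}")
    return "\n".join(lines) + "\n"


rule_file = render_rules(RULES)
print(rule_file)
```

The rendered text can be saved to a file and validated with `promtool check rules` before it is referenced from `rule_files` in the Prometheus configuration.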
Docker health checks
The Compose file includes built-in health checks:
gate:
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
    interval: 30s
    timeout: 5s
    retries: 3
engine:
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
    interval: 30s
    timeout: 5s
    retries: 3
Monitor container status: