Alert Configuration
Arbitex alerts notify operators when key platform metrics cross configured thresholds. Alerts are evaluated on a sliding window and deliver notifications via webhook — enabling integration with Slack, PagerDuty, email gateways, or any HTTP receiver.
How alerts work
- An alert rule defines a metric, a threshold, a comparison operator, and a time window.
- The alert evaluation service runs periodically and queries the metric over the window.
- If the threshold condition is met, an alert history record is written and a notification is POSTed to the rule's webhook URL.
- After a rule fires, it does not fire again until its cooldown period expires.
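The evaluation steps above can be sketched as a single decision function. This is a minimal illustration, not Arbitex's implementation: the `Rule` class, `should_fire`, and the choice to treat a cooldown as expired exactly at `cooldown_minutes` are all assumptions made for the example.

```python
import operator
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Map the documented comparison strings to Python operators.
COMPARATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
}


@dataclass
class Rule:
    """Hypothetical in-memory view of the rule fields used for evaluation."""
    threshold: float
    comparison: str
    cooldown_minutes: int
    enabled: bool = True


def should_fire(rule: Rule, metric_value: float,
                last_fired_at: Optional[datetime], now: datetime) -> bool:
    """Return True if the rule would fire for this metric value at `now`."""
    if not rule.enabled:
        return False
    # Step 1: check the threshold condition.
    if not COMPARATORS[rule.comparison](metric_value, rule.threshold):
        return False
    # Step 2: suppress the firing while the cooldown is still running.
    # Assumption: the cooldown counts as expired once exactly
    # cooldown_minutes have elapsed since the last firing.
    if last_fired_at is not None:
        if now - last_fired_at < timedelta(minutes=rule.cooldown_minutes):
            return False
    return True


now = datetime(2026, 3, 12, 14, 35, tzinfo=timezone.utc)
rule = Rule(threshold=50.0, comparison="gt", cooldown_minutes=30)
print(should_fire(rule, 67.42, None, now))
print(should_fire(rule, 67.42, now - timedelta(minutes=10), now))
```

The first call reports a firing because 67.42 exceeds the threshold and the rule has never fired; the second is suppressed because only 10 of the 30 cooldown minutes have passed.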
Alert rule fields
| Field | Type | Description |
|---|---|---|
| name | string | Human-readable rule name |
| metric_type | string | What to measure (see table below) |
| threshold | float | Numeric value to compare against |
| comparison | string | "gt", "lt", "gte", or "lte" |
| window_minutes | int | Sliding evaluation window in minutes |
| webhook_url | string | HTTP/HTTPS URL to POST on trigger |
| enabled | bool | Whether the rule is active (default true) |
| cooldown_minutes | int | Minimum gap between consecutive firings, in minutes |
Supported metric types
| metric_type | Unit | Description |
|---|---|---|
| cost | USD | Total AI spend in the evaluation window |
| error_rate | Percent | Percentage of 5xx errors in the window |
| latency_p95 | Milliseconds | 95th-percentile provider latency |
| audit_anomaly | Score | Anomalous audit event density score |
| budget_projection | USD | Projected monthly spend at current rate |
| cost_anomaly | Score | Statistical cost spike score |
Managing alert rules
List rules

    GET /api/alerts
    Authorization: Bearer <admin-token>

Returns all configured alert rules for the tenant, ordered by creation date (newest first).
Create a rule

    POST /api/alerts
    Authorization: Bearer <admin-token>
    Content-Type: application/json

Example: cost spike alert

    {
      "name": "Hourly cost spike",
      "metric_type": "cost",
      "threshold": 50.00,
      "comparison": "gt",
      "window_minutes": 60,
      "webhook_url": "https://hooks.slack.com/services/T00/B00/XXXX",
      "enabled": true,
      "cooldown_minutes": 30
    }

Example: high error rate

    {
      "name": "Provider error rate > 5%",
      "metric_type": "error_rate",
      "threshold": 5.0,
      "comparison": "gte",
      "window_minutes": 15,
      "webhook_url": "https://my-ops-gateway.example.com/webhook",
      "cooldown_minutes": 10
    }

Response: 201 Created. Returns the created rule with its assigned id.
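For scripted setups, the create call can be issued from Python's standard library. The sketch below is illustrative only: the base URL `https://arbitex.example.com`, the token value, and the helper names `build_create_request` and `create_rule` are placeholders, and the response is assumed to be a JSON body containing the rule with its `id`, as described above.

```python
import json
import urllib.request


def build_create_request(base_url: str, admin_token: str,
                         rule: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /api/alerts request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/alerts",
        data=json.dumps(rule).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_rule(base_url: str, admin_token: str, rule: dict) -> dict:
    """Send the request and return the created rule as a dict.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = build_create_request(base_url, admin_token, rule)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


rule = {
    "name": "Hourly cost spike",
    "metric_type": "cost",
    "threshold": 50.00,
    "comparison": "gt",
    "window_minutes": 60,
    "webhook_url": "https://hooks.slack.com/services/T00/B00/XXXX",
    "cooldown_minutes": 30,
}
# Placeholder host and token; nothing is sent until create_rule() is called.
request = build_create_request("https://arbitex.example.com", "admin-token", rule)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before it reaches a live tenant.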
Update a rule

    PUT /api/alerts/{rule_id}
    Authorization: Bearer <admin-token>

Partial update: only the provided fields are modified.

    {
      "threshold": 75.00,
      "cooldown_minutes": 60
    }

Delete a rule

    DELETE /api/alerts/{rule_id}
    Authorization: Bearer <admin-token>

Deletes the rule and all associated history records. Returns 204 No Content.
Webhook delivery
When a rule triggers, Arbitex POSTs a JSON payload to the configured webhook_url:

    {
      "alert_rule_id": "rule-uuid-...",
      "alert_name": "Hourly cost spike",
      "metric_type": "cost",
      "metric_value": 67.42,
      "threshold": 50.00,
      "comparison": "gt",
      "triggered_at": "2026-03-12T14:35:00Z"
    }

The notified flag on the history record is set to true only if the POST returns a 2xx status. Failed deliveries are not retried automatically, so monitor the trigger history to detect missed notifications.
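A receiver only needs to accept the POST and answer with a 2xx status for the delivery to count as notified. The following is a rough sketch of such a receiver using Python's standard library; the `parse_alert` and `serve` helpers, the port, and the decision to reject incomplete payloads with 400 are assumptions for illustration, not part of Arbitex.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Fields shown in the documented webhook payload.
REQUIRED_FIELDS = (
    "alert_rule_id", "alert_name", "metric_type", "metric_value",
    "threshold", "comparison", "triggered_at",
)


def parse_alert(body: bytes) -> dict:
    """Decode the webhook body; raise ValueError if it is malformed."""
    payload = json.loads(body)  # JSONDecodeError is a ValueError subclass
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return payload


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            alert = parse_alert(self.rfile.read(length))
        except ValueError:
            # A non-2xx answer leaves `notified` false on the history record.
            self.send_response(400)
            self.end_headers()
            return
        print(f"{alert['alert_name']}: {alert['metric_value']}")
        self.send_response(204)  # any 2xx marks the delivery as notified
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted (hypothetical port)."""
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()
```

Because failed deliveries are not retried, it is safer for a receiver like this to acknowledge quickly and hand slow work to a background queue than to risk timing out.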
Slack integration
Create an Incoming Webhook in your Slack app and set it as the webhook_url. Arbitex’s payload format is compatible with Slack’s generic JSON receiver.
For formatted Slack messages, place a small relay function in front of the webhook URL that reformats the payload into Slack’s Block Kit format.
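The reformatting step inside such a relay might look like the sketch below. It maps the Arbitex payload fields to a Slack message with a header block and a section block; the function name `to_slack_message` and the chosen layout are illustrative assumptions, and the relay would still need to POST the result to the Incoming Webhook URL.

```python
def to_slack_message(alert: dict) -> dict:
    """Convert an Arbitex alert payload into a Slack Block Kit message."""
    summary = (
        f"{alert['alert_name']}: {alert['metric_type']} is "
        f"{alert['metric_value']} ({alert['comparison']} {alert['threshold']})"
    )
    return {
        # Plain-text fallback used in notifications.
        "text": summary,
        "blocks": [
            {
                "type": "header",
                "text": {"type": "plain_text", "text": alert["alert_name"]},
            },
            {
                "type": "section",
                "fields": [
                    {"type": "mrkdwn", "text": f"*Metric*\n{alert['metric_type']}"},
                    {"type": "mrkdwn", "text": f"*Value*\n{alert['metric_value']}"},
                    {"type": "mrkdwn",
                     "text": f"*Condition*\n{alert['comparison']} {alert['threshold']}"},
                    {"type": "mrkdwn", "text": f"*Triggered at*\n{alert['triggered_at']}"},
                ],
            },
        ],
    }


alert = {
    "alert_rule_id": "rule-uuid-...",
    "alert_name": "Hourly cost spike",
    "metric_type": "cost",
    "metric_value": 67.42,
    "threshold": 50.00,
    "comparison": "gt",
    "triggered_at": "2026-03-12T14:35:00Z",
}
message = to_slack_message(alert)
print(message["text"])
```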
PagerDuty integration
Use the PagerDuty Events API v2 endpoint with your integration key:

    https://events.pagerduty.com/v2/enqueue

Route the alert webhook through a relay that maps the Arbitex payload to PagerDuty’s trigger event format.
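The mapping performed by the relay can be sketched as follows. The output follows the Events API v2 trigger event shape (`routing_key`, `event_action`, `dedup_key`, `payload`); the function name, the `"arbitex"` source label, the default severity, and the use of the rule id as the deduplication key are assumptions chosen for the example.

```python
PAGERDUTY_SEVERITIES = {"critical", "error", "warning", "info"}


def to_pagerduty_event(alert: dict, routing_key: str,
                       severity: str = "warning") -> dict:
    """Map an Arbitex alert payload to a PagerDuty Events API v2 trigger event."""
    if severity not in PAGERDUTY_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    return {
        "routing_key": routing_key,  # the PagerDuty integration key
        "event_action": "trigger",
        # Assumption: one open incident per rule, so repeated firings of the
        # same rule are grouped instead of opening new incidents.
        "dedup_key": f"arbitex-{alert['alert_rule_id']}",
        "payload": {
            "summary": (
                f"{alert['alert_name']}: {alert['metric_type']} = "
                f"{alert['metric_value']} ({alert['comparison']} {alert['threshold']})"
            ),
            "source": "arbitex",  # hypothetical source label
            "severity": severity,
            "timestamp": alert["triggered_at"],
            "custom_details": {
                "metric_type": alert["metric_type"],
                "metric_value": alert["metric_value"],
                "threshold": alert["threshold"],
                "comparison": alert["comparison"],
            },
        },
    }


alert = {
    "alert_rule_id": "rule-uuid-...",
    "alert_name": "Hourly cost spike",
    "metric_type": "cost",
    "metric_value": 67.42,
    "threshold": 50.00,
    "comparison": "gt",
    "triggered_at": "2026-03-12T14:35:00Z",
}
event = to_pagerduty_event(alert, routing_key="YOUR_INTEGRATION_KEY")
print(event["payload"]["summary"])
```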
Email integration
Use a service like AWS SES, SendGrid, or Mailgun with an HTTP API endpoint. Most email providers expose a REST endpoint that accepts a JSON payload, which can be used directly or via a relay.
Alert history
View trigger history

    GET /api/alerts/history?limit=100
    Authorization: Bearer <admin-token>

Returns recent trigger events with the rule name, metric value, trigger time, and notification delivery status.

Response

    {
      "items": [
        {
          "id": "history-uuid-...",
          "alert_rule_id": "rule-uuid-...",
          "alert_name": "Hourly cost spike",
          "triggered_at": "2026-03-12T14:35:00Z",
          "metric_value": 67.42,
          "notified": true
        }
      ],
      "total": 42
    }

Manual evaluation
Trigger evaluation immediately

    POST /api/alerts/evaluate
    Authorization: Bearer <admin-token>

Runs all enabled alert rules against current metric values, outside the normal evaluation schedule. Returns the list of rules that triggered, if any.
Use this to:
- Test a newly created rule against live data
- Validate that a threshold is correctly calibrated
- Diagnose missing notifications after a known event
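When diagnosing missing notifications, it helps to pair a manual evaluation with a check of the trigger history for records whose delivery failed. A minimal sketch of that check, assuming the history response has already been fetched and decoded into a dict with the `items` shape shown under "View trigger history" (the helper name `missed_notifications` is hypothetical):

```python
def missed_notifications(history: dict) -> list:
    """Return history items whose webhook delivery did not get a 2xx answer."""
    return [
        item for item in history.get("items", [])
        if not item.get("notified", False)
    ]


history = {
    "items": [
        {"id": "history-1", "alert_name": "Hourly cost spike",
         "triggered_at": "2026-03-12T14:35:00Z", "metric_value": 67.42,
         "notified": True},
        {"id": "history-2", "alert_name": "Provider error rate > 5%",
         "triggered_at": "2026-03-12T15:05:00Z", "metric_value": 7.5,
         "notified": False},
    ],
    "total": 2,
}
for item in missed_notifications(history):
    print(item["alert_name"], item["triggered_at"])
```

The sample data here is invented for the example; in practice the input would be the JSON body returned by the history endpoint.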
Operational guidelines
Choosing thresholds
Start conservative (high thresholds, wide windows) and tighten based on observed baselines. The alert history is the primary feedback loop: review it weekly to identify rules that never fire or fire too frequently.
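The weekly review can be partly automated by counting firings per rule from the history items. The sketch below is one possible approach; the function names, the 7-day lookback, and the `noisy_at` cut-off of 20 firings are arbitrary assumptions to be tuned per tenant, and the counts only cover the history items actually fetched (for example, the most recent 100).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def firings_per_rule(items: list, now: datetime, days: int = 7) -> Counter:
    """Count history items per alert_name within the last `days` days."""
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for item in items:
        # History timestamps look like "2026-03-12T14:35:00Z";
        # rewrite the Z suffix so fromisoformat accepts it on older Pythons.
        fired_at = datetime.fromisoformat(item["triggered_at"].replace("Z", "+00:00"))
        if fired_at >= cutoff:
            counts[item["alert_name"]] += 1
    return counts


def review(rule_names: list, counts: Counter, noisy_at: int = 20) -> tuple:
    """Split rules into those that never fired and those that fired often."""
    silent = [name for name in rule_names if counts[name] == 0]
    noisy = [name for name in rule_names if counts[name] >= noisy_at]
    return silent, noisy


items = [
    {"alert_name": "Hourly cost spike", "triggered_at": "2026-03-12T14:35:00Z"},
    {"alert_name": "Hourly cost spike", "triggered_at": "2026-03-13T09:00:00Z"},
]
now = datetime(2026, 3, 14, tzinfo=timezone.utc)
counts = firings_per_rule(items, now)
silent, noisy = review(["Hourly cost spike", "Provider error rate > 5%"], counts)
print("never fired:", silent)
print("fired often:", noisy)
```

Rules in the silent list are candidates for a tighter threshold or a narrower window; rules in the noisy list are candidates for a looser threshold or a longer cooldown.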
Cooldown tuning
Set cooldown_minutes to at least the typical remediation time for the condition. For cost alerts in long-running incidents, a 60-minute cooldown prevents alert fatigue while still providing hourly updates.
Disabling rules during maintenance
Before a planned maintenance window, set enabled to false with a partial update (PUT /api/alerts/{rule_id}) rather than deleting the rule. Deleting a rule also removes its history records, whereas disabling and later re-enabling it preserves both history and configuration.