
Alert Configuration

Arbitex alerts notify operators when key platform metrics cross configured thresholds. Alerts are evaluated on a sliding window and deliver notifications via webhook — enabling integration with Slack, PagerDuty, email gateways, or any HTTP receiver.

  1. An alert rule defines a metric, threshold, comparison operator, and time window.
  2. The alert evaluation service runs periodically and queries the metric over the window.
  3. If the threshold condition is met, an alert history record is written and the webhook URL is notified.
  4. A cooldown period then suppresses further firings of the rule until it expires.
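The trigger decision in steps 1–4 can be sketched as a single pure function. All names here are illustrative, not Arbitex internals:

```python
from datetime import datetime, timedelta, timezone

# Comparison operators supported by alert rules.
COMPARISONS = {
    "gt": lambda v, t: v > t,
    "lt": lambda v, t: v < t,
    "gte": lambda v, t: v >= t,
    "lte": lambda v, t: v <= t,
}

def should_fire(metric_value, threshold, comparison,
                last_fired_at, cooldown_minutes, now=None):
    """Return True if the threshold condition is met and the cooldown has expired."""
    now = now or datetime.now(timezone.utc)
    if not COMPARISONS[comparison](metric_value, threshold):
        return False
    if last_fired_at is not None and now - last_fired_at < timedelta(minutes=cooldown_minutes):
        return False  # still cooling down from the previous firing
    return True
```

A rule that fired ten minutes ago with a 30-minute cooldown stays silent even if the metric is still over threshold.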

| Field | Type | Description |
| --- | --- | --- |
| name | string | Human-readable rule name |
| metric_type | string | What to measure (see table below) |
| threshold | float | Numeric value to compare against |
| comparison | string | "gt", "lt", "gte", or "lte" |
| window_minutes | int | Sliding evaluation window in minutes |
| webhook_url | string | HTTP/HTTPS URL to POST on trigger |
| enabled | bool | Whether the rule is active (default true) |
| cooldown_minutes | int | Minimum gap between consecutive firings |
| metric_type | Unit | Description |
| --- | --- | --- |
| cost | USD | Total AI spend in the evaluation window |
| error_rate | Percent | Percentage of 5xx errors in the window |
| latency_p95 | Milliseconds | 95th-percentile provider latency |
| audit_anomaly | Score | Anomalous audit event density score |
| budget_projection | USD | Projected monthly spend at current rate |
| cost_anomaly | Score | Statistical cost spike score |

GET /api/alerts
Authorization: Bearer <admin-token>

Returns all configured alert rules for the tenant, ordered by creation date (newest first).


POST /api/alerts
Authorization: Bearer <admin-token>
Content-Type: application/json

Example — cost spike alert

{
  "name": "Hourly cost spike",
  "metric_type": "cost",
  "threshold": 50.00,
  "comparison": "gt",
  "window_minutes": 60,
  "webhook_url": "https://hooks.slack.com/services/T00/B00/XXXX",
  "enabled": true,
  "cooldown_minutes": 30
}

Example — high error rate

{
  "name": "Provider error rate > 5%",
  "metric_type": "error_rate",
  "threshold": 5.0,
  "comparison": "gte",
  "window_minutes": 15,
  "webhook_url": "https://my-ops-gateway.example.com/webhook",
  "cooldown_minutes": 10
}

Response 201 Created — returns the created rule with assigned id.
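A minimal client-side sketch for creating a rule like the cost-spike example. The `build_rule` helper, `ARBITEX_URL`, and `token` are illustrative assumptions, not part of the Arbitex API; the POST itself is shown commented out and would use any HTTP client:

```python
import json

VALID_COMPARISONS = {"gt", "lt", "gte", "lte"}

def build_rule(name, metric_type, threshold, comparison, window_minutes,
               webhook_url, enabled=True, cooldown_minutes=30):
    """Assemble and sanity-check an alert-rule payload before POSTing it."""
    if comparison not in VALID_COMPARISONS:
        raise ValueError(f"comparison must be one of {sorted(VALID_COMPARISONS)}")
    if not webhook_url.startswith(("http://", "https://")):
        raise ValueError("webhook_url must be an HTTP/HTTPS URL")
    return {
        "name": name, "metric_type": metric_type, "threshold": threshold,
        "comparison": comparison, "window_minutes": window_minutes,
        "webhook_url": webhook_url, "enabled": enabled,
        "cooldown_minutes": cooldown_minutes,
    }

rule = build_rule("Hourly cost spike", "cost", 50.0, "gt", 60,
                  "https://hooks.slack.com/services/T00/B00/XXXX")
body = json.dumps(rule)
# POST with e.g. requests (ARBITEX_URL and token are placeholders):
# requests.post(f"{ARBITEX_URL}/api/alerts", data=body,
#               headers={"Authorization": f"Bearer {token}",
#                        "Content-Type": "application/json"})
```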


PUT /api/alerts/{rule_id}
Authorization: Bearer <admin-token>

Partial update — only provided fields are modified.

{
  "threshold": 75.00,
  "cooldown_minutes": 60
}

DELETE /api/alerts/{rule_id}
Authorization: Bearer <admin-token>

Deletes the rule and all associated history records. Returns 204 No Content.


When a rule triggers, Arbitex POSTs a JSON payload to the configured webhook_url:

{
  "alert_rule_id": "rule-uuid-...",
  "alert_name": "Hourly cost spike",
  "metric_type": "cost",
  "metric_value": 67.42,
  "threshold": 50.00,
  "comparison": "gt",
  "triggered_at": "2026-03-12T14:35:00Z"
}

The notified flag on the history record is set to true only if the POST returns a 2xx status. Failed deliveries are not retried automatically — monitor the trigger history to detect missed notifications.

Create an Incoming Webhook in your Slack app and set it as the webhook_url. Arbitex’s payload format is compatible with Slack’s generic JSON receiver.

For formatted Slack messages, place a small relay function in front of the webhook URL that reformats the payload into Slack’s Block Kit format.
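One way to write that relay body is a pure mapping function. The Slack-side field names follow Slack's published Block Kit schema; the input fields are the Arbitex webhook payload shown above:

```python
def to_slack_blocks(alert):
    """Map an Arbitex alert payload to a Slack Block Kit message body."""
    op = {"gt": ">", "lt": "<", "gte": ">=", "lte": "<="}[alert["comparison"]]
    summary = (f'{alert["alert_name"]}: {alert["metric_type"]} = '
               f'{alert["metric_value"]} ({op} {alert["threshold"]})')
    return {
        "text": summary,  # plain-text fallback shown in notifications
        "blocks": [
            {"type": "header",
             "text": {"type": "plain_text", "text": alert["alert_name"]}},
            {"type": "section",
             "text": {"type": "mrkdwn",
                      "text": f'*{alert["metric_type"]}* is *{alert["metric_value"]}* '
                              f'(threshold {op} {alert["threshold"]}), '
                              f'triggered at {alert["triggered_at"]}'}},
        ],
    }
```

The relay then POSTs the returned dict to the Slack Incoming Webhook URL.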

Use a PagerDuty Events API v2 integration key endpoint:

https://events.pagerduty.com/v2/enqueue

Route the alert webhook through a relay that maps the Arbitex payload to PagerDuty’s trigger event format.
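A sketch of that mapping, using the trigger-event shape from PagerDuty's Events API v2 documentation; the `severity` default and the choice of `dedup_key` are assumptions you may want to adjust:

```python
def to_pagerduty_event(alert, routing_key, severity="warning"):
    """Map an Arbitex alert payload to a PagerDuty Events API v2 trigger event."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Reusing the rule id as dedup_key groups repeat firings into one incident.
        "dedup_key": alert["alert_rule_id"],
        "payload": {
            "summary": f'{alert["alert_name"]}: {alert["metric_type"]} '
                       f'{alert["metric_value"]} vs threshold {alert["threshold"]}',
            "source": "arbitex",
            "severity": severity,
            "timestamp": alert["triggered_at"],
            "custom_details": alert,
        },
    }
```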

Use a service like AWS SES, SendGrid, or Mailgun with an HTTP API endpoint. Most email providers expose a REST endpoint that accepts a JSON payload, which can be used directly or via a relay.


GET /api/alerts/history?limit=100
Authorization: Bearer <admin-token>

Returns recent trigger events with the rule name, metric value, trigger time, and notification delivery status.

Response

{
  "items": [
    {
      "id": "history-uuid-...",
      "alert_rule_id": "rule-uuid-...",
      "alert_name": "Hourly cost spike",
      "triggered_at": "2026-03-12T14:35:00Z",
      "metric_value": 67.42,
      "notified": true
    }
  ],
  "total": 42
}
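Because failed webhook deliveries are not retried, a periodic sweep over this response can surface them. A sketch against the response shape shown above:

```python
def missed_notifications(history):
    """Return history items whose webhook delivery did not get a 2xx response."""
    return [item for item in history["items"] if not item["notified"]]
```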

POST /api/alerts/evaluate
Authorization: Bearer <admin-token>

Runs all enabled alert rules against current metric values outside the normal evaluation schedule. Returns the list of rules that triggered, if any.

Use this to:

  • Test a newly created rule against live data
  • Validate that a threshold is correctly calibrated
  • Diagnose missing notifications after a known event

Start conservative (high thresholds, wide windows) and tighten based on observed baselines. The alert history is the primary feedback loop — review it weekly to identify rules that never fire or fire too frequently.

Set cooldown_minutes to at least the typical remediation time for the condition. For cost alerts in long-running incidents, a 60-minute cooldown prevents alert fatigue while still providing hourly updates.

Set enabled: false via a partial update (PUT /api/alerts/{rule_id}) before planned maintenance windows rather than deleting the rule. Re-enable afterward to preserve history and configuration.
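For example, a partial update that pauses the rule:

PUT /api/alerts/{rule_id}
Authorization: Bearer <admin-token>
Content-Type: application/json

{
  "enabled": false
}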