
Routing and Model Controls

Arbitex routes AI requests to providers and models based on configurable rules. The routing layer sits between the policy engine and the provider API: requests that clear policy evaluation are dispatched according to routing rules, then delivered to the selected provider. Admins can also set latency alert thresholds and monitor cost distribution across providers.

All routing configuration is available under Admin → Routing.

Prerequisites:

  • Org Admin role

Navigate to Admin → Routing → Rules to manage routing rules. Rules are evaluated in priority order (lowest number = highest priority). The first rule whose conditions match the request determines the routing action.

Each rule has one or more conditions that must all match (logical AND):

| Field | Operators | Example value |
| --- | --- | --- |
| group | eq, in, not_in | engineering |
| user_role | eq, in, not_in | admin |
| intent | eq, in, not_in | code_generation |
| time_of_day | eq, in, not_in | 09:00-17:00 |

Operators:

  • eq — exact match (single value)
  • in — matches any of a comma-separated list of values
  • not_in — matches none of a comma-separated list of values

A rule with no conditions matches all requests.
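
The evaluation order described above can be sketched in a few lines of Python. This is a minimal illustration, not Arbitex's implementation: the `priority` key and the flat request dictionary are assumptions, values are compared as plain strings, and range matching for `time_of_day` is left out.

```python
def condition_matches(cond, request):
    """Evaluate one condition against a request (a dict of field -> value)."""
    actual = request.get(cond["field"])
    if cond["operator"] == "eq":
        return actual == cond["value"]
    # in / not_in take a comma-separated list of values
    values = [v.strip() for v in cond["value"].split(",")]
    if cond["operator"] == "in":
        return actual in values
    if cond["operator"] == "not_in":
        return actual not in values
    raise ValueError(f"unknown operator: {cond['operator']}")


def first_matching_rule(rules, request):
    """Return the first rule, in priority order, whose conditions all match."""
    for rule in sorted(rules, key=lambda r: r["priority"]):  # lowest number first
        # all() over an empty list is True, so a rule with no conditions matches everything
        if all(condition_matches(c, request) for c in rule["conditions"]):
            return rule
    return None


rules = [
    {"priority": 10, "conditions": [], "actions": {"route_to_tier": "balanced"}},
    {
        "priority": 1,
        "conditions": [
            {"field": "group", "operator": "in", "value": "legal,compliance"},
            {"field": "intent", "operator": "eq", "value": "document_review"},
        ],
        "actions": {"route_to_tier": "powerful"},
    },
]
```

With these two rules, a request from the legal group with intent document_review is caught by the priority 1 rule, and everything else falls through to the catch-all rule at priority 10.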

Each rule specifies one or more routing actions applied when conditions match:

| Action | Description |
| --- | --- |
| route_to_provider | Send the request to a specific AI provider (e.g., anthropic, openai) |
| route_to_model | Send the request to a specific model ID (e.g., claude-sonnet-4-6) |
| route_to_tier | Send the request to a capability tier: fast (low-latency small models), balanced (general-purpose), or powerful (highest-capability) |
| optimize | Optimize selection for cost (prefer cheaper models) or latency (prefer fastest response) |
| cost_weight | Number from 0.0 (optimize purely for quality) to 1.0 (optimize purely for cost). Blends cost and quality scoring when selecting a model. |

To create a rule:

  1. Navigate to Admin → Routing → Rules.
  2. Click New Rule.
  3. Add one or more conditions using the condition builder:
    • Select a field (group / user_role / intent / time_of_day)
    • Select an operator (eq / in / not_in)
    • Enter a value (for in / not_in, separate multiple values with commas)
  4. Configure the action fields.
  5. Click Save Rule. The rule is added at the lowest priority (highest number) by default.

The rule editor requires at least one non-empty condition before a rule can be saved.
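
If you generate rules from scripts, a local pre-check can catch mistakes before anything is submitted. The sketch below only encodes the constraints listed on this page (field names, operators, tiers, optimize targets, and the cost_weight range); the function name and the error wording are hypothetical, and the server may enforce more or different checks.

```python
ALLOWED_FIELDS = {"group", "user_role", "intent", "time_of_day"}
ALLOWED_OPERATORS = {"eq", "in", "not_in"}
ALLOWED_TIERS = {"fast", "balanced", "powerful"}
ALLOWED_OPTIMIZE = {"cost", "latency"}


def validate_rule(rule):
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    conditions = rule.get("conditions", [])
    if not conditions:
        problems.append("at least one condition is required")
    for i, cond in enumerate(conditions):
        if cond.get("field") not in ALLOWED_FIELDS:
            problems.append(f"condition {i}: unknown field {cond.get('field')!r}")
        if cond.get("operator") not in ALLOWED_OPERATORS:
            problems.append(f"condition {i}: unknown operator {cond.get('operator')!r}")
        if not str(cond.get("value", "")).strip():
            problems.append(f"condition {i}: value is empty")

    actions = rule.get("actions", {})
    if not actions:
        problems.append("at least one action is required")
    tier = actions.get("route_to_tier")
    if tier is not None and tier not in ALLOWED_TIERS:
        problems.append(f"unknown tier {tier!r}")
    optimize = actions.get("optimize")
    if optimize is not None and optimize not in ALLOWED_OPTIMIZE:
        problems.append(f"unknown optimize target {optimize!r}")
    weight = actions.get("cost_weight")
    if weight is not None and not 0.0 <= weight <= 1.0:
        problems.append("cost_weight must be between 0.0 and 1.0")
    return problems
```

Returning a list of problems rather than raising on the first one lets a script report every issue in a rule at once.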

Priority is shown as a number in the rule table. Use the up / down buttons on a rule row to raise or lower its priority. Each reorder is saved automatically and takes effect immediately.

Lower priority numbers are evaluated first. A rule at priority 1 is checked before a rule at priority 10.

Click the Edit icon on a rule row to open the rule editor. Click Delete and confirm to remove the rule.

The same operations are available through the admin API:

# List all routing rules (ordered by priority)
GET /api/admin/routing-rules/

# Create a routing rule
POST /api/admin/routing-rules/
{
  "conditions": [
    { "field": "group", "operator": "in", "value": "legal,compliance" },
    { "field": "intent", "operator": "eq", "value": "document_review" }
  ],
  "actions": {
    "route_to_tier": "powerful",
    "cost_weight": 0.2
  }
}

# Update a routing rule
PUT /api/admin/routing-rules/{rule_id}
{ ...same structure... }

# Delete a routing rule
DELETE /api/admin/routing-rules/{rule_id}
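
For scripted use, the create call could be prepared with Python's standard library as sketched below. The host name, the token placeholder, and the Bearer authorization header are assumptions made for illustration; this page does not describe how the admin API authenticates requests. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://arbitex.example.com"  # hypothetical host, replace with your deployment


def build_request(method, path, body=None, token="YOUR_API_TOKEN"):
    """Prepare (but do not send) a JSON request to the admin API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")  # assumed auth scheme
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


rule = {
    "conditions": [
        {"field": "group", "operator": "in", "value": "legal,compliance"},
        {"field": "intent", "operator": "eq", "value": "document_review"},
    ],
    "actions": {"route_to_tier": "powerful", "cost_weight": 0.2},
}
create = build_request("POST", "/api/admin/routing-rules/", rule)
# Sending is left to the caller, for example: urllib.request.urlopen(create)
```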

Navigate to Admin → Routing → Latency to view real-time latency metrics and configure alert thresholds per provider.

The latency monitor shows the following metrics for each provider over a selectable time window (1h / 24h / 7d / 30d):

| Metric | Description |
| --- | --- |
| p50 | Median response latency (50th percentile) |
| p95 | 95th-percentile latency; the primary threshold comparison column |
| p99 | 99th-percentile latency |
| avg | Mean response latency |
| Requests | Total request count for the period |
| Trend | ↑ (degrading, red) / ↓ (improving, green) / — (stable) |

Latency status is color-coded based on the p50 value:

| Status | p50 value |
| --- | --- |
| Healthy (green) | < 200 ms |
| Warning (amber) | 200–499 ms |
| Critical (red) | ≥ 500 ms |

When a provider’s p95 exceeds its configured threshold, the p95 cell shows a red Exceeds threshold badge.
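
Note that the two indicators read different percentiles: the status color follows p50, while the badge compares p95 against the threshold. A minimal sketch of both checks follows; the function names are hypothetical, and the 500 ms default mirrors the default latency threshold.

```python
def latency_status(p50_ms):
    """Map a p50 latency in milliseconds to the status shown in the monitor."""
    if p50_ms < 200:
        return "healthy"   # green
    if p50_ms < 500:
        return "warning"   # amber
    return "critical"      # red


def exceeds_threshold(p95_ms, threshold_ms=500):
    """True when p95 is strictly above the configured threshold."""
    return p95_ms > threshold_ms
```

A provider can therefore show a Healthy status and an Exceeds threshold badge at the same time, for example with a p50 of 150 ms and a p95 of 900 ms.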

Click Thresholds to expand the threshold configuration panel. Per-provider threshold settings:

| Setting | Default | Description |
| --- | --- | --- |
| Latency threshold (ms) | 500 | p95 latency above this value triggers the “Exceeds threshold” badge |
| Error rate (%) | 5.0 | Error rate above this percentage triggers a provider-level alert |

Adjust the values using the number inputs (latency step: 50 ms; error rate step: 0.5%) and click Save Thresholds.

Latency metrics and thresholds are also available through the API:

# Get latency metrics
GET /api/metrics/latency?window=24h
# window options: 1h | 24h | 7d | 30d

# Get current thresholds
GET /api/admin/providers/thresholds
# Response: { "thresholds": { "anthropic": { "latency_ms": 500, "error_rate_pct": 5.0 }, ... } }

# Update thresholds
PUT /api/admin/providers/thresholds
{
  "thresholds": {
    "anthropic": { "latency_ms": 800, "error_rate_pct": 3.0 },
    "openai": { "latency_ms": 600, "error_rate_pct": 5.0 }
  }
}
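
When only one value changes, it can be convenient to start from the current thresholds and build the full PUT body from them. The sketch below assumes the PUT body should carry complete entries for each provider, which this page does not confirm; the helper name and the validation bounds are hypothetical, and the fallback values are the defaults of 500 ms and 5.0%.

```python
def merge_thresholds(current, updates):
    """Build a PUT body by applying partial updates to the current thresholds.

    `current` is the "thresholds" object from the GET response; it is not modified.
    """
    merged = {provider: dict(values) for provider, values in current.items()}
    for provider, values in updates.items():
        # Providers without an existing entry start from the default values
        entry = merged.setdefault(provider, {"latency_ms": 500, "error_rate_pct": 5.0})
        entry.update(values)
        if entry["latency_ms"] <= 0:
            raise ValueError(f"{provider}: latency_ms must be positive")
        if not 0.0 <= entry["error_rate_pct"] <= 100.0:
            raise ValueError(f"{provider}: error_rate_pct must be between 0 and 100")
    return {"thresholds": merged}


current = {"anthropic": {"latency_ms": 500, "error_rate_pct": 5.0}}
body = merge_thresholds(current, {"anthropic": {"latency_ms": 800}, "openai": {"error_rate_pct": 3.0}})
```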

Navigate to Admin → Routing → Cost to see a breakdown of spend by provider and identify cost optimization opportunities.

Use the 24h / 7d / 30d tab selector to change the analysis period. Data is sourced from the usage analytics store.

The provider cost table aggregates all model-level usage into per-provider totals:

| Column | Description |
| --- | --- |
| Provider | AI provider name |
| Total Cost | Sum of all request costs for this provider in the selected period (USD, 4 decimal places) |
| Total Tokens | Total input + output token count |
| Cost / 1K tokens | Effective cost rate per 1,000 tokens (total_cost ÷ total_tokens × 1,000) |

The table is sorted by total cost descending — your highest-spend provider appears first.
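
The aggregation behind this table can be reproduced from exported usage data. The sketch below assumes model-level rows with hypothetical field names (`provider`, `cost_usd`, `input_tokens`, `output_tokens`); the actual schema of the usage analytics store is not documented here, and the sample figures are made up.

```python
from collections import defaultdict


def provider_cost_table(usage_rows):
    """Aggregate model-level usage rows into per-provider totals, highest spend first."""
    totals = defaultdict(lambda: {"total_cost": 0.0, "total_tokens": 0})
    for row in usage_rows:
        entry = totals[row["provider"]]
        entry["total_cost"] += row["cost_usd"]
        entry["total_tokens"] += row["input_tokens"] + row["output_tokens"]

    table = []
    for provider, entry in totals.items():
        tokens = entry["total_tokens"]
        # Effective rate: total_cost / total_tokens * 1,000 (0.0 when there are no tokens)
        per_1k = entry["total_cost"] / tokens * 1000 if tokens else 0.0
        table.append({
            "provider": provider,
            "total_cost": round(entry["total_cost"], 4),
            "total_tokens": tokens,
            "cost_per_1k_tokens": per_1k,
        })
    return sorted(table, key=lambda r: r["total_cost"], reverse=True)


sample_rows = [  # illustrative numbers only
    {"provider": "openai", "model": "model-b", "cost_usd": 1.0, "input_tokens": 250_000, "output_tokens": 250_000},
    {"provider": "anthropic", "model": "claude-sonnet-4-6", "cost_usd": 3.0, "input_tokens": 600_000, "output_tokens": 400_000},
    {"provider": "anthropic", "model": "model-a", "cost_usd": 1.0, "input_tokens": 500_000, "output_tokens": 500_000},
]
table = provider_cost_table(sample_rows)
```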

Below the summary table, the Cheapest Provider per Model section identifies the most cost-efficient provider for each model that received traffic in the period. Each row shows:

  • Model identifier
  • Cheapest: {provider} badge
  • Cost per token for that provider/model combination

Use this view to inform routing rule configuration. For example, if claude-sonnet-4-6 is cheapest via anthropic, create a routing rule to set route_to_provider: anthropic for workloads where cost is the priority.
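
The same comparison can be run offline against exported usage data. This is a sketch under assumptions: the row field names are hypothetical, the second provider name is a placeholder, and the figures are invented for illustration.

```python
def cheapest_provider_per_model(usage_rows):
    """Return {model: (provider, cost_per_token)} for the cheapest provider of each model."""
    # Accumulate cost and tokens per (model, provider) pair
    totals = {}
    for row in usage_rows:
        key = (row["model"], row["provider"])
        cost, tokens = totals.get(key, (0.0, 0))
        totals[key] = (cost + row["cost_usd"], tokens + row["input_tokens"] + row["output_tokens"])

    best = {}
    for (model, provider), (cost, tokens) in totals.items():
        if tokens == 0:
            continue  # no traffic, nothing to compare
        per_token = cost / tokens
        if model not in best or per_token < best[model][1]:
            best[model] = (provider, per_token)
    return best


rows = [  # illustrative numbers only
    {"model": "claude-sonnet-4-6", "provider": "anthropic", "cost_usd": 3.0, "input_tokens": 600_000, "output_tokens": 400_000},
    {"model": "claude-sonnet-4-6", "provider": "other-provider", "cost_usd": 4.0, "input_tokens": 500_000, "output_tokens": 500_000},
]
cheapest = cheapest_provider_per_model(rows)
```

Here `cheapest["claude-sonnet-4-6"]` names anthropic, which is the case where a route_to_provider rule for that model would favor cost.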

When a routing rule uses cost_weight (0.0–1.0), Arbitex blends cost and quality scoring when selecting a model within the matched tier:

  • 0.0 — select the highest-quality model regardless of cost
  • 0.5 — balance cost and quality equally
  • 1.0 — select the lowest-cost model regardless of quality

Pair cost_weight with the Cost Routing view to track whether the blended selection is reducing your total spend over time.
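
The exact scoring formula is not documented on this page. One plausible reading of the three reference points above is a linear blend, sketched below purely as an assumption: the quality scores, the cost normalization, and the function name are all hypothetical.

```python
def select_model(candidates, cost_weight):
    """Pick a model by linearly blending quality with normalized cheapness.

    Each candidate is a dict with "model", "quality" (0.0-1.0), and "cost_per_1k".
    """
    if not 0.0 <= cost_weight <= 1.0:
        raise ValueError("cost_weight must be between 0.0 and 1.0")
    costs = [c["cost_per_1k"] for c in candidates]
    lo, hi = min(costs), max(costs)

    def score(candidate):
        # cheapness is 1.0 for the cheapest candidate and 0.0 for the most expensive
        cheapness = 1.0 if hi == lo else (hi - candidate["cost_per_1k"]) / (hi - lo)
        return (1.0 - cost_weight) * candidate["quality"] + cost_weight * cheapness

    return max(candidates, key=score)


tier = [  # illustrative values only
    {"model": "model-a", "quality": 0.9, "cost_per_1k": 0.015},
    {"model": "model-b", "quality": 0.6, "cost_per_1k": 0.003},
]
```

Under this assumed blend, a weight of 0.0 picks model-a on quality alone, 1.0 picks model-b on cost alone, and 0.2 still favors model-a (0.72 against 0.68).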


Route legal team requests to the powerful tier

POST /api/admin/routing-rules/
{
  "conditions": [
    { "field": "group", "operator": "in", "value": "legal,compliance" }
  ],
  "actions": {
    "route_to_tier": "powerful"
  }
}

The next rule favors cost for requests from the api_service role:

POST /api/admin/routing-rules/
{
  "conditions": [
    { "field": "user_role", "operator": "eq", "value": "api_service" }
  ],
  "actions": {
    "optimize": "cost",
    "cost_weight": 0.8
  }
}

Route after-hours requests to a specific provider

POST /api/admin/routing-rules/
{
  "conditions": [
    { "field": "time_of_day", "operator": "not_in", "value": "09:00-17:00" }
  ],
  "actions": {
    "route_to_provider": "openai"
  }
}
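
How a time_of_day range such as 09:00-17:00 is interpreted is not spelled out here. The sketch below shows one possible interpretation, assuming an inclusive start, an exclusive end, and a single time zone; which time zone Arbitex uses is also not stated, and the function names are hypothetical.

```python
from datetime import time


def parse_window(window):
    """Split "HH:MM-HH:MM" into (start, end) time objects."""
    start_text, end_text = window.split("-")
    return time.fromisoformat(start_text), time.fromisoformat(end_text)


def in_window(window, moment):
    """True when `moment` falls inside the window (start inclusive, end exclusive)."""
    start, end = parse_window(window)
    if start <= end:
        return start <= moment < end
    # The window wraps past midnight, e.g. 22:00-06:00
    return moment >= start or moment < end


def is_after_hours(moment, business_hours="09:00-17:00"):
    """Mirror of the not_in condition above: true outside business hours."""
    return not in_window(business_hours, moment)
```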