Skip to content

API Reference: Cost Routing Configuration

Base path: /api/admin/cost-routing

Cost routing configuration controls how Arbitex selects providers and models when multiple options are available for a given request. Policies balance cost, latency, capability, and availability using configurable routing rules.


{
"policy_id": "crp_01HXYZ",
"name": "Default Cost-Optimized Policy",
"description": "Route to cheapest capable model; fall back to quality on failure",
"enabled": true,
"strategy": "cost_optimized",
"group_ids": [],
"model_filters": {
"min_context_window": 8192,
"required_capabilities": [],
"excluded_models": ["gpt-4o", "claude-3-5-sonnet"]
},
"fallback_chain": [
{
"provider": "anthropic",
"model_id": "claude-3-haiku-20240307",
"priority": 1,
"cost_weight": 1.0
},
{
"provider": "openai",
"model_id": "gpt-4o-mini",
"priority": 2,
"cost_weight": 1.0
},
{
"provider": "anthropic",
"model_id": "claude-3-5-sonnet-20241022",
"priority": 3,
"cost_weight": 2.5
}
],
"cost_cap": {
"max_cost_per_request_usd": 0.10,
"action_on_exceed": "use_cheapest"
},
"latency_budget_ms": null,
"created_at": "2026-01-15T10:00:00Z",
"updated_at": "2026-03-01T09:00:00Z"
}
FieldTypeDescription
policy_idstringUnique policy identifier
namestringHuman-readable name
enabledbooleanWhether policy is active
strategystringRouting strategy (see below)
group_idsarrayGroups this policy applies to. Empty = org default
model_filtersobjectConstraints on eligible models
fallback_chainarrayOrdered list of model fallbacks
cost_capobjectPer-request cost cap configuration
latency_budget_msinteger|nullMax acceptable provider latency (null = no limit)
StrategyDescription
cost_optimizedAlways route to the lowest-cost capable model
quality_optimizedAlways route to the highest-quality model (ignores cost)
latency_optimizedRoute to the model with lowest P50 latency (rolling 5m)
balancedWeighted score combining cost, quality, and latency
round_robinDistribute requests evenly across eligible models
manual_fallbackUse fallback_chain in strict priority order

GET /api/admin/cost-routing/policies
Authorization: Bearer <admin-token>

Query parameters:

ParameterTypeDescription
enabledbooleanFilter by enabled state
group_idstringPolicies applying to a group
strategystringFilter by strategy

Response 200 OK:

{
"policies": [...],
"total": 5
}

GET /api/admin/cost-routing/policies/{policy_id}
Authorization: Bearer <admin-token>

Response 200 OK: Policy object. 404 if not found.


POST /api/admin/cost-routing/policies
Authorization: Bearer <admin-token>
Content-Type: application/json

Example — cost-optimized policy with fallback:

{
"name": "Cost-Optimized Default",
"enabled": true,
"strategy": "cost_optimized",
"group_ids": [],
"model_filters": {
"min_context_window": 8192,
"required_capabilities": [],
"excluded_models": []
},
"fallback_chain": [
{"provider": "anthropic", "model_id": "claude-3-haiku-20240307", "priority": 1},
{"provider": "openai", "model_id": "gpt-4o-mini", "priority": 2},
{"provider": "anthropic", "model_id": "claude-3-5-sonnet-20241022", "priority": 3}
],
"cost_cap": {
"max_cost_per_request_usd": 0.05,
"action_on_exceed": "block"
}
}

Example — latency-optimized for real-time group:

{
"name": "Real-Time Latency Policy",
"enabled": true,
"strategy": "latency_optimized",
"group_ids": ["grp_realtime"],
"latency_budget_ms": 500,
"fallback_chain": [
{"provider": "openai", "model_id": "gpt-4o-mini", "priority": 1},
{"provider": "anthropic", "model_id": "claude-3-haiku-20240307", "priority": 2}
],
"cost_cap": null
}

Response 201 Created: Full policy object.

Policy conflicts: If a new policy’s group_ids overlaps with an existing active policy’s group_ids, the API returns 409 Conflict with the conflicting policy IDs:

{
"error": "policy_conflict",
"message": "Group grp_01HXYZ already has an active routing policy.",
"conflicting_policy_id": "crp_existing"
}

PUT /api/admin/cost-routing/policies/{policy_id}
Authorization: Bearer <admin-token>
Content-Type: application/json

Full replacement. Returns 200 OK with updated policy.


PATCH /api/admin/cost-routing/policies/{policy_id}
Authorization: Bearer <admin-token>
Content-Type: application/json
{
"enabled": false
}

Patchable fields: enabled, strategy, cost_cap, latency_budget_ms, group_ids, fallback_chain


DELETE /api/admin/cost-routing/policies/{policy_id}
Authorization: Bearer <admin-token>

Response 204 No Content.

Groups previously assigned to the deleted policy revert to the org-default policy.


The cost_cap object controls what happens when an estimated request cost exceeds the per-request limit:

{
"max_cost_per_request_usd": 0.10,
"action_on_exceed": "use_cheapest"
}

action_on_exceed values:

ValueDescription
blockReject the request with a 402 error
use_cheapestOverride to cheapest available model
warnAllow through, flag in audit log

Blocked response (402):

{
"error": "cost_cap_exceeded",
"message": "Estimated request cost ($0.14) exceeds policy limit ($0.10).",
"estimated_cost_usd": 0.14,
"limit_usd": 0.10,
"policy_id": "crp_01HXYZ"
}

The model_filters object constrains which models are eligible for routing:

{
"min_context_window": 32768,
"required_capabilities": ["vision", "function_calling"],
"excluded_models": ["gpt-3.5-turbo"],
"included_providers": ["anthropic", "openai"],
"max_cost_per_1k_input_tokens_usd": 0.01
}
FieldTypeDescription
min_context_windowintegerMinimum context window in tokens
required_capabilitiesarrayModel must support all listed capabilities
excluded_modelsarraySpecific model IDs to never route to
included_providersarrayOnly use models from these providers
max_cost_per_1k_input_tokens_usdnumberCost ceiling per 1K input tokens

Available capabilities: vision, function_calling, json_mode, streaming, code_execution.


Test how a policy would route a hypothetical request without sending it to a provider:

POST /api/admin/cost-routing/simulate
Authorization: Bearer <admin-token>
Content-Type: application/json
{
"policy_id": "crp_01HXYZ",
"request": {
"model": null,
"estimated_input_tokens": 2500,
"estimated_output_tokens": 500,
"capabilities_needed": ["function_calling"],
"group_id": "grp_01HXYZ"
}
}

Response 200 OK:

{
"selected_model": {
"provider": "anthropic",
"model_id": "claude-3-haiku-20240307",
"reason": "Lowest cost among capable models: $0.0008/request"
},
"fallback_sequence": [
{"provider": "anthropic", "model_id": "claude-3-haiku-20240307", "estimated_cost_usd": 0.0008},
{"provider": "openai", "model_id": "gpt-4o-mini", "estimated_cost_usd": 0.0012}
],
"cost_cap_check": {
"estimated_cost_usd": 0.0008,
"limit_usd": 0.10,
"would_block": false
}
}

Get cost routing decisions for a time period:

GET /api/admin/cost-routing/analytics
Authorization: Bearer <admin-token>

Query parameters:

ParameterTypeDescription
periodstringday | week | month
policy_idstringFilter to a specific policy
group_idstringFilter by group

Response:

{
"period": "week",
"total_requests": 150000,
"routing_decisions": {
"by_model": [
{"model_id": "claude-3-haiku-20240307", "requests": 95000, "cost_usd": 76.00, "pct": 63.3},
{"model_id": "gpt-4o-mini", "requests": 45000, "cost_usd": 54.00, "pct": 30.0}
],
"cost_cap_blocks": 12,
"fallback_activations": 340
},
"total_cost_usd": 130.00,
"avg_cost_per_request_usd": 0.000867
}

StatusCodeDescription
400invalid_requestMalformed JSON
404not_foundPolicy not found
409policy_conflictGroup already has an active policy
422validation_errorInvalid model ID, unknown capability, or invalid cost cap