API Reference: Cost Routing Configuration
Base path: /api/admin/cost-routing
Cost routing configuration controls how Arbitex selects providers and models when multiple options are available for a given request. Policies balance cost, latency, capability, and availability using configurable routing rules.
Routing Policy Object

    {
      "policy_id": "crp_01HXYZ",
      "name": "Default Cost-Optimized Policy",
      "description": "Route to cheapest capable model; fall back to quality on failure",
      "enabled": true,
      "strategy": "cost_optimized",
      "group_ids": [],
      "model_filters": {
        "min_context_window": 8192,
        "required_capabilities": [],
        "excluded_models": ["gpt-4o", "claude-3-5-sonnet"]
      },
      "fallback_chain": [
        { "provider": "anthropic", "model_id": "claude-3-haiku-20240307", "priority": 1, "cost_weight": 1.0 },
        { "provider": "openai", "model_id": "gpt-4o-mini", "priority": 2, "cost_weight": 1.0 },
        { "provider": "anthropic", "model_id": "claude-3-5-sonnet-20241022", "priority": 3, "cost_weight": 2.5 }
      ],
      "cost_cap": {
        "max_cost_per_request_usd": 0.10,
        "action_on_exceed": "use_cheapest"
      },
      "latency_budget_ms": null,
      "created_at": "2026-01-15T10:00:00Z",
      "updated_at": "2026-03-01T09:00:00Z"
    }

Policy Fields

| Field | Type | Description |
|---|---|---|
| policy_id | string | Unique policy identifier |
| name | string | Human-readable name |
| enabled | boolean | Whether policy is active |
| strategy | string | Routing strategy (see below) |
| group_ids | array | Groups this policy applies to. Empty = org default |
| model_filters | object | Constraints on eligible models |
| fallback_chain | array | Ordered list of model fallbacks |
| cost_cap | object | Per-request cost cap configuration |
| latency_budget_ms | integer or null | Max acceptable provider latency (null = no limit) |
Routing Strategies
| Strategy | Description |
|---|---|
| cost_optimized | Always route to the lowest-cost capable model |
| quality_optimized | Always route to the highest-quality model (ignores cost) |
| latency_optimized | Route to the model with lowest P50 latency (rolling 5m) |
| balanced | Weighted score combining cost, quality, and latency |
| round_robin | Distribute requests evenly across eligible models |
| manual_fallback | Use fallback_chain in strict priority order |
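
To illustrate how two of these strategies can pick different models from the same candidates, here is a minimal Python sketch. The `Candidate` type, the `select_model` function, the per-request cost estimates, and the tie-breaking rule are all assumptions made for this example; the actual Arbitex selection logic is not documented on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of one eligible model for a request."""
    provider: str
    model_id: str
    priority: int               # position in fallback_chain (1 = first)
    estimated_cost_usd: float   # illustrative per-request estimate


def select_model(strategy: str, candidates: list[Candidate]) -> Candidate:
    """Pick one candidate; only two strategies are sketched here."""
    if not candidates:
        raise ValueError("no eligible models")
    if strategy == "cost_optimized":
        # Lowest estimated cost wins; ties fall back to chain priority
        # (the tie-break is an assumption, not documented behavior).
        return min(candidates, key=lambda c: (c.estimated_cost_usd, c.priority))
    if strategy == "manual_fallback":
        # Strict priority order; cost is ignored.
        return min(candidates, key=lambda c: c.priority)
    raise NotImplementedError(f"strategy not sketched: {strategy}")


candidates = [
    Candidate("openai", "gpt-4o-mini", priority=1, estimated_cost_usd=0.0012),
    Candidate("anthropic", "claude-3-haiku-20240307", priority=2, estimated_cost_usd=0.0008),
]
print(select_model("cost_optimized", candidates).model_id)   # claude-3-haiku-20240307
print(select_model("manual_fallback", candidates).model_id)  # gpt-4o-mini
```

With the same two candidates, cost_optimized prefers the cheaper estimate while manual_fallback follows the chain order regardless of cost.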
List Routing Policies
    GET /api/admin/cost-routing/policies
    Authorization: Bearer <admin-token>

Query parameters:

| Parameter | Type | Description |
|---|---|---|
| enabled | boolean | Filter by enabled state |
| group_id | string | Policies applying to a group |
| strategy | string | Filter by strategy |
Response 200 OK:

    { "policies": [...], "total": 5 }

Get Routing Policy

    GET /api/admin/cost-routing/policies/{policy_id}
    Authorization: Bearer <admin-token>

Response 200 OK: Policy object. 404 if not found.
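
The list and get endpoints above differ only in their path and query string. As a rough sketch, the URLs could be assembled as follows. The host `https://arbitex.example.com`, the function names, and the `true`/`false` encoding of the boolean filter are assumptions for illustration only.

```python
from urllib.parse import quote, urlencode

BASE_PATH = "/api/admin/cost-routing"  # base path from this reference


def list_policies_url(host, *, enabled=None, group_id=None, strategy=None):
    """Build the list URL, including only the filters that were supplied."""
    params = {}
    if enabled is not None:
        # Assumption: booleans are sent as lowercase "true"/"false".
        params["enabled"] = "true" if enabled else "false"
    if group_id is not None:
        params["group_id"] = group_id
    if strategy is not None:
        params["strategy"] = strategy
    url = f"{host}{BASE_PATH}/policies"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def get_policy_url(host, policy_id):
    """Build the URL for a single policy, escaping the ID for use in a path."""
    return f"{host}{BASE_PATH}/policies/{quote(policy_id, safe='')}"


HOST = "https://arbitex.example.com"  # hypothetical host
print(list_policies_url(HOST, enabled=True, strategy="cost_optimized"))
print(get_policy_url(HOST, "crp_01HXYZ"))
```

Both requests also need the `Authorization: Bearer <admin-token>` header shown above.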
Create Routing Policy
    POST /api/admin/cost-routing/policies
    Authorization: Bearer <admin-token>
    Content-Type: application/json

Example (cost-optimized policy with fallback):

    {
      "name": "Cost-Optimized Default",
      "enabled": true,
      "strategy": "cost_optimized",
      "group_ids": [],
      "model_filters": {
        "min_context_window": 8192,
        "required_capabilities": [],
        "excluded_models": []
      },
      "fallback_chain": [
        {"provider": "anthropic", "model_id": "claude-3-haiku-20240307", "priority": 1},
        {"provider": "openai", "model_id": "gpt-4o-mini", "priority": 2},
        {"provider": "anthropic", "model_id": "claude-3-5-sonnet-20241022", "priority": 3}
      ],
      "cost_cap": {
        "max_cost_per_request_usd": 0.05,
        "action_on_exceed": "block"
      }
    }

Example (latency-optimized for real-time group):

    {
      "name": "Real-Time Latency Policy",
      "enabled": true,
      "strategy": "latency_optimized",
      "group_ids": ["grp_realtime"],
      "latency_budget_ms": 500,
      "fallback_chain": [
        {"provider": "openai", "model_id": "gpt-4o-mini", "priority": 1},
        {"provider": "anthropic", "model_id": "claude-3-haiku-20240307", "priority": 2}
      ],
      "cost_cap": null
    }

Response 201 Created: Full policy object.
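
As a minimal sketch of issuing the create call from Python's standard library, the following builds the POST request without sending it. The host `https://arbitex.example.com`, the placeholder token, and the helper name `build_create_policy_request` are assumptions for illustration.

```python
import json
from urllib.request import Request

API_HOST = "https://arbitex.example.com"  # hypothetical host


def build_create_policy_request(admin_token: str, policy: dict) -> Request:
    """Build (but do not send) the POST request that creates a policy."""
    return Request(
        url=f"{API_HOST}/api/admin/cost-routing/policies",
        data=json.dumps(policy).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Body taken from the latency-optimized example above.
policy = {
    "name": "Real-Time Latency Policy",
    "enabled": True,
    "strategy": "latency_optimized",
    "group_ids": ["grp_realtime"],
    "latency_budget_ms": 500,
    "fallback_chain": [
        {"provider": "openai", "model_id": "gpt-4o-mini", "priority": 1},
        {"provider": "anthropic", "model_id": "claude-3-haiku-20240307", "priority": 2},
    ],
    "cost_cap": None,  # serialized as JSON null
}

request = build_create_policy_request("ADMIN_TOKEN", policy)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a non-2xx response raises `urllib.error.HTTPError`, whose body carries the JSON error payload.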
Policy conflicts: If a new policy’s group_ids overlap with those of an existing active policy, the API returns 409 Conflict and identifies the conflicting policy:

    {
      "error": "policy_conflict",
      "message": "Group grp_01HXYZ already has an active routing policy.",
      "conflicting_policy_id": "crp_existing"
    }

Update Routing Policy

    PUT /api/admin/cost-routing/policies/{policy_id}
    Authorization: Bearer <admin-token>
    Content-Type: application/json

Replaces the entire policy. Returns 200 OK with the updated policy.
Partial Update
    PATCH /api/admin/cost-routing/policies/{policy_id}
    Authorization: Bearer <admin-token>
    Content-Type: application/json

    { "enabled": false }

Patchable fields: enabled, strategy, cost_cap, latency_budget_ms, group_ids, fallback_chain
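
A client can check a PATCH body against the patchable fields listed above before sending it. This is a small client-side sketch; the function name is hypothetical, and how the API itself responds to non-patchable fields is not specified on this page.

```python
# Patchable fields as listed in this reference.
PATCHABLE_FIELDS = frozenset({
    "enabled",
    "strategy",
    "cost_cap",
    "latency_budget_ms",
    "group_ids",
    "fallback_chain",
})


def unpatchable_keys(body: dict) -> list[str]:
    """Return the keys in a PATCH body that are not patchable, sorted."""
    return sorted(set(body) - PATCHABLE_FIELDS)


print(unpatchable_keys({"enabled": False}))                 # []
print(unpatchable_keys({"name": "New", "enabled": True}))   # ['name']
```

Fields such as name fall outside the list, so changing them would go through the full-replacement PUT instead.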
Delete Routing Policy
    DELETE /api/admin/cost-routing/policies/{policy_id}
    Authorization: Bearer <admin-token>

Response 204 No Content.
Groups previously assigned to the deleted policy revert to the org-default policy.
Cost Cap Configuration
The cost_cap object controls what happens when an estimated request cost exceeds the per-request limit:

    {
      "max_cost_per_request_usd": 0.10,
      "action_on_exceed": "use_cheapest"
    }

action_on_exceed values:

| Value | Description |
|---|---|
| block | Reject the request with a 402 error |
| use_cheapest | Override to cheapest available model |
| warn | Allow through, flag in audit log |
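
The three actions in the table can be read as a small decision function. The sketch below is one interpretation, not the Arbitex implementation: the function name and the returned labels are hypothetical, a null cost_cap is assumed to mean no cap, and a cost exactly equal to the limit is assumed not to exceed it.

```python
from typing import Optional


def cost_cap_decision(estimated_cost_usd: float, cost_cap: Optional[dict]) -> str:
    """Map an estimated cost and a cost_cap object to an outcome label."""
    if cost_cap is None:
        # Assumption: a null cost_cap disables the check.
        return "allow"
    if estimated_cost_usd <= cost_cap["max_cost_per_request_usd"]:
        return "allow"
    action = cost_cap["action_on_exceed"]
    if action == "block":
        return "reject_402"        # request rejected with a 402 error
    if action == "use_cheapest":
        return "reroute_cheapest"  # overridden to the cheapest available model
    if action == "warn":
        return "allow_flagged"     # allowed through, flagged in the audit log
    raise ValueError(f"unknown action_on_exceed: {action}")


cap = {"max_cost_per_request_usd": 0.10, "action_on_exceed": "block"}
print(cost_cap_decision(0.14, cap))  # reject_402
print(cost_cap_decision(0.05, cap))  # allow
```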
Blocked response (402):

    {
      "error": "cost_cap_exceeded",
      "message": "Estimated request cost ($0.14) exceeds policy limit ($0.10).",
      "estimated_cost_usd": 0.14,
      "limit_usd": 0.10,
      "policy_id": "crp_01HXYZ"
    }

Model Filters

The model_filters object constrains which models are eligible for routing:

    {
      "min_context_window": 32768,
      "required_capabilities": ["vision", "function_calling"],
      "excluded_models": ["gpt-3.5-turbo"],
      "included_providers": ["anthropic", "openai"],
      "max_cost_per_1k_input_tokens_usd": 0.01
    }

| Field | Type | Description |
|---|---|---|
| min_context_window | integer | Minimum context window in tokens |
| required_capabilities | array | Model must support all listed capabilities |
| excluded_models | array | Specific model IDs to never route to |
| included_providers | array | Only use models from these providers |
| max_cost_per_1k_input_tokens_usd | number | Cost ceiling per 1K input tokens |
Available capabilities: vision, function_calling, json_mode, streaming, code_execution.
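
The filter fields combine as a conjunction: a model is eligible only if it passes every constraint that is set. The following sketch shows one way to express that. The `ModelInfo` type, the catalog entries, and their numbers are made up for illustration, and treating a missing or empty included_providers as "no restriction" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical catalog entry; all values below are invented."""
    provider: str
    model_id: str
    context_window: int
    capabilities: frozenset
    cost_per_1k_input_tokens_usd: float


def is_eligible(model: ModelInfo, filters: dict) -> bool:
    """Return True if the model satisfies every filter that is present."""
    if model.context_window < filters.get("min_context_window", 0):
        return False
    if not set(filters.get("required_capabilities", [])) <= model.capabilities:
        return False
    if model.model_id in filters.get("excluded_models", []):
        return False
    providers = filters.get("included_providers")
    if providers and model.provider not in providers:
        return False
    ceiling = filters.get("max_cost_per_1k_input_tokens_usd")
    if ceiling is not None and model.cost_per_1k_input_tokens_usd > ceiling:
        return False
    return True


catalog = [
    ModelInfo("anthropic", "example-vision", 200000,
              frozenset({"vision", "function_calling", "streaming"}), 0.003),
    ModelInfo("openai", "example-small", 16384,
              frozenset({"function_calling", "json_mode"}), 0.0005),
]
filters = {
    "min_context_window": 32768,
    "required_capabilities": ["vision", "function_calling"],
    "included_providers": ["anthropic", "openai"],
    "max_cost_per_1k_input_tokens_usd": 0.01,
}
print([m.model_id for m in catalog if is_eligible(m, filters)])  # ['example-vision']
```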
Routing Simulation
Test how a policy would route a hypothetical request without sending it to a provider:

    POST /api/admin/cost-routing/simulate
    Authorization: Bearer <admin-token>
    Content-Type: application/json

    {
      "policy_id": "crp_01HXYZ",
      "request": {
        "model": null,
        "estimated_input_tokens": 2500,
        "estimated_output_tokens": 500,
        "capabilities_needed": ["function_calling"],
        "group_id": "grp_01HXYZ"
      }
    }

Response 200 OK:

    {
      "selected_model": {
        "provider": "anthropic",
        "model_id": "claude-3-haiku-20240307",
        "reason": "Lowest cost among capable models: $0.0008/request"
      },
      "fallback_sequence": [
        {"provider": "anthropic", "model_id": "claude-3-haiku-20240307", "estimated_cost_usd": 0.0008},
        {"provider": "openai", "model_id": "gpt-4o-mini", "estimated_cost_usd": 0.0012}
      ],
      "cost_cap_check": {
        "estimated_cost_usd": 0.0008,
        "limit_usd": 0.10,
        "would_block": false
      }
    }

Routing Analytics
Get cost routing decisions for a time period:

    GET /api/admin/cost-routing/analytics
    Authorization: Bearer <admin-token>

Query parameters:

| Parameter | Type | Description |
|---|---|---|
| period | string | One of day, week, or month |
| policy_id | string | Filter to a specific policy |
| group_id | string | Filter by group |
Response:

    {
      "period": "week",
      "total_requests": 150000,
      "routing_decisions": {
        "by_model": [
          {"model_id": "claude-3-haiku-20240307", "requests": 95000, "cost_usd": 76.00, "pct": 63.3},
          {"model_id": "gpt-4o-mini", "requests": 45000, "cost_usd": 54.00, "pct": 30.0}
        ],
        "cost_cap_blocks": 12,
        "fallback_activations": 340
      },
      "total_cost_usd": 130.00,
      "avg_cost_per_request_usd": 0.000867
    }

Error Reference

| Status | Code | Description |
|---|---|---|
| 400 | invalid_request | Malformed JSON |
| 404 | not_found | Policy not found |
| 409 | policy_conflict | Group already has an active policy |
| 422 | validation_error | Invalid model ID, unknown capability, or invalid cost cap |
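
The error bodies shown in this reference share an `error` code and a `message`, with extra fields such as `conflicting_policy_id` on some of them. A client might wrap them as in the sketch below. The `CostRoutingError` class and `parse_error` function are hypothetical names, and the fallback to an `unknown` code for unparseable bodies is an assumption.

```python
import json


class CostRoutingError(Exception):
    """Hypothetical client-side wrapper for an API error response."""

    def __init__(self, status: int, code: str, message: str, details: dict):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details  # any fields besides "error" and "message"


def parse_error(status: int, body: str) -> CostRoutingError:
    """Turn an HTTP status and a response body into a CostRoutingError."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    details = {k: v for k, v in payload.items() if k not in ("error", "message")}
    return CostRoutingError(
        status,
        payload.get("error", "unknown"),
        payload.get("message", ""),
        details,
    )


body = json.dumps({
    "error": "policy_conflict",
    "message": "Group grp_01HXYZ already has an active routing policy.",
    "conflicting_policy_id": "crp_existing",
})
err = parse_error(409, body)
print(err.code, err.details["conflicting_policy_id"])  # policy_conflict crp_existing
```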