Billing and metering

Arbitex Gateway tracks token usage and cost for every request. You can set per-user and per-group budget caps across three dimensions — tokens, request count, and dollar cost — each with daily and monthly variants. When a budget is exhausted, the gateway blocks the request before it reaches the model provider.

Budget enforcement runs in the payload analysis stage of the intake pipeline, the same stage as DLP inspection, and the quota check comes first. A request blocked by a budget cap does not consume provider tokens and does not produce a DLP finding or policy evaluation, but it does produce an audit log entry.

Budget dimensions

Each quota configuration supports six independent limits. Set any combination. A null value means unlimited for that dimension.

Dimension	Field	Scope	Description
Daily tokens	`daily_token_limit`	Per calendar day (UTC)	Maximum `input_tokens + output_tokens` per day
Monthly tokens	`monthly_token_limit`	Per calendar month (UTC)	Maximum `input_tokens + output_tokens` per month
Daily requests	`daily_request_limit`	Per calendar day (UTC)	Maximum request count per day
Monthly requests	`monthly_request_limit`	Per calendar month (UTC)	Maximum request count per month
Daily cost	`daily_cost_limit_usd`	Per calendar day (UTC)	Maximum spend in USD per day
Monthly cost	`monthly_cost_limit_usd`	Per calendar month (UTC)	Maximum spend in USD per month

All time boundaries are UTC. A daily limit resets at 00:00:00 UTC. A monthly limit resets at 00:00:00 UTC on the first day of each month.
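The reset boundaries can be reproduced client-side, for example to show users when a cap will lift. The following is a minimal sketch, not gateway code; the function name `next_reset` and the `"daily"` / `"monthly"` period strings are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def next_reset(now: datetime, period: str) -> datetime:
    """Return the next UTC reset boundary for a 'daily' or 'monthly' limit.

    `now` must be timezone-aware; it is converted to UTC before truncating.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight + timedelta(days=1)
    if period == "monthly":
        first = midnight.replace(day=1)
        # December rolls over to January of the following year.
        if first.month == 12:
            return first.replace(year=first.year + 1, month=1)
        return first.replace(month=first.month + 1)
    raise ValueError(f"unknown period: {period!r}")
```

Calling `next_reset(datetime.now(timezone.utc), "monthly").isoformat()` yields a timestamp in the same shape as the `reset_at` field returned on a budget denial.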

Configuring per-user quotas

Set a quota for a specific user. All fields are optional — include only the limits you want to enforce.

PUT https://api.arbitex.ai/api/admin/users/{user_id}/quota
Authorization: Bearer arb_live_your-api-key-here
Content-Type: application/json

{
  "daily_token_limit": 100000,
  "monthly_cost_limit_usd": 50.00
}

This sets a daily cap of 100,000 tokens and a monthly cost cap of $50.00 USD. All other dimensions remain unlimited.

To read the current quota:

GET https://api.arbitex.ai/api/admin/users/{user_id}/quota
Authorization: Bearer arb_live_your-api-key-here

To remove all caps for a user:

DELETE https://api.arbitex.ai/api/admin/users/{user_id}/quota
Authorization: Bearer arb_live_your-api-key-here
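The three calls above differ only in HTTP method and body, so a script can build them with one helper. This sketch uses only the Python standard library; the helper name `quota_request` and the user ID `user_123` are hypothetical, and the request is constructed but not sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.arbitex.ai/api/admin"


def quota_request(method: str, user_id: str, api_key: str,
                  limits: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a per-user quota request."""
    headers = {"Authorization": f"Bearer {api_key}"}
    body = None
    if limits is not None:
        # Only PUT carries a JSON body; GET and DELETE pass limits=None.
        body = json.dumps(limits).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{BASE_URL}/users/{user_id}/quota",
        data=body, headers=headers, method=method,
    )


# "user_123" is a placeholder user ID used for illustration.
req = quota_request(
    "PUT", "user_123", "arb_live_your-api-key-here",
    {"daily_token_limit": 100000, "monthly_cost_limit_usd": 50.00},
)
```

To send the request, pass it to `urllib.request.urlopen(req)`.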

Configuring per-group quotas

Group quotas use the same schema as user quotas. The difference is scope: a group limit applies to the aggregate usage of all members in the group, not to each member individually.

PUT https://api.arbitex.ai/api/admin/groups/{group_id}/quota
Authorization: Bearer arb_live_your-api-key-here
Content-Type: application/json

{
  "monthly_token_limit": 5000000,
  "monthly_cost_limit_usd": 500.00,
  "daily_request_limit": 1000
}

To read or remove a group quota, use the same path with GET or DELETE:

GET https://api.arbitex.ai/api/admin/groups/{group_id}/quota
DELETE https://api.arbitex.ai/api/admin/groups/{group_id}/quota

How enforcement works

When a request arrives, the gateway checks quotas in the payload analysis stage:

1. Load the user’s per-user quota (if any).
2. Load the per-group quotas for every group the user belongs to.
3. Compute the effective limit for each dimension: the minimum across the user-level limit and all applicable group-level limits.
4. For group quotas, check the aggregate group usage (the total across all group members) against the group ceiling.
5. If any dimension is exceeded, block the request immediately.

The effective limit is always the most restrictive value. If a user has a monthly cost cap of $100 and belongs to a group with a monthly cost cap of $500, the user’s $100 cap applies. If the group’s aggregate usage across all members reaches $500, the group cap blocks further requests from any member — even if an individual member has not reached their personal cap.

Budget enforcement is fail-fast: if the quota check fails, the DLP and policy evaluation stages are skipped entirely. The request never reaches a model provider.
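The enforcement steps in this section can be sketched as follows. This is an illustration of the described logic, not the gateway's implementation: the names `effective_limits` and `blocking_field` are hypothetical, quotas are modeled as plain dictionaries keyed by the quota field names, and the sketch assumes a cap counts as exhausted once usage reaches it.

```python
from typing import Optional

Limits = dict[str, Optional[float]]  # quota field -> cap; None means unlimited
Usage = dict[str, float]             # quota field -> usage in the current window


def effective_limits(user: Limits, groups: list[Limits]) -> Limits:
    """Take the most restrictive non-null cap for each field."""
    result: Limits = {}
    for field in set(user).union(*groups):
        caps = [q[field] for q in [user, *groups] if q.get(field) is not None]
        result[field] = min(caps) if caps else None
    return result


def blocking_field(user: Limits, user_usage: Usage,
                   groups: list[tuple[Limits, Usage]]) -> Optional[str]:
    """Return a field whose cap is exhausted, or None if the request may proceed."""
    effective = effective_limits(user, [limits for limits, _ in groups])
    for field in sorted(effective):
        cap = effective[field]
        # Assumption: usage equal to the cap already counts as exhausted.
        if cap is not None and user_usage.get(field, 0) >= cap:
            return field
    # Group ceilings are compared against the aggregate usage of all members.
    for limits, aggregate in groups:
        for field in sorted(limits):
            cap = limits[field]
            if cap is not None and aggregate.get(field, 0) >= cap:
                return field
    return None
```

With a user cap of $100 and a group cap of $500, `effective_limits` returns $100 for the user, and `blocking_field` also blocks a member who has spent only $40 once the group's aggregate usage reaches $500.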

What happens when a budget is exhausted

The gateway returns HTTP 429 with the following response body:

{
  "error": "quota_exceeded",
  "quota_type": "monthly_cost_usd",
  "limit": 50.0,
  "used": 50.12,
  "reset_at": "2026-04-01T00:00:00+00:00"
}

Field	Description
`error`	Always `"quota_exceeded"`
`quota_type`	Which dimension was exceeded: `daily_tokens`, `monthly_tokens`, `daily_requests`, `monthly_requests`, `daily_cost_usd`, or `monthly_cost_usd`
`limit`	The configured limit value
`used`	Current usage at the time of denial
`reset_at`	ISO 8601 timestamp when the limit resets (start of next day or month, UTC)

The response includes a `Retry-After` header with the same reset timestamp. Your application should not retry until after the reset time.
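A client can turn the `reset_at` field into a wait time instead of retrying blindly. The sketch below parses the response body shown above; the function name `seconds_until_reset` is an assumption, and the caller supplies a timezone-aware `now`.

```python
import json
from datetime import datetime, timezone


def seconds_until_reset(body: str, now: datetime) -> float:
    """Parse a quota_exceeded body and return how long to wait, in seconds."""
    payload = json.loads(body)
    if payload.get("error") != "quota_exceeded":
        raise ValueError("not a quota_exceeded response")
    # reset_at is an ISO 8601 timestamp with an explicit UTC offset.
    reset_at = datetime.fromisoformat(payload["reset_at"])
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, (reset_at - now).total_seconds())
```

For a daily cap the wait is at most 24 hours, but for a monthly cap it can be weeks, so a long-running client may prefer to surface the denial to the user instead of sleeping.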

Audit log fields for budget denials

Every budget denial produces an audit log entry with:

Field	Value
`action_taken`	`BLOCK`
`match_reason`	`quota_exceeded`
`stage_latencies.quota_check_ms`	Time spent on the quota check
`stage_latencies.policy_eval_ms`	`0.0` (policy evaluation did not run)
`stage_latencies.provider_ms`	`0.0` (no provider call was made)

Budget denials also trigger a `quota_exceeded` webhook event for integration with alerting systems. The webhook payload includes the user ID, group ID (if the group cap was the limiting factor), the quota type, limit value, and current usage.
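When processing exported audit entries, budget denials can be picked out by the two fields listed in the table above. This sketch assumes each entry is a dictionary and that `stage_latencies` is a nested object, which the dotted field names suggest but this page does not confirm; both function names are hypothetical.

```python
def is_budget_denial(entry: dict) -> bool:
    """True when an audit entry records a request blocked by a budget cap."""
    return (entry.get("action_taken") == "BLOCK"
            and entry.get("match_reason") == "quota_exceeded")


def quota_check_latencies(entries: list[dict]) -> list[float]:
    """Collect quota_check_ms from every budget denial in a batch of entries."""
    return [
        entry["stage_latencies"]["quota_check_ms"]
        for entry in entries
        if is_budget_denial(entry)
    ]
```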

Cost-optimized routing

When you need budget-aware routing that selects the cheapest model for a request, configure cost-optimized routing through the cost routing admin API.

Tier assignments

Assign models to capability tiers. Each tier groups models with similar capability levels but potentially different costs.

POST https://api.arbitex.ai/api/admin/cost-routing/tiers/assign
Authorization: Bearer arb_live_your-api-key-here
Content-Type: application/json

{
  "model_id": "gpt-4o-mini",
  "provider": "openai",
  "tier": "standard",
  "input_cost_per_1k": 0.15,
  "output_cost_per_1k": 0.60
}

List all tier assignments:

GET https://api.arbitex.ai/api/admin/cost-routing/tiers

Optimization strategies

When selecting a model for cost-optimized routing, specify an optimization strategy:

Strategy	Behavior
`cheapest_combined`	Select the model with the lowest combined input + output cost per 1K tokens. This is the default.
`cheapest_input`	Minimize input cost per 1K tokens.
`cheapest_output`	Minimize output cost per 1K tokens.
`weighted`	Weighted random selection proportional to each model’s configured weight. Use this to distribute load across models.
`lowest_latency`	Select the model with the lowest observed latency.
`balanced`	Weighted combination of cost and latency.

Preview the selection for a tier:

POST https://api.arbitex.ai/api/admin/cost-routing/select
Authorization: Bearer arb_live_your-api-key-here
Content-Type: application/json

{
  "tier": "standard",
  "optimize": "cheapest_combined"
}

Example response:

{
  "model_id": "gpt-4o-mini",
  "provider": "openai",
  "tier": "standard",
  "input_cost_per_1k": 0.15,
  "output_cost_per_1k": 0.60,
  "reason": "Selected gpt-4o-mini from tier 'standard' with lowest combined cost"
}
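The three cost-based strategies reduce to picking a minimum over different cost fields. The sketch below illustrates that idea only; the `TierModel` class, the `select_model` function, and the second model in the usage list are hypothetical, and `weighted`, `lowest_latency`, and `balanced` are left out because they depend on weights and latency data not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierModel:
    model_id: str
    provider: str
    tier: str
    input_cost_per_1k: float
    output_cost_per_1k: float


# Each strategy maps to the cost figure it minimizes.
COST_KEYS = {
    "cheapest_combined": lambda m: m.input_cost_per_1k + m.output_cost_per_1k,
    "cheapest_input": lambda m: m.input_cost_per_1k,
    "cheapest_output": lambda m: m.output_cost_per_1k,
}


def select_model(models: list[TierModel], tier: str,
                 optimize: str = "cheapest_combined") -> TierModel:
    """Pick the lowest-cost model in a tier for the given strategy."""
    candidates = [m for m in models if m.tier == tier]
    if not candidates:
        raise LookupError(f"no models assigned to tier {tier!r}")
    # On a tie, min() keeps the first candidate in list order.
    return min(candidates, key=COST_KEYS[optimize])


models = [
    TierModel("gpt-4o-mini", "openai", "standard", 0.15, 0.60),
    # Hypothetical second model, included only to make the strategies differ.
    TierModel("example-model-b", "example-provider", "standard", 0.10, 0.90),
]
```

Here `select_model(models, "standard")` picks `gpt-4o-mini` on combined cost, while `select_model(models, "standard", "cheapest_input")` picks the hypothetical `example-model-b`.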

Interaction with Policy Engine ROUTE_TO

The Policy Engine’s `ROUTE_TO` action can specify a tier (for example, `route_to_tier: "haiku"`) that maps to cost-optimized routing. When a policy rule fires `ROUTE_TO` with a tier, the gateway selects the cheapest available model in that tier for the request’s provider. See the ROUTE_TO section of the Policy Engine overview.