
Billing and metering

Arbitex Gateway tracks token usage and cost for every request. You can set per-user and per-group budget caps across three dimensions — tokens, request count, and dollar cost — each with daily and monthly variants. When a budget is exhausted, the gateway blocks the request before it reaches the model provider.

Budget enforcement runs in the payload analysis stage of the intake pipeline, alongside DLP inspection. A request blocked by a budget cap does not consume provider tokens and does not produce a DLP finding or policy evaluation — but it does produce an audit log entry.


Each quota configuration supports six independent limits. Set any combination. A null value means unlimited for that dimension.

| Dimension | Field | Scope | Description |
| --- | --- | --- | --- |
| Daily tokens | daily_token_limit | Per calendar day (UTC) | Maximum input_tokens + output_tokens per day |
| Monthly tokens | monthly_token_limit | Per calendar month (UTC) | Maximum tokens per month |
| Daily requests | daily_request_limit | Per calendar day (UTC) | Maximum request count per day |
| Monthly requests | monthly_request_limit | Per calendar month (UTC) | Maximum request count per month |
| Daily cost | daily_cost_limit_usd | Per calendar day (UTC) | Maximum spend in USD per day |
| Monthly cost | monthly_cost_limit_usd | Per calendar month (UTC) | Maximum spend in USD per month |

All time boundaries are UTC. A daily limit resets at 00:00:00 UTC. A monthly limit resets on the first of each month.
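
The reset rules above can be sketched in a few lines of Python. This is only an illustration of the UTC boundaries described here, not the gateway's own code; the function names `next_daily_reset` and `next_monthly_reset` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now: datetime) -> datetime:
    """Return 00:00:00 UTC of the day after `now` (hypothetical helper)."""
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def next_monthly_reset(now: datetime) -> datetime:
    """Return 00:00:00 UTC on the first day of the month after `now`."""
    now_utc = now.astimezone(timezone.utc)
    # December rolls over into January of the following year.
    year = now_utc.year + (1 if now_utc.month == 12 else 0)
    month = 1 if now_utc.month == 12 else now_utc.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)


now = datetime(2026, 3, 15, 18, 30, tzinfo=timezone.utc)
print(next_daily_reset(now).isoformat())    # 2026-03-16T00:00:00+00:00
print(next_monthly_reset(now).isoformat())  # 2026-04-01T00:00:00+00:00
```

Pass timezone-aware datetimes; a naive datetime would be interpreted as local time by `astimezone`.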


Set a quota for a specific user. All fields are optional — include only the limits you want to enforce.

PUT https://api.arbitex.ai/api/admin/users/{user_id}/quota
Authorization: Bearer arb_live_your-api-key-here
Content-Type: application/json
{
"daily_token_limit": 100000,
"monthly_cost_limit_usd": 50.00
}

This sets a daily cap of 100,000 tokens and a monthly cost cap of $50.00 USD. All other dimensions remain unlimited.
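
As a rough sketch, the same request could be built from Python with the standard library. The helper name `build_user_quota_request` and the user ID `user-123` are hypothetical; the endpoint, headers, and field names are the ones shown above. The example only constructs the request and does not send it.

```python
import json
import urllib.request

BASE_URL = "https://api.arbitex.ai/api/admin"

# The six quota fields listed in this document.
QUOTA_FIELDS = {
    "daily_token_limit", "monthly_token_limit",
    "daily_request_limit", "monthly_request_limit",
    "daily_cost_limit_usd", "monthly_cost_limit_usd",
}


def build_user_quota_request(user_id: str, api_key: str, limits: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets a user's quota."""
    unknown = set(limits) - QUOTA_FIELDS
    if unknown:
        raise ValueError(f"unknown quota fields: {sorted(unknown)}")
    return urllib.request.Request(
        url=f"{BASE_URL}/users/{user_id}/quota",
        data=json.dumps(limits).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_user_quota_request(
    "user-123",  # hypothetical user ID
    "arb_live_your-api-key-here",
    {"daily_token_limit": 100000, "monthly_cost_limit_usd": 50.00},
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Validating field names locally catches typos such as `daily_tokens_limit` before the request leaves your machine.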

To read the current quota:

GET https://api.arbitex.ai/api/admin/users/{user_id}/quota
Authorization: Bearer arb_live_your-api-key-here

To remove all caps for a user:

DELETE https://api.arbitex.ai/api/admin/users/{user_id}/quota
Authorization: Bearer arb_live_your-api-key-here

Group quotas use the same schema and work identically. Group quota limits apply to the aggregate usage of all members in the group — not per-member.

PUT https://api.arbitex.ai/api/admin/groups/{group_id}/quota
Authorization: Bearer arb_live_your-api-key-here
Content-Type: application/json
{
"monthly_token_limit": 5000000,
"monthly_cost_limit_usd": 500.00,
"daily_request_limit": 1000
}

To read or remove a group quota:

GET https://api.arbitex.ai/api/admin/groups/{group_id}/quota
DELETE https://api.arbitex.ai/api/admin/groups/{group_id}/quota

When a request arrives, the gateway checks quotas in the payload analysis stage:

  1. Load the user’s per-user quota (if any)
  2. Load per-group quotas for every group the user belongs to
  3. Compute the effective limit for each dimension — the minimum across user-level and all applicable group-level limits
  4. For group quotas, check the aggregate group usage (total across all group members) against the group ceiling
  5. If any dimension is exceeded, the request is blocked immediately

The effective limit is always the most restrictive value. If a user has a monthly cost cap of $100 and belongs to a group with a monthly cost cap of $500, the user’s $100 cap applies. If the group’s aggregate usage across all members reaches $500, the group cap blocks further requests from any member — even if an individual member has not reached their personal cap.

Budget enforcement is fail-fast: if the quota check fails, the DLP and policy evaluation stages are skipped entirely. The request never reaches a model provider.
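
The check for a single dimension can be modeled roughly as follows. This is a simplified sketch, not the gateway's implementation: the names `GroupQuota`, `effective_limit`, and `is_blocked` are hypothetical, and it assumes a request is blocked once usage reaches the limit (`>=`), which this document does not state precisely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupQuota:
    limit: Optional[float]  # None (null) means unlimited
    aggregate_used: float   # total usage across all group members


def effective_limit(user_limit: Optional[float], groups: list) -> Optional[float]:
    """Most restrictive configured limit; None means no limit applies."""
    configured = [v for v in [user_limit, *(g.limit for g in groups)] if v is not None]
    return min(configured) if configured else None


def is_blocked(user_limit: Optional[float], user_used: float, groups: list) -> bool:
    """True if the user's own usage or any group's aggregate usage hits a cap."""
    limit = effective_limit(user_limit, groups)
    if limit is not None and user_used >= limit:  # assumption: blocked at >=
        return True
    return any(g.limit is not None and g.aggregate_used >= g.limit for g in groups)


# Monthly cost example from the text: user cap $100, group cap $500.
print(effective_limit(100.0, [GroupQuota(500.0, 120.0)]))   # 100.0
print(is_blocked(100.0, 40.0, [GroupQuota(500.0, 120.0)]))  # False
print(is_blocked(100.0, 40.0, [GroupQuota(500.0, 500.0)]))  # True (group cap reached)
```

The last call shows the case described above: the member has spent only $40 of a personal $100 cap, but the group as a whole has reached its $500 ceiling.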


When a quota is exceeded, the gateway returns HTTP 429 with the following response body:

{
"error": "quota_exceeded",
"quota_type": "monthly_cost_usd",
"limit": 50.0,
"used": 50.12,
"reset_at": "2026-04-01T00:00:00+00:00"
}

| Field | Description |
| --- | --- |
| error | Always "quota_exceeded" |
| quota_type | Which dimension was exceeded: daily_tokens, monthly_tokens, daily_requests, monthly_requests, daily_cost_usd, or monthly_cost_usd |
| limit | The configured limit value |
| used | Current usage at the time of denial |
| reset_at | ISO 8601 timestamp when the limit resets (start of next day or month, UTC) |

The response includes a Retry-After header with the same reset timestamp. Your application should not retry until after the reset time.
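
A client might work out how long to wait from the `reset_at` field of the 429 body. The sketch below is one possible approach; the helper name `seconds_until_reset` is hypothetical, and it reads the body rather than the `Retry-After` header.

```python
import json
from datetime import datetime, timezone


def seconds_until_reset(body: str, now: datetime) -> float:
    """Seconds to wait before retrying, based on the 429 body's reset_at."""
    payload = json.loads(body)
    if payload.get("error") != "quota_exceeded":
        raise ValueError("not a quota_exceeded response")
    reset_at = datetime.fromisoformat(payload["reset_at"])
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, (reset_at - now).total_seconds())


body = """{
  "error": "quota_exceeded",
  "quota_type": "monthly_cost_usd",
  "limit": 50.0,
  "used": 50.12,
  "reset_at": "2026-04-01T00:00:00+00:00"
}"""
now = datetime(2026, 3, 31, 23, 0, tzinfo=timezone.utc)
print(seconds_until_reset(body, now))  # 3600.0
```

In a real client, `now` would be `datetime.now(timezone.utc)`, and the result would feed a sleep or a scheduled retry.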


Every budget denial produces an audit log entry with:

| Field | Value |
| --- | --- |
| action_taken | BLOCK |
| match_reason | quota_exceeded |
| stage_latencies.quota_check_ms | Time spent on the quota check |
| stage_latencies.policy_eval_ms | 0.0 (policy evaluation did not run) |
| stage_latencies.provider_ms | 0.0 (no provider call was made) |

Budget denials also trigger a quota_exceeded webhook event for integration with alerting systems. The webhook payload includes the user ID, group ID (if the group cap was the limiting factor), the quota type, limit value, and current usage.


When you need budget-aware routing that selects the cheapest model for a request, configure cost-optimized routing through the cost routing admin API.

Assign models to capability tiers. Each tier groups models with similar capability levels but potentially different costs.

POST https://api.arbitex.ai/api/admin/cost-routing/tiers/assign
Authorization: Bearer arb_live_your-api-key-here
Content-Type: application/json
{
"model_id": "gpt-4o-mini",
"provider": "openai",
"tier": "standard",
"input_cost_per_1k": 0.15,
"output_cost_per_1k": 0.60
}

List all tier assignments:

GET https://api.arbitex.ai/api/admin/cost-routing/tiers

When selecting a model for cost-optimized routing, specify an optimization strategy:

| Strategy | Behavior |
| --- | --- |
| cheapest_combined | Select the model with the lowest combined input + output cost per 1K tokens. This is the default. |
| cheapest_input | Minimize input cost per 1K tokens. |
| cheapest_output | Minimize output cost per 1K tokens. |
| weighted | Weighted random selection proportional to each model’s configured weight. Use this to distribute load across models. |
| lowest_latency | Select the model with the lowest observed latency. |
| balanced | Weighted combination of cost and latency. |
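
To make the three cost-based strategies concrete, here is a small Python model of how a selection could work. It is an illustration only and does not describe the gateway's internals; `TierModel`, `select_model`, and the models `model-b` and `model-c` with their prices are hypothetical, while the `gpt-4o-mini` figures are the ones used in the tier assignment example.

```python
from dataclasses import dataclass


@dataclass
class TierModel:
    model_id: str
    input_cost_per_1k: float
    output_cost_per_1k: float


# Sort key for each cost-based strategy; lower is better.
COST_KEYS = {
    "cheapest_combined": lambda m: m.input_cost_per_1k + m.output_cost_per_1k,
    "cheapest_input": lambda m: m.input_cost_per_1k,
    "cheapest_output": lambda m: m.output_cost_per_1k,
}


def select_model(models: list, optimize: str = "cheapest_combined") -> TierModel:
    """Pick the model that minimizes the chosen cost metric."""
    if optimize not in COST_KEYS:
        raise ValueError(f"unsupported strategy in this sketch: {optimize}")
    return min(models, key=COST_KEYS[optimize])


tier = [
    TierModel("gpt-4o-mini", 0.15, 0.60),
    TierModel("model-b", 0.10, 0.90),  # hypothetical model and prices
    TierModel("model-c", 0.40, 0.40),  # hypothetical model and prices
]
print(select_model(tier).model_id)                     # gpt-4o-mini
print(select_model(tier, "cheapest_input").model_id)   # model-b
print(select_model(tier, "cheapest_output").model_id)  # model-c
```

The weighted, lowest_latency, and balanced strategies depend on weights and latency measurements that this document does not specify, so they are left out of the sketch.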

Preview the selection for a tier:

POST https://api.arbitex.ai/api/admin/cost-routing/select
Authorization: Bearer arb_live_your-api-key-here
Content-Type: application/json
{
"tier": "standard",
"optimize": "cheapest_combined"
}

Example response:

{
"model_id": "gpt-4o-mini",
"provider": "openai",
"tier": "standard",
"input_cost_per_1k": 0.15,
"output_cost_per_1k": 0.60,
"reason": "Selected gpt-4o-mini from tier 'standard' with lowest combined cost"
}

The Policy Engine’s ROUTE_TO action can specify a tier (route_to_tier: "haiku") that maps to cost-optimized routing. When a policy rule fires ROUTE_TO with a tier, the gateway selects the cheapest available model in that tier for the request’s provider. See Policy Engine overview — ROUTE_TO.


  • Policy Engine overview — How budget enforcement fits in the evaluation flow
  • Routing — Budget-based routing and cost-optimized mode
  • Audit Log — How budget denial entries appear in audit exports
  • API reference — Rate limiting headers and error codes