Billing and metering
Arbitex Gateway tracks token usage and cost for every request. You can set per-user and per-group budget caps across three dimensions — tokens, request count, and dollar cost — each with daily and monthly variants. When a budget is exhausted, the gateway blocks the request before it reaches the model provider.
Budget enforcement runs in the payload analysis stage of the intake pipeline, alongside DLP inspection. A request blocked by a budget cap does not consume provider tokens and does not produce a DLP finding or policy evaluation — but it does produce an audit log entry.
Budget dimensions
Each quota configuration supports six independent limits. Set any combination. A null value means unlimited for that dimension.
| Dimension | Field | Scope | Description |
|---|---|---|---|
| Daily tokens | daily_token_limit | Per calendar day (UTC) | Maximum input_tokens + output_tokens per day |
| Monthly tokens | monthly_token_limit | Per calendar month (UTC) | Maximum input_tokens + output_tokens per month |
| Daily requests | daily_request_limit | Per calendar day (UTC) | Maximum request count per day |
| Monthly requests | monthly_request_limit | Per calendar month (UTC) | Maximum request count per month |
| Daily cost | daily_cost_limit_usd | Per calendar day (UTC) | Maximum spend in USD per day |
| Monthly cost | monthly_cost_limit_usd | Per calendar month (UTC) | Maximum spend in USD per month |
All time boundaries are UTC. A daily limit resets at 00:00:00 UTC. A monthly limit resets on the first of each month.
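The reset boundaries described above can be computed client-side, for example to show users when their budget renews. The following is a minimal sketch, not the gateway's implementation; the function name `next_reset` and the `"daily"`/`"monthly"` period labels are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def next_reset(now: datetime, period: str) -> datetime:
    """Return the next UTC reset instant for a 'daily' or 'monthly' limit.

    Hypothetical helper: `now` must be timezone-aware so it can be
    converted to UTC unambiguously.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        # Daily limits reset at 00:00:00 UTC on the following day.
        return midnight + timedelta(days=1)
    if period == "monthly":
        # Monthly limits reset on the first of the next month;
        # December rolls over to January of the next year.
        if now.month == 12:
            year, month = now.year + 1, 1
        else:
            year, month = now.year, now.month + 1
        return midnight.replace(year=year, month=month, day=1)
    raise ValueError(f"unknown period: {period!r}")
```

Converting to UTC before truncating matters: a request sent at 20:00 in a UTC-5 zone already falls in the next UTC day.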
Configuring per-user quotas
Set a quota for a specific user. All fields are optional — include only the limits you want to enforce.

    PUT https://api.arbitex.ai/api/admin/users/{user_id}/quota
    Authorization: Bearer arb_live_your-api-key-here
    Content-Type: application/json

    {
      "daily_token_limit": 100000,
      "monthly_cost_limit_usd": 50.00
    }

This sets a daily cap of 100,000 tokens and a monthly cost cap of $50.00 USD. All other dimensions remain unlimited.
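The same request could be assembled from a script. The sketch below uses only the Python standard library and builds the request without sending it; the helper name `build_user_quota_request`, the client-side field check, and the `user-123` ID are assumptions for illustration, not part of the Arbitex API.

```python
import json
import urllib.request

# The six limit fields listed in the budget dimensions table.
QUOTA_FIELDS = {
    "daily_token_limit", "monthly_token_limit",
    "daily_request_limit", "monthly_request_limit",
    "daily_cost_limit_usd", "monthly_cost_limit_usd",
}


def build_user_quota_request(user_id: str, api_key: str, **limits) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets a per-user quota."""
    unknown = set(limits) - QUOTA_FIELDS
    if unknown:
        # Catch typos locally instead of relying on the server to reject them.
        raise ValueError(f"unknown quota fields: {sorted(unknown)}")
    return urllib.request.Request(
        url=f"https://api.arbitex.ai/api/admin/users/{user_id}/quota",
        data=json.dumps(limits).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_user_quota_request(
    "user-123",                      # hypothetical user ID
    "arb_live_your-api-key-here",
    daily_token_limit=100_000,
    monthly_cost_limit_usd=50.00,
)
# To send it: urllib.request.urlopen(req)
```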
To read the current quota:

    GET https://api.arbitex.ai/api/admin/users/{user_id}/quota
    Authorization: Bearer arb_live_your-api-key-here

To remove all caps for a user:

    DELETE https://api.arbitex.ai/api/admin/users/{user_id}/quota
    Authorization: Bearer arb_live_your-api-key-here

Configuring per-group quotas
Group quotas use the same schema and work identically. Group quota limits apply to the aggregate usage of all members in the group — not per-member.

    PUT https://api.arbitex.ai/api/admin/groups/{group_id}/quota
    Authorization: Bearer arb_live_your-api-key-here
    Content-Type: application/json

    {
      "monthly_token_limit": 5000000,
      "monthly_cost_limit_usd": 500.00,
      "daily_request_limit": 1000
    }

    GET https://api.arbitex.ai/api/admin/groups/{group_id}/quota

    DELETE https://api.arbitex.ai/api/admin/groups/{group_id}/quota

How enforcement works
When a request arrives, the gateway checks quotas in the payload analysis stage:
- Load the user’s per-user quota (if any)
- Load per-group quotas for every group the user belongs to
- Compute the effective limit for each dimension — the minimum across user-level and all applicable group-level limits
- For group quotas, check the aggregate group usage (total across all group members) against the group ceiling
- If any dimension is exceeded, the request is blocked immediately
The effective limit is always the most restrictive value. If a user has a monthly cost cap of $100 and belongs to a group with a monthly cost cap of $500, the user’s $100 cap applies. If the group’s aggregate usage across all members reaches $500, the group cap blocks further requests from any member — even if an individual member has not reached their personal cap.
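The rule above can be sketched for a single dimension as follows. This is an illustration of the described behavior, not the gateway's code: the function names are hypothetical, and treating usage that equals the limit as exhausted (`>=`) is an assumption based on the wording "reaches $500".

```python
from typing import Optional, Sequence, Tuple

Limit = Optional[float]  # None means unlimited, matching a null quota field


def effective_limit(user_limit: Limit, group_limits: Sequence[Limit]) -> Limit:
    """Most restrictive configured limit, or None if nothing is configured."""
    configured = [v for v in (user_limit, *group_limits) if v is not None]
    return min(configured) if configured else None


def is_blocked(user_usage: float, user_limit: Limit,
               groups: Sequence[Tuple[float, Limit]]) -> bool:
    """Decide whether one dimension blocks the request.

    `groups` holds (aggregate_usage, limit) pairs, one per group the
    user belongs to; aggregate_usage is the total across all members.
    """
    limit = effective_limit(user_limit, [g_limit for _, g_limit in groups])
    # The user's own usage is measured against the most restrictive limit.
    if limit is not None and user_usage >= limit:
        return True
    # Each group's aggregate usage is measured against that group's ceiling.
    return any(g_limit is not None and g_usage >= g_limit
               for g_usage, g_limit in groups)
```

With the figures from the example, `is_blocked(100.0, 100.0, [(250.0, 500.0)])` blocks on the user's personal cap, and `is_blocked(40.0, 100.0, [(500.0, 500.0)])` blocks on the group cap even though the user has spent only $40.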
Budget enforcement is fail-fast: if the quota check fails, the DLP and policy evaluation stages are skipped entirely. The request never reaches a model provider.
What happens when a budget is exhausted
The gateway returns HTTP 429 with the following response body:

    {
      "error": "quota_exceeded",
      "quota_type": "monthly_cost_usd",
      "limit": 50.0,
      "used": 50.12,
      "reset_at": "2026-04-01T00:00:00+00:00"
    }

| Field | Description |
|---|---|
| error | Always "quota_exceeded" |
| quota_type | Which dimension was exceeded: daily_tokens, monthly_tokens, daily_requests, monthly_requests, daily_cost_usd, or monthly_cost_usd |
| limit | The configured limit value |
| used | Current usage at the time of denial |
| reset_at | ISO 8601 timestamp when the limit resets (start of next day or month, UTC) |
The response includes a Retry-After header with the same reset timestamp. Your application should not retry until after the reset time.
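A client can derive how long to wait from the `reset_at` field of the response body. The sketch below assumes the body has already been read as a string from a 429 response; the function name `seconds_until_reset` is hypothetical.

```python
import json
from datetime import datetime, timezone


def seconds_until_reset(body: str, now: datetime) -> float:
    """Return how many seconds remain until the exhausted quota resets."""
    payload = json.loads(body)
    if payload.get("error") != "quota_exceeded":
        raise ValueError("not a quota_exceeded response")
    # reset_at is ISO 8601 with an explicit UTC offset, so the parsed
    # value is timezone-aware and can be compared with an aware `now`.
    reset_at = datetime.fromisoformat(payload["reset_at"])
    return max(0.0, (reset_at - now).total_seconds())


body = """{
  "error": "quota_exceeded",
  "quota_type": "monthly_cost_usd",
  "limit": 50.0,
  "used": 50.12,
  "reset_at": "2026-04-01T00:00:00+00:00"
}"""
wait = seconds_until_reset(body, datetime(2026, 3, 31, 23, 0, tzinfo=timezone.utc))
```

In production code, pass `datetime.now(timezone.utc)` as `now` and schedule the retry no earlier than `wait` seconds later.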
Audit log fields for budget denials
Every budget denial produces an audit log entry with:
| Field | Value |
|---|---|
action_taken | BLOCK |
match_reason | quota_exceeded |
stage_latencies.quota_check_ms | Time spent on the quota check |
stage_latencies.policy_eval_ms | 0.0 (policy evaluation did not run) |
stage_latencies.provider_ms | 0.0 (no provider call was made) |
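The fields in the table above are enough to pick budget denials out of an audit export. The sketch below assumes each entry has been parsed into a Python dict and that `stage_latencies` is a nested object; the export shape and the helper names are assumptions, not a documented format.

```python
def budget_denials(entries):
    """Keep only audit entries that record a budget denial."""
    return [
        e for e in entries
        if e.get("action_taken") == "BLOCK"
        and e.get("match_reason") == "quota_exceeded"
    ]


def mean_quota_check_ms(entries):
    """Average quota-check latency across budget denials, or None if there are none."""
    latencies = [e["stage_latencies"]["quota_check_ms"] for e in budget_denials(entries)]
    return sum(latencies) / len(latencies) if latencies else None
```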
Budget denials also trigger a quota_exceeded webhook event for integration with alerting systems. The webhook payload includes the user ID, group ID (if the group cap was the limiting factor), the quota type, limit value, and current usage.
Cost-optimized routing
When you need budget-aware routing that selects the cheapest model for a request, configure cost-optimized routing through the cost routing admin API.
Tier assignments
Assign models to capability tiers. Each tier groups models with similar capability levels but potentially different costs.

    POST https://api.arbitex.ai/api/admin/cost-routing/tiers/assign
    Authorization: Bearer arb_live_your-api-key-here
    Content-Type: application/json

    {
      "model_id": "gpt-4o-mini",
      "provider": "openai",
      "tier": "standard",
      "input_cost_per_1k": 0.15,
      "output_cost_per_1k": 0.60
    }

List all tier assignments:

    GET https://api.arbitex.ai/api/admin/cost-routing/tiers

Optimization strategies
When selecting a model for cost-optimized routing, specify an optimization strategy:
| Strategy | Behavior |
|---|---|
| cheapest_combined | Select the model with the lowest combined input + output cost per 1K tokens. This is the default. |
| cheapest_input | Minimize input cost per 1K tokens. |
| cheapest_output | Minimize output cost per 1K tokens. |
| weighted | Weighted random selection proportional to each model’s configured weight. Use this to distribute load across models. |
| lowest_latency | Select the model with the lowest observed latency. |
| balanced | Weighted combination of cost and latency. |
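The three cost-based strategies differ only in which price they minimize. The sketch below illustrates that difference and is not the gateway's selection logic; `TierModel`, `select_model`, and the `model-a`/`model-b` entries with their prices are hypothetical. The `weighted`, `lowest_latency`, and `balanced` strategies are omitted because they depend on weights and latency data not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierModel:
    model_id: str
    input_cost_per_1k: float
    output_cost_per_1k: float


# Each strategy maps a model to the cost it tries to minimize.
COST_KEYS = {
    "cheapest_combined": lambda m: m.input_cost_per_1k + m.output_cost_per_1k,
    "cheapest_input": lambda m: m.input_cost_per_1k,
    "cheapest_output": lambda m: m.output_cost_per_1k,
}


def select_model(models, optimize="cheapest_combined"):
    """Pick the model in a tier that minimizes the chosen cost."""
    if optimize not in COST_KEYS:
        raise ValueError(f"unsupported strategy in this sketch: {optimize!r}")
    if not models:
        raise ValueError("tier has no models")
    return min(models, key=COST_KEYS[optimize])


standard_tier = [
    TierModel("gpt-4o-mini", 0.15, 0.60),
    TierModel("model-a", 0.10, 0.90),  # hypothetical: cheap input, costly output
    TierModel("model-b", 0.30, 0.50),  # hypothetical: costly input, cheap output
]
```

With these prices each strategy picks a different model: `cheapest_combined` selects `gpt-4o-mini`, `cheapest_input` selects `model-a`, and `cheapest_output` selects `model-b`.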
Preview the selection for a tier:

    POST https://api.arbitex.ai/api/admin/cost-routing/select
    Authorization: Bearer arb_live_your-api-key-here
    Content-Type: application/json

    {
      "tier": "standard",
      "optimize": "cheapest_combined"
    }

Response:

    {
      "model_id": "gpt-4o-mini",
      "provider": "openai",
      "tier": "standard",
      "input_cost_per_1k": 0.15,
      "output_cost_per_1k": 0.60,
      "reason": "Selected gpt-4o-mini from tier 'standard' with lowest combined cost"
    }

Interaction with Policy Engine ROUTE_TO
The Policy Engine’s ROUTE_TO action can specify a tier (route_to_tier: "haiku") that maps to cost-optimized routing. When a policy rule fires ROUTE_TO with a tier, the gateway selects the cheapest available model in that tier for the request’s provider. See Policy Engine overview — ROUTE_TO.
See also
- Policy Engine overview — How budget enforcement fits in the evaluation flow
- Routing — Budget-based routing and cost-optimized mode
- Audit Log — How budget denial entries appear in audit exports
- API reference — Rate limiting headers and error codes