Rate limiting & quotas

Arbitex enforces per-user request rate limits using a sliding window algorithm. Limits apply to all API endpoints except the health check (/health) and OPTIONS preflight requests. This guide explains how limits work, how to read the rate limit headers, how to handle 429 responses, and how enterprise and admin rate limit overrides operate.

How rate limiting works

The rate limiter uses a 60-second sliding window. For each request, it counts how many requests the client has made in the past 60 seconds and checks that count against the applicable limit. If the count exceeds the limit, the request is rejected with HTTP 429 Too Many Requests.
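The counting step described above can be sketched as a small in-memory limiter. This is an illustration only, not Arbitex's implementation: the class name, the choice not to count rejected requests, and the handling of timestamps exactly 60 seconds old are all assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Optional

WINDOW_SECONDS = 60  # window length stated in the text


class SlidingWindowLimiter:
    """Illustrative single-process sliding-window counter (hypothetical name)."""

    def __init__(self, limit: int, window: float = WINDOW_SECONDS) -> None:
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # client key -> request timestamps

    def allow(self, client_key: str, now: Optional[float] = None) -> bool:
        """Record a request and report whether it fits within the limit."""
        now = time.time() if now is None else now
        hits = self._hits[client_key]
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: the caller would answer with HTTP 429
        hits.append(now)
        return True
```

Each client key gets its own queue of timestamps, so one user exhausting their budget does not affect another.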

Client identification:

Authenticated requests: Rate limits are tracked per user UUID, extracted from the JWT sub claim. Each user has their own independent request budget.
Unauthenticated requests and all /api/auth/ endpoints: Rate limits are tracked per IP address. Auth endpoints always use IP-based limiting regardless of whether a valid JWT is present.
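As a rough sketch of the identification rules above, the key selection might look like the following. The function name and the "ip:" prefix are assumptions; only the "user:" form appears in this guide's log example.

```python
from typing import Optional

AUTH_PREFIX = "/api/auth/"


def client_key(path: str, user_id: Optional[str], remote_ip: str) -> str:
    """Choose the rate-limit key for a request.

    user_id is the JWT sub claim when the request carried a valid token,
    otherwise None. The "ip:" prefix is a guess made for illustration.
    """
    # Auth endpoints are always limited by IP, even with a valid JWT.
    if path.startswith(AUTH_PREFIX) or user_id is None:
        return f"ip:{remote_ip}"
    return f"user:{user_id}"
```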

Backend storage: When REDIS_URL is configured, the sliding window is backed by Redis sorted sets for consistent enforcement across multiple API pods. When Redis is unavailable, the limiter falls back to an in-memory window per pod. In this degraded mode, limits are not shared across pods, so a client whose requests are spread over several pods can exceed the configured limit in aggregate.

Rate limit tiers

Endpoints are grouped into tiers with different per-minute limits. More specific matches take precedence over general ones.

Tier	Endpoint match	Default limit	Notes
`auth`	`/api/auth/login`, `/api/auth/register`, `/api/auth/refresh`	10 RPM	Always IP-based, not user-based
`share`	`/api/conversations/shared/`	30 RPM	Public share access
`chat`	`POST /api/conversations/{id}/messages`	60 RPM	Applies per user
`dlp_test`	`POST /api/admin/dlp-rules/test`	10 RPM	High-cost DLP evaluation
`general`	All other endpoints	Configurable via `RATE_LIMIT_REQUESTS_PER_MINUTE`	Per user

The general limit is set by the RATE_LIMIT_REQUESTS_PER_MINUTE environment variable. If the variable is not set, a compiled-in default applies.

Tier matching precedence

The limiter evaluates tiers in this order (first match wins):

Method + regex — e.g., POST re:^/api/conversations/[^/]+/messages$ (highest specificity)
Method + exact path — e.g., POST /api/admin/dlp-rules/test
Method + prefix — longest prefix wins
Path-only exact match
Path-only prefix — longest prefix wins
General limit (fallback)
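The precedence order above can be sketched as a ranking function. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the key spellings in DEFAULT_TIERS are inferred from the tier table (only the two POST keys appear verbatim in this guide).

```python
import re
from typing import Dict, Optional, Tuple

# Assumed key spellings for the default tiers listed in the table above.
DEFAULT_TIERS: Dict[str, int] = {
    "/api/auth/login": 10,
    "/api/auth/register": 10,
    "/api/auth/refresh": 10,
    "/api/conversations/shared/": 30,
    "POST re:^/api/conversations/[^/]+/messages$": 60,
    "POST /api/admin/dlp-rules/test": 10,
}


def split_key(key: str) -> Tuple[Optional[str], str]:
    """Split "POST /path" into ("POST", "/path"); path-only keys get method None."""
    if " " in key:
        method, pattern = key.split(" ", 1)
        return method.upper(), pattern
    return None, key


def match_limit(method: str, path: str, tiers: Dict[str, int], general: int) -> int:
    """Return the limit for a request, following the six-step order above."""
    best: Optional[Tuple[int, int, int]] = None  # (rank, -prefix length, limit)
    for key, limit in tiers.items():
        key_method, pattern = split_key(key)
        if key_method is not None and key_method != method.upper():
            continue
        if pattern.startswith("re:"):
            if key_method is None or not re.search(pattern[3:], path):
                continue
            rank, length = 1, 0  # method + regex
        elif pattern == path:
            rank, length = (2 if key_method else 4), 0  # exact path
        elif path.startswith(pattern):
            rank, length = (3 if key_method else 5), len(pattern)  # prefix
        else:
            continue
        candidate = (rank, -length, limit)
        # Lower rank wins; within a rank, the longer prefix wins.
        if best is None or candidate[:2] < best[:2]:
            best = candidate
    return best[2] if best else general
```

For example, `match_limit("POST", "/api/conversations/abc/messages", DEFAULT_TIERS, 120)` resolves to the chat tier, while the same path with GET falls through to the general limit.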

Rate limit headers

Every response (including successful ones) includes three rate limit headers:

Header	Value	Description
`X-RateLimit-Limit`	integer	The applicable limit for the matched tier (requests per 60s)
`X-RateLimit-Remaining`	integer	Requests remaining in the current window
`X-RateLimit-Reset`	Unix timestamp	When the current window resets (epoch seconds)

Example response headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1741565460

Use X-RateLimit-Remaining to pace requests proactively before hitting the limit.

429 Too Many Requests

When a request exceeds the limit, the response is HTTP 429:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 23
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1741565460

{
  "detail": "Rate limit exceeded",
  "tier": "chat"
}

Retry-After is the number of seconds to wait before retrying. It is derived from X-RateLimit-Reset - current_time, with a minimum of 1 second.

tier identifies which rate limit tier was exceeded — "auth", "share", "chat", "dlp_test", or "general". Use this to distinguish a chat throttle from an auth lockout.
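The Retry-After derivation and the response body can be handled as in the sketch below. Both function names are hypothetical, and truncating the difference to whole seconds is an assumption; the guide only states the subtraction and the 1-second minimum.

```python
import json
import time
from typing import Mapping, Optional, Tuple


def retry_after_seconds(reset_epoch: int, now: Optional[float] = None) -> int:
    """Seconds until the window resets, never less than 1."""
    now = time.time() if now is None else now
    return max(1, int(reset_epoch - now))


def describe_429(headers: Mapping[str, str], body: str) -> Tuple[Optional[str], int]:
    """Return (tier, seconds to wait) from a 429 response."""
    tier = json.loads(body).get("tier")  # None if the field is absent
    wait = int(headers.get("Retry-After", 1))
    return tier, wait
```

A client can branch on the returned tier, for instance by surfacing an "auth" throttle to the user instead of retrying silently.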

Handling 429 in your client

A basic retry pattern that waits for the interval given in the Retry-After header:

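import time
import requests

def api_call_with_retry(url: str, headers: dict, max_retries: int = 3) -> requests.Response:
    """GET url, waiting out 429 responses for up to max_retries attempts."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        # Stop on any non-429 response, or on the final attempt.
        if response.status_code != 429 or attempt == max_retries - 1:
            break
        # Wait as long as the server asks; default to 5 seconds if the header is missing.
        time.sleep(int(response.headers.get("Retry-After", 5)))
    return response  # last response received, which may still be a 429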

For batch workloads, use X-RateLimit-Remaining to throttle proactively:

remaining = int(response.headers.get("X-RateLimit-Remaining", 10))
if remaining < 5:
    reset_at = int(response.headers.get("X-RateLimit-Reset", 0))
    wait = max(0, reset_at - int(time.time()))
    time.sleep(wait + 1)  # +1 second safety margin

Admin rate limit bucket

By default, admin users (role: "admin") receive a higher RPM bucket (configured by RATE_LIMIT_ADMIN_RPM) rather than a blanket exemption. This gives admin tooling elevated headroom for dashboard queries, bulk operations, and audit exports while still enforcing finite limits.

To grant full exemption to admins, set RATE_LIMIT_ADMIN_EXEMPT=true. When set, admin requests bypass the rate limiter entirely. This is disabled by default.
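The two admin settings combine roughly as in the sketch below. The function name is hypothetical, and treating the admin bucket as a replacement for every tier limit is an assumption: this guide does not say how the admin bucket interacts with individual tiers such as auth.

```python
from typing import Optional


def admin_adjusted_limit(
    role: str, tier_limit: int, admin_rpm: int, admin_exempt: bool
) -> Optional[int]:
    """Return the limit to enforce, or None when the request bypasses the limiter."""
    if role != "admin":
        return tier_limit
    if admin_exempt:  # RATE_LIMIT_ADMIN_EXEMPT=true
        return None
    return admin_rpm  # RATE_LIMIT_ADMIN_RPM (assumed to replace the tier limit)
```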

Enterprise custom RPM

Enterprise-tier orgs (enterprise_saas or enterprise_outpost plans) can have a custom per-org rate limit configured in the enterprise_entitlements table via the custom_rate_limit_rpm column. When set:

The custom RPM overrides the general limit for all non-auth endpoints for all users in that org.
Auth endpoints continue to use their fixed IP-based limit (10 RPM) regardless of the enterprise override.
The override applies per user (each user in the org gets the full custom RPM allowance).
The override value is cached per user for 5 minutes (ENTERPRISE_RPM_CACHE_TTL = 300s) to avoid a database round-trip on every request. Changes to the entitlement take effect within 5 minutes.
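The 5-minute caching behavior can be pictured with a small time-to-live cache. This is a sketch, not the service's code: the class name and the lookup callable, which stands in for the query against enterprise_entitlements, are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

ENTERPRISE_RPM_CACHE_TTL = 300  # seconds, as stated above


class EnterpriseRpmCache:
    """Per-user cache of custom_rate_limit_rpm lookups (illustrative)."""

    def __init__(
        self,
        lookup: Callable[[str], Optional[int]],
        ttl: float = ENTERPRISE_RPM_CACHE_TTL,
    ) -> None:
        self._lookup = lookup  # hypothetical database query keyed by user UUID
        self._ttl = ttl
        self._entries: Dict[str, Tuple[float, Optional[int]]] = {}

    def get(self, user_id: str, now: Optional[float] = None) -> Optional[int]:
        """Return the cached RPM, refreshing it once the entry is older than the TTL."""
        now = time.time() if now is None else now
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the database round-trip
        value = self._lookup(user_id)
        self._entries[user_id] = (now, value)
        return value
```

Because entries expire per user, a changed entitlement reaches different users at slightly different times, all within the TTL.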

Contact your Arbitex account team to adjust the enterprise custom RPM for your organization.

Custom tier configuration

The RATE_LIMIT_TIERS environment variable allows operators to add or override per-endpoint limits without modifying the default tier map. The value must be a JSON object where keys are tier expressions and values are positive integers (requests per minute).

Tier key formats:

"/api/path"                     → matches any method, exact path or prefix
"POST /api/path"                → matches POST only, exact path or prefix
"POST re:^/api/path/[^/]+$"    → matches POST only, regex path

Example: Tighten the DLP test limit to 5 RPM and add a stricter limit on a custom analytics endpoint:

RATE_LIMIT_TIERS='{"POST /api/admin/dlp-rules/test": 5, "/api/analytics": 20}'

Custom tiers are merged with the defaults. To override a default, use the same key expression. Custom tiers take effect at startup — a restart is required to apply changes.
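Before deploying a new RATE_LIMIT_TIERS value, it can help to validate it locally. The sketch below parses the JSON, checks that every limit is a positive integer, and merges it over a default map. The function name and the choice to raise ValueError are assumptions; this guide does not describe how the service itself reacts to an invalid value.

```python
import json
import os
from typing import Dict, Mapping, Optional


def load_tiers(defaults: Mapping[str, int], raw: Optional[str] = None) -> Dict[str, int]:
    """Merge RATE_LIMIT_TIERS over the default tier map, validating each entry."""
    if raw is None:
        raw = os.environ.get("RATE_LIMIT_TIERS", "{}")
    custom = json.loads(raw)
    if not isinstance(custom, dict):
        raise ValueError("RATE_LIMIT_TIERS must be a JSON object")
    for key, value in custom.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"limit for {key!r} must be a positive integer")
    # Custom entries win when a key appears in both maps.
    return {**defaults, **custom}
```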

Health check and exempt paths

The /health endpoint is always exempt from rate limiting. OPTIONS requests (CORS preflight) are also exempt.

Environment variable reference

Variable	Default	Description
`RATE_LIMIT_REQUESTS_PER_MINUTE`	(compiled default)	General limit for all non-tiered endpoints
`RATE_LIMIT_ADMIN_EXEMPT`	`false`	Set `true` to fully exempt admin users from rate limiting
`RATE_LIMIT_ADMIN_RPM`	(compiled default)	RPM bucket for admin users when not fully exempt
`RATE_LIMIT_TIERS`	`{}`	JSON object of additional or overriding tier limits
`REDIS_URL`	—	Redis connection URL for multi-pod rate limit consistency

Observability

Rate limit events are logged at WARNING level when a request is rejected:

{
  "event": "rate_limit_exceeded",
  "client_key": "user:3fa85f64-...",
  "path": "/api/conversations/abc/messages",
  "limit": 60,
  "tier": "chat"
}

For aggregate monitoring, alert on HTTP 429 response rate from your load balancer or API gateway. A sustained 429 rate from a single client_key may indicate a misconfigured client or a credential that needs rotation.
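To spot a single noisy client, rejection events can be tallied from the logs. The sketch below assumes the events are emitted one JSON object per line, which this guide does not confirm; the function name is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def rejections_by_client(lines: Iterable[str]) -> Counter:
    """Count rate_limit_exceeded events per (client_key, tier) pair."""
    counts: Counter = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if isinstance(record, dict) and record.get("event") == "rate_limit_exceeded":
            counts[(record.get("client_key"), record.get("tier"))] += 1
    return counts
```

Calling `rejections_by_client(log_lines).most_common(5)` lists the five client and tier pairs with the most rejections.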