Rate limiting & quotas
Arbitex enforces per-user request rate limits using a sliding window algorithm. Limits apply to all API endpoints except the health check (/health) and OPTIONS preflight requests. This guide explains how limits work, how to read the rate limit headers, how to handle 429 responses, and how enterprise and admin rate limit overrides operate.
How rate limiting works
The rate limiter uses a 60-second sliding window. For each request, it counts how many requests the client has made in the past 60 seconds and checks that count against the applicable limit. If the count exceeds the limit, the request is rejected with HTTP 429 Too Many Requests.
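The check above is a sliding-window log. Below is a minimal in-memory sketch in Python. It assumes one timestamp is stored per accepted request and that rejected requests do not use up budget. The class and method names are hypothetical and do not come from Arbitex's implementation:

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple

WINDOW_SECONDS = 60  # documented window length


class SlidingWindowLimiter:
    """Hypothetical in-memory sliding-window log: one deque of timestamps per client key."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable so the window can be tested deterministically
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, limit: int) -> Tuple[bool, int]:
        """Return (allowed, remaining) for one request from `key`."""
        now = self._clock()
        hits = self._hits[key]
        # Evict timestamps that have slid out of the trailing 60-second window.
        while hits and hits[0] <= now - WINDOW_SECONDS:
            hits.popleft()
        # Counting this request would push the total past the limit: reject it.
        # Rejected requests are not recorded (an assumption of this sketch).
        if len(hits) >= limit:
            return False, 0
        hits.append(now)
        return True, limit - len(hits)
```

The window slides forward with every request. A burst is therefore readmitted gradually, as each of its timestamps ages past 60 seconds, rather than all at once at a fixed minute boundary.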
Client identification:
- Authenticated requests: Rate limits are tracked per user UUID, extracted from the JWT sub claim. Each user has their own independent request budget.
- Unauthenticated requests and all /api/auth/ endpoints: Rate limits are tracked per IP address. Auth endpoints always use IP-based limiting regardless of whether a valid JWT is present.
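The identification rules above can be sketched as a function that selects the bucket key. The sketch assumes `claims` is a JWT payload that has already been verified, or None when no token is present. The user: prefix matches the client_key shown in the Observability log example. The ip: prefix and the function name are hypothetical:

```python
from typing import Mapping, Optional


def client_key(path: str, client_ip: str, claims: Optional[Mapping[str, object]]) -> str:
    """Hypothetical bucket-key selection: per user for authenticated calls, per IP otherwise."""
    # Auth endpoints are always limited by IP, even when a valid JWT is present.
    if path.startswith("/api/auth/"):
        return f"ip:{client_ip}"
    # Unauthenticated requests (no claims or no sub claim) also fall back to IP.
    if not claims or not claims.get("sub"):
        return f"ip:{client_ip}"
    return f"user:{claims['sub']}"
```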
Backend storage: When REDIS_URL is configured, the sliding window is backed by Redis sorted sets for consistent enforcement across multiple API pods. When Redis is unavailable, the limiter falls back to an in-memory window per pod (degraded mode — limits are not shared across pods in this state).
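The following sketches the Redis sorted-set approach with a per-process fallback. It assumes a redis-py client, for example one created with redis.Redis.from_url(REDIS_URL). The ratelimit: key prefix, the class name, and the choice to drop rejected requests from the set are all assumptions. Each request becomes a sorted-set member scored by its timestamp. Eviction is a single ZREMRANGEBYSCORE and counting is a single ZCARD, both sent in one pipeline:

```python
import time
import uuid
from collections import defaultdict, deque
from typing import Any, Deque, Dict, Optional

WINDOW_SECONDS = 60


class RedisBackedLimiter:
    """Hypothetical sliding window on a Redis sorted set, degrading to per-process memory."""

    def __init__(self, redis_client: Optional[Any] = None) -> None:
        self._redis = redis_client  # e.g. redis.Redis.from_url(REDIS_URL), or None
        self._local: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, limit: int, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if self._redis is not None:
            try:
                return self._allow_redis(key, limit, now)
            except Exception:  # e.g. redis.exceptions.ConnectionError: degraded mode
                pass
        return self._allow_local(key, limit, now)

    def _allow_redis(self, key: str, limit: int, now: float) -> bool:
        zkey = f"ratelimit:{key}"
        member = f"{now}:{uuid.uuid4().hex}"  # unique member per request
        pipe = self._redis.pipeline()
        pipe.zremrangebyscore(zkey, 0, now - WINDOW_SECONDS)  # evict expired entries
        pipe.zadd(zkey, {member: now})  # record this request
        pipe.zcard(zkey)  # count requests in the window
        pipe.expire(zkey, WINDOW_SECONDS)  # let idle keys disappear
        _, _, count, _ = pipe.execute()
        if count > limit:
            self._redis.zrem(zkey, member)  # rejected requests do not consume budget
            return False
        return True

    def _allow_local(self, key: str, limit: int, now: float) -> bool:
        # Per-pod window: not shared with other pods while Redis is down.
        hits = self._local[key]
        while hits and hits[0] <= now - WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= limit:
            return False
        hits.append(now)
        return True
```

In the fallback path every pod counts only its own traffic. With N pods behind a load balancer, one client could get up to roughly N times the configured limit until Redis recovers.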
Rate limit tiers
Endpoints are grouped into tiers with different per-minute limits. More specific matches take precedence over general ones.
| Tier | Endpoint match | Default limit | Notes |
|---|---|---|---|
| auth | /api/auth/login, /api/auth/register, /api/auth/refresh | 10 RPM | Always IP-based, not user-based |
| share | /api/conversations/shared/ | 30 RPM | Public share access |
| chat | POST /api/conversations/{id}/messages | 60 RPM | Applies per user |
| dlp_test | POST /api/admin/dlp-rules/test | 10 RPM | High-cost DLP evaluation |
| general | All other endpoints | Configurable via RATE_LIMIT_REQUESTS_PER_MINUTE | Per user |
The general limit is set by the RATE_LIMIT_REQUESTS_PER_MINUTE environment variable. If the variable is not set, a compiled-in default applies.
Tier matching precedence
The limiter evaluates tiers in this order (first match wins):
- Method + regex (highest specificity): for example, POST re:^/api/conversations/[^/]+/messages$
- Method + exact path: for example, POST /api/admin/dlp-rules/test
- Method + prefix: longest prefix wins
- Path-only exact match
- Path-only prefix: longest prefix wins
- General limit (fallback)
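The precedence list above can be sketched as a resolver over tier keys written in the key syntax from Custom tier configuration. This is a Python sketch, not Arbitex's resolver. DEFAULT_TIERS reproduces the documented default limits. Writing the auth and share tiers as path-only keys, and treating "prefix" as a plain string prefix, are both assumptions:

```python
import re
from typing import Dict, List, Optional, Tuple

# Documented defaults, written in RATE_LIMIT_TIERS key syntax (key layout assumed).
DEFAULT_TIERS: Dict[str, int] = {
    "/api/auth/login": 10,
    "/api/auth/register": 10,
    "/api/auth/refresh": 10,
    "/api/conversations/shared/": 30,
    "POST re:^/api/conversations/[^/]+/messages$": 60,
    "POST /api/admin/dlp-rules/test": 10,
}


def _split(key: str) -> Tuple[Optional[str], str]:
    """'POST /x' -> ('POST', '/x'); '/x' -> (None, '/x')."""
    if key.startswith("/"):
        return None, key
    method, _, pattern = key.partition(" ")
    return method.upper(), pattern


def _longest_prefix(candidates: List[Tuple[str, str]], path: str) -> Optional[str]:
    """Return the key whose pattern is the longest prefix of path, if any."""
    hits = [(len(pattern), key) for key, pattern in candidates if path.startswith(pattern)]
    return max(hits)[1] if hits else None


def resolve_tier(method: str, path: str, tiers: Dict[str, int], general: int) -> Tuple[str, int]:
    """Return (matched key, limit), trying each precedence level in order."""
    method = method.upper()
    parsed = [(key, *_split(key)) for key in tiers]
    with_method = [(k, p) for k, m, p in parsed if m == method and not p.startswith("re:")]
    path_only = [(k, p) for k, m, p in parsed if m is None]

    for key, m, pattern in parsed:  # 1. method + regex
        if m == method and pattern.startswith("re:") and re.search(pattern[3:], path):
            return key, tiers[key]
    for key, pattern in with_method:  # 2. method + exact path
        if pattern == path:
            return key, tiers[key]
    key = _longest_prefix(with_method, path)  # 3. method + longest prefix
    if key is None:
        key = next((k for k, p in path_only if p == path), None)  # 4. path-only exact
    if key is None:
        key = _longest_prefix(path_only, path)  # 5. path-only longest prefix
    if key is None:
        return "general", general  # 6. fallback to the general limit
    return key, tiers[key]
```

In this sketch, a GET to a conversation's messages path falls through to the general limit, because only POST matches the chat regex.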
Rate limit headers
Every response (including successful ones) includes three rate limit headers:
| Header | Value | Description |
|---|---|---|
| X-RateLimit-Limit | integer | The applicable limit for the matched tier (requests per 60s) |
| X-RateLimit-Remaining | integer | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp | When the current window resets (epoch seconds) |
Example response headers:
```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1741565460
```

Use X-RateLimit-Remaining to pace requests proactively before hitting the limit.
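Here is a small client-side helper that reads these three headers into one value. The RateLimitInfo class and parse_rate_limit_headers function are hypothetical. A plain dict is case-sensitive, whereas the headers object on a requests Response matches header names case-insensitively:

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    limit: int  # X-RateLimit-Limit
    remaining: int  # X-RateLimit-Remaining
    reset_epoch: int  # X-RateLimit-Reset (epoch seconds)

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Time left until X-RateLimit-Reset, never negative."""
        now = time.time() if now is None else now
        return max(0.0, self.reset_epoch - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Return the parsed headers, or None if any of the three is missing or malformed."""
    try:
        return RateLimitInfo(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_epoch=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```

Take the example headers above and assume a current time of 1741565437. In that case seconds_until_reset() returns 23, the same value as Retry-After in the 429 example below.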
429 Too Many Requests
When a request exceeds the limit, the response is HTTP 429:
```
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 23
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1741565460

{
  "detail": "Rate limit exceeded",
  "tier": "chat"
}
```

Retry-After is the number of seconds to wait before retrying. It is derived from X-RateLimit-Reset - current_time, with a minimum of 1 second.
The tier field identifies which rate limit tier was exceeded: "auth", "share", "chat", "dlp_test", or "general". Use it to distinguish a chat throttle from an auth lockout.
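The documented Retry-After rule and the tier field can be mirrored on the client. This sketch rounds up with math.ceil so a client never retries early. That rounding is an assumption, because the server's exact rounding is not documented. should_auto_retry is a hypothetical client policy and is not part of the API:

```python
import json
import math
from typing import Optional, Tuple


def retry_after_seconds(reset_epoch: int, now: float) -> int:
    """X-RateLimit-Reset minus the current time, with a 1-second floor."""
    return max(1, math.ceil(reset_epoch - now))


def parse_429_body(body: str) -> Tuple[str, Optional[str]]:
    """Return (detail, tier) from a 429 JSON body."""
    data = json.loads(body)
    return data.get("detail", ""), data.get("tier")


def should_auto_retry(tier: Optional[str]) -> bool:
    # Hypothetical policy: back off and retry throttled chat/share/general calls,
    # but surface auth throttles to the user instead of re-sending credentials.
    return tier != "auth"
```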
Handling 429 in your client
A basic retry pattern that honors the Retry-After header:

```python
import time

import requests


def api_call_with_retry(url: str, headers: dict, max_retries: int = 3) -> requests.Response:
    """GET url, waiting Retry-After seconds after each 429, for up to max_retries attempts."""
    response = requests.get(url, headers=headers, timeout=30)
    for _ in range(max_retries - 1):
        if response.status_code != 429:
            break
        # Retry-After is in seconds; fall back to 5s if the header is missing.
        time.sleep(int(response.headers.get("Retry-After", 5)))
        response = requests.get(url, headers=headers, timeout=30)
    return response  # may still be a 429 if every attempt was throttled
```

For batch workloads, use X-RateLimit-Remaining to throttle proactively:

```python
# Run after each response in a batch loop (reuses time and response from above).
remaining = int(response.headers.get("X-RateLimit-Remaining", 10))
if remaining < 5:
    reset_at = int(response.headers.get("X-RateLimit-Reset", 0))
    wait = max(0, reset_at - int(time.time()))
    time.sleep(wait + 1)  # +1 second safety margin
```

Admin rate limit bucket
By default, admin users (role: "admin") receive a higher RPM bucket (configured by RATE_LIMIT_ADMIN_RPM) rather than a blanket exemption. This gives admin tooling elevated headroom for dashboard queries, bulk operations, and audit exports while still enforcing finite limits.
To grant full exemption to admins, set RATE_LIMIT_ADMIN_EXEMPT=true. When set, admin requests bypass the rate limiter entirely. This is disabled by default.
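The admin rules can be sketched as a single limit-selection step. The sketch makes three assumptions: the admin bucket replaces the matched tier's limit, auth endpoints keep their fixed IP-based limit because they are never attributed to a user, and only the literal string "true" enables the exemption. Both function names are hypothetical:

```python
from typing import Optional


def env_flag(value: Optional[str]) -> bool:
    """Interpret an env var such as RATE_LIMIT_ADMIN_EXEMPT (only "true" enables it here)."""
    return (value or "").strip().lower() == "true"


def effective_limit(role: str, tier: str, tier_limit: int, admin_exempt: bool, admin_rpm: int) -> Optional[int]:
    """Return the RPM to enforce, or None when the request bypasses the limiter."""
    if tier == "auth" or role != "admin":
        return tier_limit  # auth stays IP-based; non-admins use the matched tier
    if admin_exempt:  # RATE_LIMIT_ADMIN_EXEMPT=true
        return None
    return admin_rpm  # RATE_LIMIT_ADMIN_RPM bucket
```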
Enterprise custom RPM
Enterprise-tier orgs (enterprise_saas or enterprise_outpost plans) can have a custom per-org rate limit configured in the enterprise_entitlements table via the custom_rate_limit_rpm column. When set:
- The custom RPM overrides the general limit for all non-auth endpoints for all users in that org.
- Auth endpoints continue to use their fixed IP-based limit (10 RPM) regardless of the enterprise override.
- The override applies per user (each user in the org gets the full custom RPM allowance).
- The override value is cached per user for 5 minutes (ENTERPRISE_RPM_CACHE_TTL = 300s) to avoid a database round-trip on every request. Changes to the entitlement take effect within 5 minutes.
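The per-user cache in the last bullet works as a TTL cache in front of the entitlement lookup. Here is a sketch in which load_rpm stands in for a hypothetical query against custom_rate_limit_rpm that returns None when no override exists. It caches that None result too, so users without an override also skip the database until their entry expires:

```python
import time
from typing import Callable, Dict, Optional, Tuple

ENTERPRISE_RPM_CACHE_TTL = 300.0  # seconds, matching the documented TTL


class EnterpriseRpmCache:
    """Hypothetical per-user TTL cache for the enterprise custom RPM."""

    def __init__(
        self,
        load_rpm: Callable[[str], Optional[int]],
        ttl: float = ENTERPRISE_RPM_CACHE_TTL,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_rpm = load_rpm  # hypothetical DB lookup; None means "no override"
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[int]]] = {}

    def get(self, user_id: str) -> Optional[int]:
        now = self._clock()
        cached = self._entries.get(user_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh entry: no database round-trip
        rpm = self._load_rpm(user_id)  # missing or expired: reload
        self._entries[user_id] = (now, rpm)  # None is cached as well
        return rpm
```

An entitlement change therefore becomes visible to a given user at most ttl seconds after that user's entry was last loaded.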
Contact your Arbitex account team to adjust the enterprise custom RPM for your organization.
Custom tier configuration
The RATE_LIMIT_TIERS environment variable allows operators to add or override per-endpoint limits without modifying the default tier map. The value must be a JSON object where keys are tier expressions and values are positive integers (requests per minute).
Tier key formats:
```
"/api/path"                  → matches any method, exact path or prefix
"POST /api/path"             → matches POST only, exact path or prefix
"POST re:^/api/path/[^/]+$"  → matches POST only, regex path
```

Example: Tighten the DLP test limit to 5 RPM and add a stricter limit on a custom analytics endpoint:
```shell
RATE_LIMIT_TIERS='{"POST /api/admin/dlp-rules/test": 5, "/api/analytics": 20}'
```

Custom tiers are merged with the defaults. To override a default, use the same key expression. Custom tiers are read only at startup, so a restart is required to apply changes.
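The validation and merge rules above are sketched below. The function name is hypothetical, and this sketch fails fast on invalid input. How the real service reacts to a malformed value (rejecting it, logging it, or ignoring it) is not documented here:

```python
import json
from typing import Dict, Mapping


def load_tiers(raw: str, defaults: Mapping[str, int]) -> Dict[str, int]:
    """Parse a RATE_LIMIT_TIERS value and merge it over the default tier map."""
    custom = json.loads(raw or "{}")
    if not isinstance(custom, dict):
        raise ValueError("RATE_LIMIT_TIERS must be a JSON object")
    for key, rpm in custom.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(rpm, bool) or not isinstance(rpm, int) or rpm <= 0:
            raise ValueError(f"tier {key!r}: limit must be a positive integer, got {rpm!r}")
    merged = dict(defaults)
    merged.update(custom)  # the same key expression overrides the default
    return merged
```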
Health check and exempt paths
The /health endpoint is always exempt from rate limiting. OPTIONS requests (CORS preflight) are also exempt.
Environment variable reference
| Variable | Default | Description |
|---|---|---|
| RATE_LIMIT_REQUESTS_PER_MINUTE | (compiled default) | General limit for all non-tiered endpoints |
| RATE_LIMIT_ADMIN_EXEMPT | false | Set to true to fully exempt admin users from rate limiting |
| RATE_LIMIT_ADMIN_RPM | (compiled default) | RPM bucket for admin users when not fully exempt |
| RATE_LIMIT_TIERS | {} | JSON object of additional or overriding tier limits |
| REDIS_URL | (unset) | Redis connection URL for multi-pod rate limit consistency |
Observability
Rate limit events are logged at WARNING level when a request is rejected:
```json
{
  "event": "rate_limit_exceeded",
  "client_key": "user:3fa85f64-...",
  "path": "/api/conversations/abc/messages",
  "limit": 60,
  "tier": "chat"
}
```

For aggregate monitoring, alert on the HTTP 429 response rate from your load balancer or API gateway. A sustained 429 rate from a single client_key may indicate a misconfigured client or a credential that needs rotation.
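To find the noisy client_key values mentioned above, one option is to count rejection events in the structured logs. This sketch assumes one JSON object per log line, and it assumes the caller has already restricted the lines to the time window that counts as "sustained" for your alerting. The function name is hypothetical:

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def noisy_clients(log_lines: Iterable[str], threshold: int) -> List[Tuple[str, int]]:
    """Count rate_limit_exceeded events per client_key; return keys at or above threshold."""
    counts: Counter = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON log lines
        if isinstance(event, dict) and event.get("event") == "rate_limit_exceeded":
            counts[event.get("client_key", "unknown")] += 1
    # most_common() sorts by count, highest first.
    return [(key, n) for key, n in counts.most_common() if n >= threshold]
```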
See also
- API key management: API key credentials and rotation
- Audit log export: audit entries include latency_ms and request metadata
- Admin operations guide: quota and alert configuration for admins