Rate limiting & quotas

Arbitex enforces per-user request rate limits using a sliding window algorithm. Limits apply to all API endpoints except the health check (/health) and OPTIONS preflight requests. This guide explains how limits work, how to read the rate limit headers, how to handle 429 responses, and how enterprise and admin rate limit overrides operate.


The rate limiter uses a 60-second sliding window. For each request, it counts how many requests the client has made in the past 60 seconds and checks that count against the applicable limit. If the count exceeds the limit, the request is rejected with HTTP 429 Too Many Requests.
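
The window logic described above can be sketched in a few lines of Python. This is a minimal single-client, in-memory illustration, not Arbitex's actual code: the class name `SlidingWindowLimiter` and its `allow` method are hypothetical, and it assumes a request is rejected once `limit` requests already sit inside the window.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowLimiter:
    """Illustrative sliding-window counter for one client (hypothetical helper)."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()  # request times, oldest first

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Drop requests that have aged out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        # Assumption: `limit` requests in the window is the ceiling.
        if len(self.timestamps) >= self.limit:
            return False  # the API would answer with HTTP 429 here
        self.timestamps.append(now)
        return True
```

Because old timestamps expire one at a time, capacity returns gradually rather than all at once, which is the main difference from a fixed-window counter.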

Client identification:

  • Authenticated requests: Rate limits are tracked per user UUID, extracted from the JWT sub claim. Each user has their own independent request budget.
  • Unauthenticated requests and all /api/auth/ endpoints: Rate limits are tracked per IP address. Auth endpoints always use IP-based limiting regardless of whether a valid JWT is present.
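
A rough sketch of how the two identification rules combine. The function name `client_key` is hypothetical, and the `ip:` prefix is an assumption (only the `user:` prefix appears in the log example later in this guide).

```python
from typing import Optional


def client_key(path: str, user_id: Optional[str], ip: str) -> str:
    """Pick the rate limit bucket key for a request (illustrative only).

    `user_id` stands for the JWT `sub` claim, or None when the request
    carries no valid token.
    """
    # Auth endpoints are always keyed by IP, even with a valid JWT.
    if path.startswith("/api/auth/") or user_id is None:
        return f"ip:{ip}"  # "ip:" prefix is an assumption
    return f"user:{user_id}"
```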

Backend storage: When REDIS_URL is configured, the sliding window is backed by Redis sorted sets for consistent enforcement across multiple API pods. When Redis is unavailable, the limiter falls back to an in-memory window per pod (degraded mode — limits are not shared across pods in this state).
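
To illustrate why sorted sets suit this job, here is a sketch of a sliding-window check built on them. It is not Arbitex's implementation. It assumes `client` is a redis-py `redis.Redis` instance (or any object exposing the same `zremrangebyscore`, `zcard`, `zadd`, and `expire` methods), and the function name and key format are hypothetical. The steps are not atomic; a production limiter would typically wrap them in a transaction or Lua script.

```python
import time
import uuid


def allow_request(client, key: str, limit: int, window: int = 60) -> bool:
    """Sliding-window check against a Redis sorted set (illustrative sketch)."""
    now = time.time()
    # Scores are request timestamps, so eviction is a single range delete.
    client.zremrangebyscore(key, 0, now - window)
    if client.zcard(key) >= limit:
        return False
    # The member only has to be unique; the score carries the time.
    client.zadd(key, {uuid.uuid4().hex: now})
    # Let idle keys expire so inactive clients do not accumulate.
    client.expire(key, window)
    return True
```

Because every pod reads and writes the same key, the count reflects traffic across the whole deployment rather than a single process.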


Endpoints are grouped into tiers with different per-minute limits. More specific matches take precedence over general ones.

| Tier | Endpoint match | Default limit | Notes |
|------|----------------|---------------|-------|
| auth | /api/auth/login, /api/auth/register, /api/auth/refresh | 10 RPM | Always IP-based, not user-based |
| share | /api/conversations/shared/ | 30 RPM | Public share access |
| chat | POST /api/conversations/{id}/messages | 60 RPM | Applies per user |
| dlp_test | POST /api/admin/dlp-rules/test | 10 RPM | High-cost DLP evaluation |
| general | All other endpoints | Configurable via RATE_LIMIT_REQUESTS_PER_MINUTE | Per user |

The general limit is set by the RATE_LIMIT_REQUESTS_PER_MINUTE environment variable. If the variable is not set, a compiled-in default applies.

The limiter evaluates tiers in this order (first match wins):

  1. Method + regex — e.g., POST re:^/api/conversations/[^/]+/messages$ (highest specificity)
  2. Method + exact path — e.g., POST /api/admin/dlp-rules/test
  3. Method + prefix — longest prefix wins
  4. Path-only exact match
  5. Path-only prefix — longest prefix wins
  6. General limit (fallback)
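
The precedence order above can be expressed as a ranking over matching tier keys. The following is a sketch under stated assumptions, not the limiter's real code: the function name `resolve_limit` is hypothetical, path-only regex keys are treated as unsupported because the documented key formats only show regex with a method, and the general limit is passed in because its compiled default is not stated here.

```python
import re
from typing import Dict, Optional, Tuple


def resolve_limit(method: str, path: str, tiers: Dict[str, int], general: int) -> int:
    """Return the limit of the most specific matching tier key (illustrative)."""
    best: Optional[Tuple[int, int, int]] = None  # (rank, -pattern length, limit)
    for key, limit in tiers.items():
        verb, sep, pattern = key.partition(" ")
        if not sep:  # no space: a path-only key
            verb, pattern = "", key
        if verb and verb != method.upper():
            continue
        if pattern.startswith("re:"):
            # Assumption: regex keys always carry a method.
            if not verb or not re.search(pattern[3:], path):
                continue
            rank = 1
        elif pattern == path:
            rank = 2 if verb else 4
        elif path.startswith(pattern):
            rank = 3 if verb else 5
        else:
            continue
        # Lower rank wins; within a rank, the longer pattern wins.
        candidate = (rank, -len(pattern), limit)
        if best is None or candidate[:2] < best[:2]:
            best = candidate
    return best[2] if best is not None else general
```

For instance, with both `"/api"` and `"/api/analytics"` configured, a request to `/api/analytics/daily` resolves to the `/api/analytics` limit because the longer prefix wins.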

Every response (including successful ones) includes three rate limit headers:

| Header | Value | Description |
|--------|-------|-------------|
| X-RateLimit-Limit | integer | The applicable limit for the matched tier (requests per 60s) |
| X-RateLimit-Remaining | integer | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp | When the current window resets (epoch seconds) |

Example response headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1741565460

Use X-RateLimit-Remaining to pace requests proactively before hitting the limit.


When a request exceeds the limit, the response is HTTP 429:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 23
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1741565460

{
  "detail": "Rate limit exceeded",
  "tier": "chat"
}

Retry-After is the number of seconds to wait before retrying. It is derived from X-RateLimit-Reset - current_time, with a minimum of 1 second.
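
As a worked example of that derivation: with `X-RateLimit-Reset: 1741565460` and a current time of 1741565437, the difference is 23 seconds, matching the `Retry-After: 23` header shown above. A small sketch of the same arithmetic follows; the function name is hypothetical, and rounding fractional seconds up is an assumption.

```python
import math
import time
from typing import Optional


def retry_after_seconds(reset_epoch: int, now: Optional[float] = None) -> int:
    """Seconds until the window resets, never less than 1 (illustrative)."""
    now = time.time() if now is None else now
    # Assumption: fractional seconds are rounded up before the floor of 1 applies.
    return max(1, math.ceil(reset_epoch - now))
```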

tier identifies which rate limit tier was exceeded — "auth", "share", "chat", "dlp_test", or "general". Use this to distinguish a chat throttle from an auth lockout.

A basic retry-with-backoff pattern using the Retry-After header:

import time
import requests


def api_call_with_retry(url: str, headers: dict, max_retries: int = 3) -> requests.Response:
    """GET `url`, sleeping for Retry-After seconds after each 429 response."""
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        # Return on any non-429 status, or once the final attempt is used up.
        if response.status_code != 429 or attempt == max_retries - 1:
            return response
        # Honor the server's hint; fall back to 5 seconds if the header is absent.
        time.sleep(int(response.headers.get("Retry-After", 5)))
    raise ValueError("max_retries must be at least 1")

For batch workloads, use X-RateLimit-Remaining to throttle proactively:

remaining = int(response.headers.get("X-RateLimit-Remaining", 10))
if remaining < 5:
    reset_at = int(response.headers.get("X-RateLimit-Reset", 0))
    wait = max(0, reset_at - int(time.time()))
    time.sleep(wait + 1)  # +1 second safety margin

By default, admin users (role: "admin") receive a higher RPM bucket (configured by RATE_LIMIT_ADMIN_RPM) rather than a blanket exemption. This gives admin tooling elevated headroom for dashboard queries, bulk operations, and audit exports while still enforcing finite limits.

To grant full exemption to admins, set RATE_LIMIT_ADMIN_EXEMPT=true. When set, admin requests bypass the rate limiter entirely. This is disabled by default.
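
The interaction between the two admin settings might look like the sketch below. It is an illustration only: the function name is hypothetical, `admin_rpm_default` is a placeholder for the compiled default (whose value is not documented here), and it assumes the admin bucket replaces the tier limit that would otherwise apply.

```python
from typing import Mapping, Optional


def effective_admin_limit(
    role: str,
    env: Mapping[str, str],
    tier_limit: int,
    admin_rpm_default: int,  # placeholder for the undocumented compiled default
) -> Optional[int]:
    """Return the RPM limit for a request, or None when exempt (illustrative)."""
    if role != "admin":
        return tier_limit
    # Full exemption is opt-in and off by default.
    if env.get("RATE_LIMIT_ADMIN_EXEMPT", "false").strip().lower() == "true":
        return None
    # Otherwise admins get the elevated, but still finite, bucket.
    return int(env.get("RATE_LIMIT_ADMIN_RPM", admin_rpm_default))
```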


Enterprise-tier orgs (enterprise_saas or enterprise_outpost plans) can have a custom per-org rate limit configured in the enterprise_entitlements table via the custom_rate_limit_rpm column. When set:

  • The custom RPM overrides the general limit for all non-auth endpoints for all users in that org.
  • Auth endpoints continue to use their fixed IP-based limit (10 RPM) regardless of the enterprise override.
  • The override applies per user (each user in the org gets the full custom RPM allowance).
  • The override value is cached per user for 5 minutes (ENTERPRISE_RPM_CACHE_TTL = 300s) to avoid a database round-trip on every request. Changes to the entitlement take effect within 5 minutes.
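
The 5-minute propagation delay follows directly from the per-user cache. A minimal sketch of such a TTL cache is shown here; the class name, the `lookup` callback (standing in for the `enterprise_entitlements` query), and the injectable `clock` are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

ENTERPRISE_RPM_CACHE_TTL = 300  # seconds, as stated above


class EnterpriseRpmCache:
    """Per-user TTL cache for the custom RPM override (illustrative only)."""

    def __init__(
        self,
        lookup: Callable[[str], Optional[int]],  # hypothetical DB lookup
        ttl: float = ENTERPRISE_RPM_CACHE_TTL,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._lookup = lookup
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[int]]] = {}

    def get(self, user_id: str) -> Optional[int]:
        now = self._clock()
        hit = self._entries.get(user_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no database round-trip
        value = self._lookup(user_id)  # None means "no override configured"
        self._entries[user_id] = (now, value)
        return value
```

A user whose entry was cached just before an entitlement change keeps the old value until that entry ages out, which is why changes can take up to 5 minutes to apply.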

Contact your Arbitex account team to adjust the enterprise custom RPM for your organization.


The RATE_LIMIT_TIERS environment variable allows operators to add or override per-endpoint limits without modifying the default tier map. The value must be a JSON object where keys are tier expressions and values are positive integers (requests per minute).

Tier key formats:

"/api/path" → matches any method, exact path or prefix
"POST /api/path" → matches POST only, exact path or prefix
"POST re:^/api/path/[^/]+$" → matches POST only, regex path

Example: Tighten the DLP test limit to 5 RPM and add a stricter limit on a custom analytics endpoint:

RATE_LIMIT_TIERS='{"POST /api/admin/dlp-rules/test": 5, "/api/analytics": 20}'

Custom tiers are merged with the defaults. To override a default, use the same key expression. Custom tiers take effect at startup — a restart is required to apply changes.
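
The validation and merge rules can be sketched as follows. The function name `load_tiers` and the shape of the defaults map are assumptions for illustration; only the JSON-object and positive-integer requirements come from the text above.

```python
import json
from typing import Dict


def load_tiers(raw: str, defaults: Dict[str, int]) -> Dict[str, int]:
    """Parse a RATE_LIMIT_TIERS value and merge it over the defaults (illustrative)."""
    custom = json.loads(raw or "{}")
    if not isinstance(custom, dict):
        raise ValueError("RATE_LIMIT_TIERS must be a JSON object")
    for key, value in custom.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"limit for {key!r} must be a positive integer")
    # Same key expression: the custom value replaces the default.
    return {**defaults, **custom}
```

Applied to the example above, the `POST /api/admin/dlp-rules/test` entry replaces the 10 RPM default with 5, and `/api/analytics` is added as a new tier.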


The /health endpoint is always exempt from rate limiting. OPTIONS requests (CORS preflight) are also exempt.


| Variable | Default | Description |
|----------|---------|-------------|
| RATE_LIMIT_REQUESTS_PER_MINUTE | (compiled default) | General limit for all non-tiered endpoints |
| RATE_LIMIT_ADMIN_EXEMPT | false | Set true to fully exempt admin users from rate limiting |
| RATE_LIMIT_ADMIN_RPM | (compiled default) | RPM bucket for admin users when not fully exempt |
| RATE_LIMIT_TIERS | {} | JSON object of additional or overriding tier limits |
| REDIS_URL | | Redis connection URL for multi-pod rate limit consistency |

Rate limit events are logged at WARNING level when a request is rejected:

{
  "event": "rate_limit_exceeded",
  "client_key": "user:3fa85f64-...",
  "path": "/api/conversations/abc/messages",
  "limit": 60,
  "tier": "chat"
}

For aggregate monitoring, alert on HTTP 429 response rate from your load balancer or API gateway. A sustained 429 rate from a single client_key may indicate a misconfigured client or a credential that needs rotation.
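
To spot a single noisy client in the application logs, the rejection events can be tallied per `client_key`. This sketch assumes the logs are available as one JSON object per line, which may not match your log pipeline; the function name is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_rejections(lines: Iterable[str]) -> Counter:
    """Count rate_limit_exceeded events per (client_key, tier) (illustrative)."""
    counts: Counter = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if isinstance(record, dict) and record.get("event") == "rate_limit_exceeded":
            counts[(record.get("client_key"), record.get("tier"))] += 1
    return counts
```

Calling `count_rejections(log_lines).most_common(5)` then lists the five client and tier pairs with the most rejections.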