Rate limiting architecture
Arbitex enforces rate limits at the ASGI middleware layer before requests reach application logic. This document covers how the sliding window algorithm works, how client identity is resolved, the per-endpoint tier table, OAuth M2M tiers, enterprise overrides, Redis-backed storage, and all configuration options.
Source: backend/app/middleware/rate_limit.py
Architecture overview

    Incoming HTTP request
            │
            ▼
    ┌──────────────────────────────────────────────────────┐
    │  RateLimitMiddleware (ASGI)                          │
    │                                                      │
    │  1. Exempt check (/health, OPTIONS → pass through)   │
    │  2. Extract client identity                          │
    │     ├─ M2M JWT         → oauth:{client_id}           │
    │     ├─ User JWT        → user:{uuid}                 │
    │     └─ Unauthenticated → ip:<address>                │
    │  3. Resolve limit (tier → enterprise override)       │
    │  4. Check sliding window (Redis → in-memory)         │
    │  5. Allow or 429                                     │
    └──────────────────────────────────────────────────────┘
            │
            ▼
    Application handler

Every response (allowed or denied) receives rate limit headers:

    X-RateLimit-Limit: 60
    X-RateLimit-Remaining: 42
    X-RateLimit-Reset: 1741789380

Client identity resolution
The middleware identifies clients in this priority order:
- OAuth M2M JWT (token_type=m2m): Bucket key oauth:{client_id}. Rate limited under a separate M2M tier system. The rate_limit_tier claim is read directly from the token, so no database lookup is needed per request.
- User JWT (token_type=user or standard): Bucket key user:{uuid} (from the JWT sub claim).
- Unauthenticated (no valid JWT or expired token): Bucket key ip:<client-ip>.
For authentication endpoints (/api/auth/*) a special rule applies — see Login rate limiting (per-email) below.
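The priority order above can be sketched as a small function. This is a minimal illustration, not the middleware's actual code: the function name `bucket_key` and the assumption that the M2M token exposes its client as a `client_id` claim are hypothetical, and the claims are assumed to have been validated already.

```python
from typing import Mapping, Optional


def bucket_key(claims: Optional[Mapping[str, str]], client_ip: str) -> str:
    """Return the rate-limit bucket key for a request.

    `claims` is the decoded, already-validated JWT payload, or None when the
    request carries no valid (or an expired) token.
    """
    if claims is None:
        # Unauthenticated: fall back to the client IP.
        return f"ip:{client_ip}"
    if claims.get("token_type") == "m2m":
        # Assumed claim name; the source only says the key is oauth:{client_id}.
        return f"oauth:{claims['client_id']}"
    # token_type=user, or a standard token without the claim.
    return f"user:{claims['sub']}"
```

Checking the M2M case before the user case keeps service accounts out of the user buckets even if their tokens also carry a `sub` claim.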
Sliding window algorithm
All rate limits use a sliding window over a 60-second interval. The window is implemented differently depending on whether Redis is available:
Redis-backed (production)
Uses Redis sorted sets (ZADD / ZRANGEBYSCORE / ZCARD pipeline):
- Remove entries older than now - 60s from the sorted set.
- Count remaining entries (before adding the new request).
- If count ≥ limit: reject the request and remove the just-added entry.
- Otherwise: add a new member with score = current timestamp.
- Set key TTL to window_seconds + 60s buffer.
Redis key format: rl:{client_key} (e.g. rl:user:abc-123, rl:ip:10.0.0.1).
Multi-pod deployments share state through Redis, ensuring consistent limits across all Platform instances.
In-memory fallback
When Redis is unavailable, the middleware falls back to an in-process sorted list of timestamps per client key. This maintains correctness within a single process but does not share state across multiple pods: each pod enforces its own limit independently.
A WARN log is emitted when Redis becomes unavailable:

    WARN outpost.rate_limit — Rate limiter Redis unavailable — falling back to in-memory sliding window

Expired in-memory entries are purged every 60 seconds to prevent unbounded memory growth.
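The in-memory fallback can be sketched roughly as follows. The class and method names (`InMemorySlidingWindow`, `allow`, `purge`) are hypothetical, and the choice not to record rejected requests is an assumption; the source only specifies a sorted list of timestamps per client key and a periodic purge.

```python
import bisect
import time
from collections import defaultdict
from typing import Dict, List, Optional


class InMemorySlidingWindow:
    """Per-process sliding window: one sorted list of timestamps per client key."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._hits: Dict[str, List[float]] = defaultdict(list)

    def allow(self, client_key: str, limit: int, now: Optional[float] = None) -> bool:
        """Return True and record the request if it fits within `limit`."""
        now = time.time() if now is None else now
        hits = self._hits[client_key]
        # Drop timestamps strictly older than the window start (now - window).
        cutoff = bisect.bisect_left(hits, now - self.window_seconds)
        del hits[:cutoff]
        if len(hits) >= limit:
            return False  # assumption: rejected requests are not recorded
        bisect.insort(hits, now)
        return True

    def purge(self, now: Optional[float] = None) -> None:
        """Drop keys whose timestamps have all expired (run periodically)."""
        now = time.time() if now is None else now
        start = now - self.window_seconds
        for key in list(self._hits):
            hits = self._hits[key]
            if not hits or hits[-1] < start:
                del self._hits[key]
```

Because each process holds its own `_hits` dictionary, a client spread across N pods could make up to N times the configured limit while Redis is down.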
Default rate limit tiers
All limits are requests per minute (RPM) within a 60-second sliding window.
| Path / Pattern | Method | Limit (RPM) | Notes |
|---|---|---|---|
| General (all authenticated users) | Any | 60 | Default; configurable via RATE_LIMIT_REQUESTS_PER_MINUTE |
| /api/auth/login | Any | 100 | Per-IP backstop only; primary limit is per-email (see below) |
| /api/auth/register | Any | 10 | Per-IP |
| /api/auth/refresh | Any | 30 | Per-IP |
| /api/conversations/shared/ | Any | 30 | Per user/IP |
| POST /api/conversations/{id}/messages | POST | 60 | Per user (regex match) |
| POST /api/admin/dlp-rules/test | POST | 10 | High-cost endpoint (A2-RATE-01) |
| Admin users (non-exempt) | Any | 600 | Elevated bucket, not unlimited |
Tier matching precedence (highest to lowest):
- Method + regex pattern (highest specificity)
- Method + exact path
- Method + path prefix (longest wins)
- Path-only exact match
- Path-only prefix match (longest wins)
- General limit (fallback)
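One way to express this precedence is to rank every matching tier and pick the best. The sketch below is illustrative only: the `Tier` dataclass, its `kind` field, and `resolve_limit` are hypothetical names, and how the real middleware distinguishes exact paths from prefixes is not stated in this document.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Precedence class keyed by (has_method, kind); a lower number wins.
_PRECEDENCE = {
    (True, "regex"): 0,
    (True, "exact"): 1,
    (True, "prefix"): 2,
    (False, "exact"): 3,
    (False, "prefix"): 4,
}


@dataclass(frozen=True)
class Tier:
    limit: int
    path: str                     # literal path, path prefix, or regex source
    kind: str = "prefix"          # "exact", "prefix", or "regex"
    method: Optional[str] = None  # None means any method


def _matches(tier: Tier, method: str, path: str) -> bool:
    if tier.method is not None and tier.method != method:
        return False
    if tier.kind == "regex":
        return re.fullmatch(tier.path, path) is not None
    if tier.kind == "exact":
        return path == tier.path
    return path.startswith(tier.path)


def resolve_limit(method: str, path: str, tiers: List[Tier], general: int = 60) -> int:
    """Return the limit of the most specific matching tier, else `general`."""
    method = method.upper()
    ranked = [
        # Sort key: precedence class first, then longest path wins.
        (_PRECEDENCE[(t.method is not None, t.kind)], -len(t.path), t.limit)
        for t in tiers
        if (t.method is not None, t.kind) in _PRECEDENCE and _matches(t, method, path)
    ]
    return min(ranked)[2] if ranked else general
```

For instance, with a `POST` exact tier for `/api/admin/dlp-rules/test` and a path-only prefix tier for `/api/admin/`, a `POST` to the test endpoint resolves to the former while a `GET` falls through to the prefix tier.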
Custom tier overrides via environment variable
You can override or extend tiers at deployment time via RATE_LIMIT_TIERS. Set it to a JSON object:

    RATE_LIMIT_TIERS='{"POST /api/conversations/{id}/messages": 120, "/api/admin/": 200}'

Custom tiers are merged with (and override) the defaults. Values must be positive integers.
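A minimal sketch of the merge and validation step, assuming invalid values are rejected with an error at startup (the source does not say whether the middleware raises or silently ignores them). The function name `merge_tiers` is hypothetical.

```python
import json
from typing import Dict, Mapping


def merge_tiers(defaults: Mapping[str, int], raw: str) -> Dict[str, int]:
    """Merge a RATE_LIMIT_TIERS JSON string over the default tiers."""
    overrides = json.loads(raw) if raw.strip() else {}
    if not isinstance(overrides, dict):
        raise ValueError("RATE_LIMIT_TIERS must be a JSON object")
    for key, value in overrides.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"tier {key!r}: limit must be a positive integer")
    # Later keys win, so overrides replace defaults with the same key.
    return {**defaults, **overrides}
```

In a deployment this would typically be called with `os.environ.get("RATE_LIMIT_TIERS", "")` as the second argument.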
Login rate limiting (per-email)
Login attempts at POST /api/auth/login use a per-email primary limit rather than per-IP, because enterprise customers often route hundreds of users through the same NAT/proxy IP.
How it works
- The middleware reads the email (or username) field from the JSON request body.
- A primary check is performed against the bucket key login:{email_lowercased} with a limit of 10 attempts per email per minute.
- If the email limit is not exceeded, a secondary per-IP check is performed against the normal IP bucket with the /api/auth/login tier limit (100 RPM) as a backstop against credential-stuffing bots probing many different accounts from the same IP.
| Check | Bucket key | Limit | Purpose |
|---|---|---|---|
| Primary (email) | login:{email} | 10 RPM | Brute-force protection per account |
| Secondary (IP) | ip:{address} | 100 RPM | Backstop against mass credential-stuffing |
If either check fails, a 429 is returned immediately. The ASGI body is buffered and replayed so the downstream login handler receives it unchanged.
When the email field cannot be extracted (malformed JSON, etc.), the middleware falls back to per-IP limiting using the standard /api/auth/login tier.
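The bucket selection for a login attempt, including the fallback for an unreadable body, might look like the sketch below. `login_checks` is a hypothetical helper that only decides which buckets to check and with what limit; the sliding window check itself and the body replay are left out.

```python
import json
from typing import List, Tuple

EMAIL_LIMIT = 10   # attempts per email per minute (from the table above)
IP_LIMIT = 100     # /api/auth/login per-IP backstop


def login_checks(body: bytes, client_ip: str) -> List[Tuple[str, int]]:
    """Return (bucket_key, limit) pairs to check, in order, for a login attempt."""
    ip_check = (f"ip:{client_ip}", IP_LIMIT)
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes: per-IP limiting only.
        return [ip_check]
    if not isinstance(payload, dict):
        return [ip_check]
    identifier = payload.get("email") or payload.get("username")
    if not isinstance(identifier, str) or not identifier:
        return [ip_check]
    # Primary per-email check first, then the per-IP backstop.
    return [(f"login:{identifier.lower()}", EMAIL_LIMIT), ip_check]
```

Lowercasing the identifier means `User@Example.com` and `user@example.com` share one bucket, so case variations cannot be used to multiply the number of attempts.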
Log events

    WARN rate_limit — Login rate limit exceeded (per-email) client_key=login:user@example.com limit=10 tier=auth_email

OAuth M2M rate limiting
OAuth M2M clients (service accounts using client credentials flow) are rate-limited separately from human users. Their bucket key is oauth:{client_id}, isolated from user and IP buckets.
Tier table
The rate_limit_tier claim is embedded in the M2M JWT at token issuance, so no database lookup occurs per request.
| Tier | RPM | Use case |
|---|---|---|
| standard | 1,000 | Default for new M2M clients |
| premium | 5,000 | High-volume integrations |
| unlimited | No limit | Fully exempt from rate limiting |
If the JWT carries an unrecognised tier value, the middleware applies the standard default (1,000 RPM).
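As a rough illustration, the tier lookup reduces to a dictionary read with a default. The names `M2M_TIER_RPM` and `m2m_limit` are hypothetical, and treating a missing claim the same as an unrecognised one is an assumption.

```python
from typing import Mapping, Optional

# RPM per tier; None marks the unlimited tier (no sliding window check).
M2M_TIER_RPM = {"standard": 1000, "premium": 5000, "unlimited": None}


def m2m_limit(claims: Mapping[str, object]) -> Optional[int]:
    """Return the RPM for an M2M token, or None when the client is exempt."""
    tier = claims.get("rate_limit_tier")
    if not isinstance(tier, str) or tier not in M2M_TIER_RPM:
        # Unrecognised (or, by assumption, missing) tier: standard default.
        return M2M_TIER_RPM["standard"]
    return M2M_TIER_RPM[tier]
```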
Unlimited tier exemption
When rate_limit_tier=unlimited, the request passes through immediately without any sliding window check. This is intended for internal platform-to-platform integrations where limiting would interfere with system operations.
OAuth endpoint 429 response format
For requests to /api/oauth/* paths, the 429 response uses RFC 6749 format:

    {
      "error": "rate_limit_exceeded",
      "error_description": "Rate limit exceeded. Retry after 23 seconds."
    }

For all other paths with M2M tokens, the standard format is used:

    {
      "error": "rate_limit_exceeded",
      "tier": "m2m",
      "retry_after": 23
    }

Enterprise custom RPM
Enterprise tier organisations can have a custom per-minute limit configured in the enterprise_entitlements table. This overrides the default tier limit for all non-auth endpoints for authenticated users in that organisation.
How the override works
- For authenticated requests (user:{uuid} bucket, non-auth paths), the middleware queries enterprise_entitlements.custom_rate_limit_rpm for the user’s organisation.
- The result is cached in-process for 300 seconds per user UUID to avoid a database round-trip on every request.
- When a custom RPM is set, it replaces the resolved tier limit (including admin RPM, if applicable).
- If no custom RPM is set (null), the standard tier limit applies.
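The caching behaviour in the steps above can be sketched as a small TTL cache. Everything named here (`EntitlementCache`, `custom_rpm`, `invalidate`, `effective_limit`, and the `lookup` callable standing in for the database query) is hypothetical; caching null results as well is an assumption made so that organisations without an override also avoid repeated queries.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class EntitlementCache:
    """In-process cache of the custom RPM per user UUID with a fixed TTL."""

    def __init__(
        self,
        lookup: Callable[[str], Optional[int]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup  # hypothetical DB query: user UUID -> RPM or None
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[int]]] = {}

    def custom_rpm(self, user_id: str) -> Optional[int]:
        now = self._clock()
        cached = self._entries.get(user_id)
        if cached is not None and cached[0] > now:
            return cached[1]
        value = self._lookup(user_id)
        # Assumption: None (no override) is cached like any other result.
        self._entries[user_id] = (now + self._ttl, value)
        return value

    def invalidate(self, user_id: Optional[str] = None) -> None:
        """Forget one user's entry, or every entry when no user is given."""
        if user_id is None:
            self._entries.clear()
        else:
            self._entries.pop(user_id, None)


def effective_limit(tier_limit: int, custom_rpm: Optional[int]) -> int:
    """A custom RPM replaces the resolved tier limit; None keeps the tier."""
    return tier_limit if custom_rpm is None else custom_rpm
```

Note that the cache is per process, so in a multi-pod deployment each pod would hold its own copy and expire it independently.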
Cache invalidation
To apply a new custom RPM immediately (without waiting 300 seconds for cache expiry), call the cache invalidation function. This is triggered automatically when updating enterprise_entitlements via the admin API.
Configuration
Set via the Platform admin panel at Admin → Enterprise → Custom Rate Limit, or directly via the API:

    PATCH /api/admin/enterprise/entitlements
    Content-Type: application/json

    { "custom_rate_limit_rpm": 10000 }

Admin user rate limiting
Admin users receive elevated but finite rate limits rather than a blanket exemption.
| Setting | Default | Description |
|---|---|---|
| RATE_LIMIT_ADMIN_EXEMPT | false | Set true to fully exempt admins from rate limiting |
| RATE_LIMIT_ADMIN_RPM (env) | 600 | Admin user bucket size when not fully exempt |
When RATE_LIMIT_ADMIN_EXEMPT=true, admin requests pass through the middleware without any sliding window check. This is not recommended for production — use the elevated admin RPM instead.
The admin RPM (600 by default) does not override auth endpoint tiers — login, register, and refresh endpoints always use per-IP/per-email limits regardless of the user’s role.
HTTP response headers
Every response (allowed or denied) includes these headers:
| Header | Value | Description |
|---|---|---|
| X-RateLimit-Limit | Integer | The limit in effect for this request |
| X-RateLimit-Remaining | Integer | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp | When the window resets |
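A sketch of how these values could be assembled from a window's state. This document does not say how the reset time is computed, so `reset_at` is simply taken as an input here, and deriving the Retry-After value sent on denied requests as the seconds remaining until that reset is an assumption. The function name `rate_limit_headers` is hypothetical.

```python
import math
from typing import Dict


def rate_limit_headers(
    limit: int, used: int, reset_at: float, now: float, denied: bool
) -> Dict[str, str]:
    """Build the rate limit headers for one response."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        # Never report a negative remainder once the limit is reached.
        "X-RateLimit-Remaining": str(max(0, limit - used)),
        "X-RateLimit-Reset": str(math.ceil(reset_at)),
    }
    if denied:
        # Assumption: wait until the reset time, and at least one second.
        headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
    return headers
```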
On a 429 response, the Retry-After header is also included:

    HTTP/1.1 429 Too Many Requests
    Retry-After: 23
    X-RateLimit-Limit: 60
    X-RateLimit-Remaining: 0
    X-RateLimit-Reset: 1741789380
    Content-Type: application/json

    {
      "error": "rate_limit_exceeded",
      "tier": "general",
      "retry_after": 23
    }

Exempt paths
The following are always exempt from rate limiting:
| Path/method | Reason |
|---|---|
| GET /health | Health probe; must always respond |
| OPTIONS * | CORS preflight; must not be blocked |
Configuration reference
| Environment variable | Default | Description |
|---|---|---|
| RATE_LIMIT_REQUESTS_PER_MINUTE | 60 | Default limit for all authenticated users |
| RATE_LIMIT_ADMIN_EXEMPT | false | Fully exempt admin users from rate limiting |
| RATE_LIMIT_ADMIN_RPM | 600 | Admin user RPM when not fully exempt |
| RATE_LIMIT_TIERS | {} | JSON object overriding or extending default per-path tiers |
| REDIS_URL | "" | Redis connection URL. Rate limiter uses a separate client from usage counters. When unset, in-memory fallback is used. |
Related docs
- OAuth M2M API guide: M2M client setup, token issuance, tier configuration
- API reference: OAuth clients (CRUD and rate_limit_tier enum reference)
- Usage dashboard: Per-user request volume visualisation
- Rate limiting guide (user-facing): End-user guide to handling 429 responses