Rate limiting architecture

Arbitex enforces rate limits at the ASGI middleware layer before requests reach application logic. This document covers how the sliding window algorithm works, how client identity is resolved, the per-endpoint tier table, OAuth M2M tiers, enterprise overrides, Redis-backed storage, and all configuration options.

Source: backend/app/middleware/rate_limit.py


Incoming HTTP request
┌────────────────────────────────────────────────────┐
│ RateLimitMiddleware (ASGI)                         │
│                                                    │
│ 1. Exempt check (/health, OPTIONS → pass through)  │
│ 2. Extract client identity                         │
│    ├─ M2M JWT → oauth:{client_id}                  │
│    ├─ User JWT → user:{uuid}                       │
│    └─ Unauthenticated → ip:<address>               │
│ 3. Resolve limit (tier → enterprise override)      │
│ 4. Check sliding window (Redis → in-memory)        │
│ 5. Allow or 429                                    │
└────────────────────────────────────────────────────┘
Application handler

Every response (allowed or denied) receives rate limit headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1741789380

Client identity resolution

The middleware identifies clients in this priority order:

  1. OAuth M2M JWT (token_type=m2m): Bucket key oauth:{client_id}. Rate limited under a separate M2M tier system. The rate_limit_tier claim is read directly from the token — no database lookup per request.
  2. User JWT (token_type=user or standard): Bucket key user:{uuid} (from JWT sub claim).
  3. Unauthenticated (no valid JWT or expired token): Bucket key ip:<client-ip>.

For authentication endpoints (/api/auth/*) a special rule applies — see Login rate limiting (per-email) below.


All rate limits use a sliding window over a 60-second interval. The window is implemented differently depending on whether Redis is available:

Redis-backed storage

When Redis is configured, the check runs as a sorted-set pipeline (ZREMRANGEBYSCORE / ZADD / ZCARD / EXPIRE):

  1. Remove entries older than now - 60s from the sorted set.
  2. Add a new member with score = current timestamp.
  3. Count the entries now in the window.
  4. Set the key TTL to window_seconds + a 60-second buffer.
  5. If the count exceeds the limit: remove the just-added entry and reject the request.

Redis key format: rl:{client_key} (e.g. rl:user:abc-123, rl:ip:10.0.0.1).
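The pipeline can be sketched with redis-py. The function name and exact command grouping here are illustrative, not the real implementation in rate_limit.py; the client `r` is any redis-py-compatible object.

```python
import time
import uuid


def check_redis_window(r, client_key: str, limit: int, window: int = 60) -> bool:
    """Sliding-window check against a Redis sorted set.

    Returns True if the request is allowed, False if it should get a 429.
    """
    key = f"rl:{client_key}"
    now = time.time()
    member = f"{now}:{uuid.uuid4()}"  # unique member per request

    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # 1. drop entries older than the window
    pipe.zadd(key, {member: now})                # 2. record this request
    pipe.zcard(key)                              # 3. count entries now in the window
    pipe.expire(key, window + 60)                # 4. TTL = window + 60s buffer
    _, _, count, _ = pipe.execute()

    if count > limit:
        # 5. Over the limit: remove the just-added entry and reject.
        r.zrem(key, member)
        return False
    return True
```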

Multi-pod deployments share state through Redis, ensuring consistent limits across all Platform instances.

In-memory fallback

When Redis is unavailable, the middleware falls back to an in-process sorted list of timestamps per client key. This maintains correctness within a single process but does not share state across multiple pods — each pod enforces its own limit independently.

A warning is logged when Redis becomes unavailable:

WARN outpost.rate_limit — Rate limiter Redis unavailable — falling back to in-memory sliding window

Expired in-memory entries are purged every 60 seconds to prevent unbounded memory growth.


Per-endpoint tier table

All limits are requests per minute (RPM) within a 60-second sliding window.

| Path / Pattern | Method | Limit (RPM) | Notes |
|---|---|---|---|
| General (all authenticated users) | Any | 60 | Default; configurable via RATE_LIMIT_REQUESTS_PER_MINUTE |
| /api/auth/login | Any | 100 | Per-IP backstop only — primary limit is per-email (see below) |
| /api/auth/register | Any | 10 | Per-IP |
| /api/auth/refresh | Any | 30 | Per-IP |
| /api/conversations/shared/ | Any | 30 | Per user/IP |
| /api/conversations/{id}/messages | POST | 60 | Per user (regex match) |
| /api/admin/dlp-rules/test | POST | 10 | High-cost endpoint (A2-RATE-01) |
| Admin users (non-exempt) | Any | 600 | Elevated bucket, not unlimited |

Tier matching precedence (highest to lowest):

  1. Method + regex pattern (highest specificity)
  2. Method + exact path
  3. Method + path prefix (longest wins)
  4. Path-only exact match
  5. Path-only prefix match (longest wins)
  6. General limit (fallback)
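The precedence order can be sketched as explicit passes over a tier table. The table encoding below (tuples of method, kind, pattern, RPM) and the function name are illustrative assumptions; only the precedence order and the example limits come from this document.

```python
import re

GENERAL_LIMIT = 60

# Hypothetical encoding of the tier table; entries taken from the table above.
# Each entry: (method or None, kind, pattern, rpm)
TIERS = [
    ("POST", "regex", r"^/api/conversations/[^/]+/messages$", 60),
    ("POST", "exact", "/api/admin/dlp-rules/test", 10),
    (None, "exact", "/api/auth/login", 100),
    (None, "exact", "/api/auth/register", 10),
    (None, "exact", "/api/auth/refresh", 30),
    (None, "prefix", "/api/conversations/shared/", 30),
]


def resolve_limit(method: str, path: str) -> int:
    """Resolve the RPM limit for a request, highest specificity first."""
    # 1. Method + regex pattern
    for m, kind, pat, rpm in TIERS:
        if m == method and kind == "regex" and re.match(pat, path):
            return rpm
    # 2. Method + exact path
    for m, kind, pat, rpm in TIERS:
        if m == method and kind == "exact" and pat == path:
            return rpm
    # 3. Method + path prefix (longest prefix wins)
    hits = [(len(pat), rpm) for m, kind, pat, rpm in TIERS
            if m == method and kind == "prefix" and path.startswith(pat)]
    if hits:
        return max(hits)[1]
    # 4. Path-only exact match
    for m, kind, pat, rpm in TIERS:
        if m is None and kind == "exact" and pat == path:
            return rpm
    # 5. Path-only prefix match (longest prefix wins)
    hits = [(len(pat), rpm) for m, kind, pat, rpm in TIERS
            if m is None and kind == "prefix" and path.startswith(pat)]
    if hits:
        return max(hits)[1]
    # 6. General limit (fallback)
    return GENERAL_LIMIT
```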

Custom tier overrides via environment variable


You can override or extend tiers at deployment time via RATE_LIMIT_TIERS. Set it to a JSON object:

RATE_LIMIT_TIERS='{"POST /api/conversations/{id}/messages": 120, "/api/admin/": 200}'

Custom tiers are merged with (and override) the defaults. Values must be positive integers.
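The merge-and-validate behaviour can be sketched like this. The function name and the subset of defaults shown are assumptions; the validation rule (positive integers only) is from this document.

```python
import json
import os

# A subset of the default tiers, for illustration.
DEFAULT_TIERS = {
    "/api/auth/login": 100,
    "/api/auth/register": 10,
}


def load_tiers(env=os.environ) -> dict[str, int]:
    """Merge RATE_LIMIT_TIERS (a JSON object) over the default tiers.

    Values must be positive integers; anything else is rejected.
    """
    tiers = dict(DEFAULT_TIERS)
    overrides = json.loads(env.get("RATE_LIMIT_TIERS", "{}"))
    for key, rpm in overrides.items():
        if not isinstance(rpm, int) or rpm <= 0:
            raise ValueError(f"RATE_LIMIT_TIERS[{key!r}] must be a positive integer")
        tiers[key] = rpm  # custom entries override defaults
    return tiers
```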


Login rate limiting (per-email)

Login attempts at POST /api/auth/login use a per-email primary limit rather than per-IP, because enterprise customers often route hundreds of users through the same NAT/proxy IP.

  1. The middleware reads the email (or username) field from the JSON request body.
  2. A primary check is performed against the bucket key login:{email_lowercased} with a limit of 10 attempts per email per minute.
  3. If the email limit is not exceeded, a secondary per-IP check is performed against the normal IP bucket with the /api/auth/login tier limit (100 RPM) as a backstop against credential-stuffing bots probing many different accounts from the same IP.

| Check | Bucket key | Limit | Purpose |
|---|---|---|---|
| Primary (email) | login:{email} | 10 RPM | Brute-force protection per account |
| Secondary (IP) | ip:{address} | 100 RPM | Backstop against mass credential-stuffing |

If either check fails, a 429 is returned immediately. Because the middleware must read the request body to extract the email, it buffers the ASGI body and replays it so the downstream login handler receives it unchanged.

When the email field cannot be extracted (malformed JSON, etc.), the middleware falls back to per-IP limiting using the standard /api/auth/login tier.

When the per-email limit is exceeded, a warning is logged:

WARN rate_limit — Login rate limit exceeded (per-email) client_key=login:user@example.com limit=10 tier=auth_email
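The bucket selection for login requests can be sketched as below. The function name is hypothetical; the behaviour (prefer email or username from the JSON body, lowercase it, fall back to per-IP on malformed input) follows the steps above.

```python
import json


def login_bucket_key(body: bytes, client_ip: str) -> str:
    """Pick the primary rate-limit bucket for POST /api/auth/login.

    Prefers a per-email bucket; falls back to the per-IP bucket when the
    email cannot be extracted (malformed JSON, missing field, etc.).
    """
    try:
        payload = json.loads(body)
        email = payload.get("email") or payload.get("username")
    except (ValueError, AttributeError):
        email = None
    if isinstance(email, str) and email:
        return f"login:{email.strip().lower()}"
    return f"ip:{client_ip}"
```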

OAuth M2M tiers

OAuth M2M clients (service accounts using the client credentials flow) are rate-limited separately from human users. Their bucket key is oauth:{client_id}, isolated from user and IP buckets.

The rate_limit_tier claim is embedded in the M2M JWT at token issuance — no database lookup occurs per request.

| Tier | RPM | Use case |
|---|---|---|
| standard | 1,000 | Default for new M2M clients |
| premium | 5,000 | High-volume integrations |
| unlimited | No limit | Fully exempt from rate limiting |

If the JWT carries an unrecognised tier value, the middleware applies the standard default (1,000 RPM).

When rate_limit_tier=unlimited, the request passes through immediately without any sliding window check. This is intended for internal platform-to-platform integrations where limiting would interfere with system operations.
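The tier-to-limit mapping can be sketched as a lookup with the documented fallback. The function name and dict are illustrative; the tier names, RPM values, and fallback behaviour come from this document.

```python
# RPM per M2M tier; None means fully exempt (skip the sliding-window check).
M2M_TIERS = {"standard": 1000, "premium": 5000, "unlimited": None}


def m2m_limit(claims: dict):
    """Resolve the RPM for an M2M token from its rate_limit_tier claim.

    Unrecognised (or missing) tier values fall back to the standard
    default of 1,000 RPM.
    """
    tier = claims.get("rate_limit_tier", "standard")
    if tier not in M2M_TIERS:
        tier = "standard"
    return M2M_TIERS[tier]
```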

For requests to /api/oauth/* paths, the 429 response uses RFC 6749 format:

{
  "error": "rate_limit_exceeded",
  "error_description": "Rate limit exceeded. Retry after 23 seconds."
}

For all other paths with M2M tokens, the standard format is used:

{
  "error": "rate_limit_exceeded",
  "tier": "m2m",
  "retry_after": 23
}
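The format selection can be sketched as a simple path check. The function name is hypothetical; the two payload shapes are the ones shown above.

```python
def m2m_429_body(path: str, retry_after: int) -> dict:
    """Choose the 429 payload shape for a rate-limited M2M request."""
    if path.startswith("/api/oauth/"):
        # OAuth endpoints use the error/error_description shape.
        return {
            "error": "rate_limit_exceeded",
            "error_description": f"Rate limit exceeded. Retry after {retry_after} seconds.",
        }
    # All other paths use the standard middleware shape.
    return {"error": "rate_limit_exceeded", "tier": "m2m", "retry_after": retry_after}
```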

Enterprise overrides

Enterprise-tier organisations can have a custom per-minute limit configured in the enterprise_entitlements table. This overrides the default tier limit for all non-auth endpoints for authenticated users in that organisation.

  1. For authenticated requests (user:{uuid} bucket, non-auth paths), the middleware queries enterprise_entitlements.custom_rate_limit_rpm for the user’s organisation.
  2. The result is cached in-process for 300 seconds per user UUID to avoid a database round-trip on every request.
  3. When a custom RPM is set, it replaces the resolved tier limit (including admin RPM, if applicable).
  4. If no custom RPM is set (null), the standard tier limit applies.

To apply a new custom RPM immediately (without waiting 300 seconds for cache expiry), call the cache invalidation function. This is triggered automatically when updating enterprise_entitlements via the admin API.

Set via the Platform admin panel at Admin → Enterprise → Custom Rate Limit, or directly via the API:

PATCH /api/admin/enterprise/entitlements
Content-Type: application/json
{ "custom_rate_limit_rpm": 10000 }

Admin rate limits

Admin users receive elevated but finite rate limits rather than a blanket exemption.

| Setting | Default | Description |
|---|---|---|
| RATE_LIMIT_ADMIN_EXEMPT | false | Set true to fully exempt admins from rate limiting |
| RATE_LIMIT_ADMIN_RPM | 600 | Admin user bucket size when not fully exempt |

When RATE_LIMIT_ADMIN_EXEMPT=true, admin requests pass through the middleware without any sliding window check. This is not recommended for production — use the elevated admin RPM instead.

The admin RPM (600 by default) does not override auth endpoint tiers — login, register, and refresh endpoints always use per-IP/per-email limits regardless of the user’s role.


Rate limit headers

Every response (allowed or denied) includes these headers:

| Header | Value | Description |
|---|---|---|
| X-RateLimit-Limit | Integer | The limit in effect for this request |
| X-RateLimit-Remaining | Integer | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp | When the window resets |

On a 429 response, the Retry-After header is also included:

HTTP/1.1 429 Too Many Requests
Retry-After: 23
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1741789380
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "tier": "general",
  "retry_after": 23
}

The following are always exempt from rate limiting:

| Path/method | Reason |
|---|---|
| GET /health | Health probe — must always respond |
| OPTIONS * | CORS preflight — must not be blocked |

Configuration reference

| Environment variable | Default | Description |
|---|---|---|
| RATE_LIMIT_REQUESTS_PER_MINUTE | 60 | Default limit for all authenticated users |
| RATE_LIMIT_ADMIN_EXEMPT | false | Fully exempt admin users from rate limiting |
| RATE_LIMIT_ADMIN_RPM | 600 | Admin user RPM when not fully exempt |
| RATE_LIMIT_TIERS | {} | JSON object overriding or extending default per-path tiers |
| REDIS_URL | "" | Redis connection URL. The rate limiter uses a separate client from usage counters. When unset, the in-memory fallback is used. |