Rate limiting architecture

Arbitex enforces rate limits at the ASGI middleware layer before requests reach application logic. This document covers how the sliding window algorithm works, how client identity is resolved, the per-endpoint tier table, OAuth M2M tiers, enterprise overrides, Redis-backed storage, and all configuration options.

Source: backend/app/middleware/rate_limit.py


Incoming HTTP request
┌────────────────────────────────────────────────────┐
│ RateLimitMiddleware (ASGI)                         │
│                                                    │
│ 1. Exempt check (/health, OPTIONS → pass through)  │
│ 2. Extract client identity                         │
│    ├─ M2M JWT → oauth:{client_id}                  │
│    ├─ User JWT → user:{uuid}                       │
│    └─ Unauthenticated → ip:<address>               │
│ 3. Resolve limit (tier → enterprise override)      │
│ 4. Check sliding window (Redis → in-memory)        │
│ 5. Allow or 429                                    │
└────────────────────────────────────────────────────┘
Application handler

Every response (allowed or denied) receives rate limit headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1741789380

Client identity resolution

The middleware identifies clients in this priority order:

  1. OAuth M2M JWT (token_type=m2m): Bucket key oauth:{client_id}. Rate limited under a separate M2M tier system. The rate_limit_tier claim is read directly from the token — no database lookup per request.
  2. User JWT (token_type=user or standard): Bucket key user:{uuid} (from JWT sub claim).
  3. Unauthenticated (no valid JWT or expired token): Bucket key ip:<client-ip>.

For authentication endpoints (/api/auth/*) a special rule applies — see Login rate limiting (per-email) below.


All rate limits use a sliding window over a 60-second interval. The window is implemented differently depending on whether Redis is available:

Redis-backed storage

When Redis is configured, the check runs as a sorted-set pipeline (ZREMRANGEBYSCORE / ZADD / ZCARD / EXPIRE):

  1. Remove entries older than now - 60s from the sorted set.
  2. Add a new member with score = current timestamp.
  3. Count the entries now in the window.
  4. Set the key TTL to window_seconds + a 60-second buffer.
  5. If the count exceeds the limit: remove the just-added entry and reject the request.

Redis key format: rl:{client_key} (e.g. rl:user:abc-123, rl:ip:10.0.0.1).
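The pipeline can be sketched with redis-py. The function name and exact command grouping here are illustrative, not the real implementation in rate_limit.py; the client `r` is any redis-py-compatible object.

```python
import time
import uuid


def check_redis_window(r, client_key: str, limit: int, window: int = 60) -> bool:
    """Sliding-window check against a Redis sorted set.

    Returns True if the request is allowed, False if it should get a 429.
    """
    key = f"rl:{client_key}"
    now = time.time()
    member = f"{now}:{uuid.uuid4()}"  # unique member per request

    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # 1. drop entries older than the window
    pipe.zadd(key, {member: now})                # 2. record this request
    pipe.zcard(key)                              # 3. count entries now in the window
    pipe.expire(key, window + 60)                # 4. TTL = window + 60s buffer
    _, _, count, _ = pipe.execute()

    if count > limit:
        # 5. Over the limit: remove the just-added entry and reject.
        r.zrem(key, member)
        return False
    return True
```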

Multi-pod deployments share state through Redis, ensuring consistent limits across all Platform instances.

In-memory fallback

When Redis is unavailable, the middleware falls back to an in-process sorted list of timestamps per client key. This maintains correctness within a single process but does not share state across multiple pods — each pod enforces its own limit independently.

A warning is logged when Redis becomes unavailable:

WARN outpost.rate_limit — Rate limiter Redis unavailable — falling back to in-memory sliding window

Expired in-memory entries are purged every 60 seconds to prevent unbounded memory growth.


Per-endpoint tier table

All limits are requests per minute (RPM) within a 60-second sliding window.

| Path / Pattern | Method | Limit (RPM) | Notes |
|---|---|---|---|
| General (all authenticated users) | Any | 60 | Default; configurable via RATE_LIMIT_REQUESTS_PER_MINUTE |
| /api/auth/login | Any | 100 | Per-IP backstop only — primary limit is per-email (see below) |
| /api/auth/register | Any | 10 | Per-IP |
| /api/auth/refresh | Any | 30 | Per-IP |
| /api/conversations/shared/ | Any | 30 | Per user/IP |
| /api/conversations/{id}/messages | POST | 60 | Per user (regex match) |
| /api/admin/dlp-rules/test | POST | 10 | High-cost endpoint (A2-RATE-01) |
| Admin users (non-exempt) | Any | 600 | Elevated bucket, not unlimited |

Tier matching precedence (highest to lowest):

  1. Method + regex pattern (highest specificity)
  2. Method + exact path
  3. Method + path prefix (longest wins)
  4. Path-only exact match
  5. Path-only prefix match (longest wins)
  6. General limit (fallback)
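The precedence order can be sketched as explicit passes over a tier table. The table encoding below (tuples of method, kind, pattern, RPM) and the function name are illustrative assumptions; only the precedence order and the example limits come from this document.

```python
import re

GENERAL_LIMIT = 60

# Hypothetical encoding of the tier table; entries taken from the table above.
# Each entry: (method or None, kind, pattern, rpm)
TIERS = [
    ("POST", "regex", r"^/api/conversations/[^/]+/messages$", 60),
    ("POST", "exact", "/api/admin/dlp-rules/test", 10),
    (None, "exact", "/api/auth/login", 100),
    (None, "exact", "/api/auth/register", 10),
    (None, "exact", "/api/auth/refresh", 30),
    (None, "prefix", "/api/conversations/shared/", 30),
]


def resolve_limit(method: str, path: str) -> int:
    """Resolve the RPM limit for a request, highest specificity first."""
    # 1. Method + regex pattern
    for m, kind, pat, rpm in TIERS:
        if m == method and kind == "regex" and re.match(pat, path):
            return rpm
    # 2. Method + exact path
    for m, kind, pat, rpm in TIERS:
        if m == method and kind == "exact" and pat == path:
            return rpm
    # 3. Method + path prefix (longest prefix wins)
    hits = [(len(pat), rpm) for m, kind, pat, rpm in TIERS
            if m == method and kind == "prefix" and path.startswith(pat)]
    if hits:
        return max(hits)[1]
    # 4. Path-only exact match
    for m, kind, pat, rpm in TIERS:
        if m is None and kind == "exact" and pat == path:
            return rpm
    # 5. Path-only prefix match (longest prefix wins)
    hits = [(len(pat), rpm) for m, kind, pat, rpm in TIERS
            if m is None and kind == "prefix" and path.startswith(pat)]
    if hits:
        return max(hits)[1]
    # 6. General limit (fallback)
    return GENERAL_LIMIT
```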

Custom tier overrides via environment variable


You can override or extend tiers at deployment time via RATE_LIMIT_TIERS. Set it to a JSON object:

RATE_LIMIT_TIERS='{"POST /api/conversations/{id}/messages": 120, "/api/admin/": 200}'

Custom tiers are merged with (and override) the defaults. Values must be positive integers.
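The merge-and-validate behaviour can be sketched like this. The function name and the subset of defaults shown are assumptions; the validation rule (positive integers only) is from this document.

```python
import json
import os

# A subset of the default tiers, for illustration.
DEFAULT_TIERS = {
    "/api/auth/login": 100,
    "/api/auth/register": 10,
}


def load_tiers(env=os.environ) -> dict[str, int]:
    """Merge RATE_LIMIT_TIERS (a JSON object) over the default tiers.

    Values must be positive integers; anything else is rejected.
    """
    tiers = dict(DEFAULT_TIERS)
    overrides = json.loads(env.get("RATE_LIMIT_TIERS", "{}"))
    for key, rpm in overrides.items():
        if not isinstance(rpm, int) or rpm <= 0:
            raise ValueError(f"RATE_LIMIT_TIERS[{key!r}] must be a positive integer")
        tiers[key] = rpm  # custom entries override defaults
    return tiers
```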


Login rate limiting (per-email)

Login attempts at POST /api/auth/login use a per-email primary limit rather than per-IP, because enterprise customers often route hundreds of users through the same NAT/proxy IP.

  1. The middleware reads the email (or username) field from the JSON request body.
  2. A primary check is performed against the bucket key login:{email_lowercased} with a limit of 10 attempts per email per minute.
  3. If the email limit is not exceeded, a secondary per-IP check is performed against the normal IP bucket with the /api/auth/login tier limit (100 RPM) as a backstop against credential-stuffing bots probing many different accounts from the same IP.

| Check | Bucket key | Limit | Purpose |
|---|---|---|---|
| Primary (email) | login:{email} | 10 RPM | Brute-force protection per account |
| Secondary (IP) | ip:{address} | 100 RPM | Backstop against mass credential-stuffing |

If either check fails, a 429 is returned immediately. Because the middleware must read the request body to extract the email, it buffers the ASGI body and replays it so the downstream login handler receives it unchanged.

When the email field cannot be extracted (malformed JSON, etc.), the middleware falls back to per-IP limiting using the standard /api/auth/login tier.

When the per-email limit is exceeded, a warning is logged:

WARN rate_limit — Login rate limit exceeded (per-email) client_key=login:user@example.com limit=10 tier=auth_email
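The bucket selection for login requests can be sketched as below. The function name is hypothetical; the behaviour (prefer email or username from the JSON body, lowercase it, fall back to per-IP on malformed input) follows the steps above.

```python
import json


def login_bucket_key(body: bytes, client_ip: str) -> str:
    """Pick the primary rate-limit bucket for POST /api/auth/login.

    Prefers a per-email bucket; falls back to the per-IP bucket when the
    email cannot be extracted (malformed JSON, missing field, etc.).
    """
    try:
        payload = json.loads(body)
        email = payload.get("email") or payload.get("username")
    except (ValueError, AttributeError):
        email = None
    if isinstance(email, str) and email:
        return f"login:{email.strip().lower()}"
    return f"ip:{client_ip}"
```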

OAuth M2M tiers

OAuth M2M clients (service accounts using the client credentials flow) are rate-limited separately from human users. Their bucket key is oauth:{client_id}, isolated from user and IP buckets.

The rate_limit_tier claim is embedded in the M2M JWT at token issuance — no database lookup occurs per request.

| Tier | RPM | Use case |
|---|---|---|
| standard | 1,000 | Default for new M2M clients |
| premium | 5,000 | High-volume integrations |
| unlimited | No limit | Fully exempt from rate limiting |

If the JWT carries an unrecognised tier value, the middleware applies the standard default (1,000 RPM).

When rate_limit_tier=unlimited, the request passes through immediately without any sliding window check. This is intended for internal platform-to-platform integrations where limiting would interfere with system operations.
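The tier-to-limit mapping can be sketched as a lookup with the documented fallback. The function name and dict are illustrative; the tier names, RPM values, and fallback behaviour come from this document.

```python
# RPM per M2M tier; None means fully exempt (skip the sliding-window check).
M2M_TIERS = {"standard": 1000, "premium": 5000, "unlimited": None}


def m2m_limit(claims: dict):
    """Resolve the RPM for an M2M token from its rate_limit_tier claim.

    Unrecognised (or missing) tier values fall back to the standard
    default of 1,000 RPM.
    """
    tier = claims.get("rate_limit_tier", "standard")
    if tier not in M2M_TIERS:
        tier = "standard"
    return M2M_TIERS[tier]
```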

For requests to /api/oauth/* paths, the 429 response uses RFC 6749 format:

{
  "error": "rate_limit_exceeded",
  "error_description": "Rate limit exceeded. Retry after 23 seconds."
}

For all other paths with M2M tokens, the standard format is used:

{
  "error": "rate_limit_exceeded",
  "tier": "m2m",
  "retry_after": 23
}
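The format selection can be sketched as a simple path check. The function name is hypothetical; the two payload shapes are the ones shown above.

```python
def m2m_429_body(path: str, retry_after: int) -> dict:
    """Choose the 429 payload shape for a rate-limited M2M request."""
    if path.startswith("/api/oauth/"):
        # OAuth endpoints use the error/error_description shape.
        return {
            "error": "rate_limit_exceeded",
            "error_description": f"Rate limit exceeded. Retry after {retry_after} seconds.",
        }
    # All other paths use the standard middleware shape.
    return {"error": "rate_limit_exceeded", "tier": "m2m", "retry_after": retry_after}
```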

Enterprise overrides

Enterprise-tier organisations can have a custom per-minute limit configured in the enterprise_entitlements table. This overrides the default tier limit for all non-auth endpoints for authenticated users in that organisation.

  1. For authenticated requests (user:{uuid} bucket, non-auth paths), the middleware queries enterprise_entitlements.custom_rate_limit_rpm for the user’s organisation.
  2. The result is cached in-process for 300 seconds per user UUID to avoid a database round-trip on every request.
  3. When a custom RPM is set, it replaces the resolved tier limit (including admin RPM, if applicable).
  4. If no custom RPM is set (null), the standard tier limit applies.

To apply a new custom RPM immediately (without waiting 300 seconds for cache expiry), call the cache invalidation function. This is triggered automatically when updating enterprise_entitlements via the admin API.

Set via the Platform admin panel at Admin → Enterprise → Custom Rate Limit, or directly via the API:

PATCH /api/admin/enterprise/entitlements
Content-Type: application/json
{ "custom_rate_limit_rpm": 10000 }

Admin rate limits

Admin users receive elevated but finite rate limits rather than a blanket exemption.

| Setting | Default | Description |
|---|---|---|
| RATE_LIMIT_ADMIN_EXEMPT | false | Set true to fully exempt admins from rate limiting |
| RATE_LIMIT_ADMIN_RPM | 600 | Admin user bucket size when not fully exempt |

When RATE_LIMIT_ADMIN_EXEMPT=true, admin requests pass through the middleware without any sliding window check. This is not recommended for production — use the elevated admin RPM instead.

The admin RPM (600 by default) does not override auth endpoint tiers — login, register, and refresh endpoints always use per-IP/per-email limits regardless of the user’s role.


Rate limit headers

Every response (allowed or denied) includes these headers:

| Header | Value | Description |
|---|---|---|
| X-RateLimit-Limit | Integer | The limit in effect for this request |
| X-RateLimit-Remaining | Integer | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp | When the window resets |

On a 429 response, the Retry-After header is also included:

HTTP/1.1 429 Too Many Requests
Retry-After: 23
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1741789380
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "tier": "general",
  "retry_after": 23
}

The following are always exempt from rate limiting:

| Path/method | Reason |
|---|---|
| GET /health | Health probe — must always respond |
| OPTIONS * | CORS preflight — must not be blocked |

Configuration reference

| Environment variable | Default | Description |
|---|---|---|
| RATE_LIMIT_REQUESTS_PER_MINUTE | 60 | Default limit for all authenticated users |
| RATE_LIMIT_ADMIN_EXEMPT | false | Fully exempt admin users from rate limiting |
| RATE_LIMIT_ADMIN_RPM | 600 | Admin user RPM when not fully exempt |
| RATE_LIMIT_TIERS | {} | JSON object overriding or extending default per-path tiers |
| REDIS_URL | "" | Redis connection URL. The rate limiter uses a separate client from usage counters. When unset, the in-memory fallback is used. |