Usage Metering Administration

Usage metering tracks request counts, token consumption, and cost estimates per organisation across billing periods. Metering runs at two layers: a real-time counter for quota enforcement, and background aggregation rollups for reporting.


Every authenticated request increments the organisation’s counter in OrgUsageCounter. The counter is keyed to a calendar-month billing period and resets automatically at the start of each new month.

A Redis hot path (sub-millisecond) handles the increment for active traffic. PostgreSQL remains the authoritative record; a background sync task writes Redis counters to Postgres every 60 seconds. If Redis is unavailable, all operations fall back to Postgres transparently.
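The hot-path-with-fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the actual implementation: `InMemoryStore`, `billing_period_start`, and `record_request` are hypothetical names, the in-memory class stands in for both Redis and Postgres, and the 60-second sync task is omitted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def billing_period_start(now: datetime) -> str:
    """Return the first day (UTC) of the calendar month containing `now`."""
    return now.astimezone(timezone.utc).strftime("%Y-%m-01")


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for Redis or Postgres; available=False simulates an outage."""
    available: bool = True
    counts: dict = field(default_factory=dict)

    def incr(self, key: tuple) -> int:
        if not self.available:
            raise ConnectionError("store unavailable")
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def record_request(org_id: str, now: datetime,
                   hot: InMemoryStore, durable: InMemoryStore) -> int:
    """Increment the org's counter on the hot path, falling back to the durable store."""
    key = (org_id, billing_period_start(now))
    try:
        return hot.incr(key)
    except ConnectionError:
        # Assumed fallback: write straight to the authoritative store.
        return durable.incr(key)
```

Because the key includes the period start, a request in a new month lands on a fresh counter, which is one plausible way to get the automatic monthly reset.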

A background aggregation scheduler materialises hourly, daily, and monthly rollups into the org_usage_rollups table from raw message-level data. The scheduler interval is controlled by USAGE_AGGREGATION_INTERVAL_MINUTES (default: 60 minutes).

Message.conversation_id
→ Conversation.user_id
→ User.tenant_id (= org_id)
→ OrgUsageRollup (hourly | daily | monthly)

Each rollup row records request_count, input_tokens, output_tokens, cost_estimate, and a model_breakdown JSONB dict keyed by model_id.
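As a rough sketch of the lineage and row shape above, the function below resolves each message to its organisation and folds it into a rollup row. The dict-based inputs, the function name `rollup_by_org`, and the assumption that one message counts as one request are illustrative only; the real models and queries may differ.

```python
METRICS = ("input_tokens", "output_tokens", "cost_estimate")


def _empty_totals() -> dict:
    return {"request_count": 0, "input_tokens": 0, "output_tokens": 0, "cost_estimate": 0.0}


def rollup_by_org(messages: list, conversations: dict, users: dict) -> dict:
    """Group messages for one period into one rollup row per org.

    messages:      dicts with conversation_id, model_id, and the METRICS fields
    conversations: {conversation_id: user_id}
    users:         {user_id: tenant_id}
    """
    rollups: dict = {}
    for msg in messages:
        # Message.conversation_id -> Conversation.user_id -> User.tenant_id (= org_id)
        org_id = users[conversations[msg["conversation_id"]]]
        row = rollups.setdefault(org_id, {**_empty_totals(), "model_breakdown": {}})
        per_model = row["model_breakdown"].setdefault(msg["model_id"], _empty_totals())
        for target in (row, per_model):
            target["request_count"] += 1  # assumption: one message == one request
            for metric in METRICS:
                target[metric] += msg[metric]
    return rollups
```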


Monthly request limits are set per plan tier and locked by PO decision.

| Plan tier | Monthly limit | Notes |
| --- | --- | --- |
| devfree_saas | 100,000 | Default for free-tier organisations |
| devpro_saas | 1,000,000 | |
| team_saas | 1,000,000 | |
| enterprise_saas | 1,000,000 | Overridable via enterprise_entitlements.custom_request_limit |
| enterprise_outpost | 1,000,000 | Overridable via enterprise_entitlements.custom_request_limit |

Legacy tier keys free (100,000) and paid (1,000,000) are supported for backward compatibility until migration 045 runs.

Enterprise custom limits are stored in enterprise_entitlements.custom_request_limit. When set, the custom value takes precedence over the tier default.
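The precedence rule can be expressed as a small lookup. This is a sketch under stated assumptions: `resolve_monthly_limit` is a hypothetical helper, and restricting the override to the two enterprise tiers is inferred from the table rather than confirmed.

```python
from typing import Optional

# Tier defaults from the table above, plus the legacy keys.
TIER_LIMITS = {
    "devfree_saas": 100_000,
    "devpro_saas": 1_000_000,
    "team_saas": 1_000_000,
    "enterprise_saas": 1_000_000,
    "enterprise_outpost": 1_000_000,
    "free": 100_000,    # legacy key
    "paid": 1_000_000,  # legacy key
}

ENTERPRISE_TIERS = {"enterprise_saas", "enterprise_outpost"}


def resolve_monthly_limit(plan_tier: str, custom_request_limit: Optional[int] = None) -> int:
    """Return the effective monthly request limit for an organisation."""
    if plan_tier in ENTERPRISE_TIERS and custom_request_limit is not None:
        return custom_request_limit  # custom value takes precedence
    return TIER_LIMITS[plan_tier]    # raises KeyError for an unknown tier
```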


All endpoints require UserRole.ADMIN and are tenant-scoped via the caller’s tenant_id.

GET /api/admin/usage/summary
Authorization: Bearer <token>

Returns the current billing period’s request count, token totals, cost estimate, plan limit, and warning level.

Response

{
  "org_id": "b3e2a1c0-...",
  "plan_tier": "enterprise_saas",
  "request_count": 834200,
  "input_tokens": 12400000,
  "output_tokens": 3100000,
  "cost_estimate": 47.83,
  "limit": 1000000,
  "period_start": "2026-03-01",
  "period_end": "2026-03-31",
  "percentage_used": 83.4,
  "warning_level": "warning_80"
}

| Field | Type | Description |
| --- | --- | --- |
| plan_tier | string | Canonical tier key |
| request_count | int | Requests in the current billing period |
| input_tokens | int | Total input tokens (from monthly rollup) |
| output_tokens | int | Total output tokens (from monthly rollup) |
| cost_estimate | float | Estimated USD cost for the period |
| limit | int | Monthly request limit |
| percentage_used | float | Requests as percentage of limit (0–100+) |
| warning_level | string | "none", "warning_80", or "warning_95" |

GET /api/admin/usage/history
?granularity=daily
&limit=30
&cursor=2026-03-10T00:00:00+00:00
Authorization: Bearer <token>

Returns cursor-paginated rollup history. Ordered newest-first (period_start DESC).

Query parameters

| Parameter | Default | Description |
| --- | --- | --- |
| granularity | daily | hourly, daily, or monthly |
| limit | 30 | Items per page (1–90) |
| cursor | | ISO 8601 period_start of the oldest item on the previous page |

Response

{
  "items": [
    {
      "period_start": "2026-03-11T00:00:00+00:00",
      "period_end": "2026-03-12T00:00:00+00:00",
      "request_count": 38400,
      "input_tokens": 560000,
      "output_tokens": 140000,
      "cost_estimate": 2.18
    }
  ],
  "next_cursor": "2026-03-10T00:00:00+00:00",
  "total": 1
}

Pass next_cursor from a response as cursor on the next request to page backward through history.
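A client-side paging loop might look like the sketch below. It assumes the final page returns a null or missing next_cursor, which the text above does not state; `iter_history` and the `fetch_page` callable are hypothetical, with `fetch_page` standing in for whatever HTTP call your client makes to the history endpoint.

```python
from typing import Callable, Iterator, Optional


def iter_history(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield rollup items newest-first, following next_cursor until none is returned."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)  # cursor=None requests the first (newest) page
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:  # assumption: null or absent cursor marks the last page
            return
```

Keeping the HTTP call behind a callable makes the loop easy to exercise with canned pages before pointing it at a live deployment.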

GET /api/admin/usage/by-model
?granularity=monthly
&period_start=2026-03-01
&period_end=2026-03-31
Authorization: Bearer <token>

Aggregates the model_breakdown JSONB from rollup rows for the requested period and returns one item per unique model.

Query parameters

| Parameter | Default | Description |
| --- | --- | --- |
| granularity | monthly | hourly, daily, or monthly |
| period_start | | Inclusive start date (YYYY-MM-DD) |
| period_end | | Inclusive end date (YYYY-MM-DD) |

Response

{
  "items": [
    {
      "model_id": "claude-sonnet-4-6",
      "provider": "anthropic",
      "request_count": 450000,
      "input_tokens": 6800000,
      "output_tokens": 1700000,
      "cost_estimate": 29.40
    },
    {
      "model_id": "gpt-4o",
      "provider": "openai",
      "request_count": 384200,
      "input_tokens": 5600000,
      "output_tokens": 1400000,
      "cost_estimate": 18.43
    }
  ]
}

Items are sorted alphabetically by model_id.
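The per-model aggregation could be approximated as below. The shape of each model_breakdown entry (four numeric fields per model_id) is an assumption, `merge_model_breakdowns` is a hypothetical name, and the provider field is left out because its source is not described here.

```python
FIELDS = ("request_count", "input_tokens", "output_tokens", "cost_estimate")


def merge_model_breakdowns(rollups: list) -> list:
    """Sum model_breakdown entries across rollup rows; return items sorted by model_id."""
    totals: dict = {}
    for row in rollups:
        for model_id, usage in row.get("model_breakdown", {}).items():
            agg = totals.setdefault(model_id, {"model_id": model_id, **dict.fromkeys(FIELDS, 0)})
            for name in FIELDS:
                agg[name] += usage.get(name, 0)
    return sorted(totals.values(), key=lambda item: item["model_id"])
```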

GET /api/admin/usage/alerts?limit=20&offset=0
Authorization: Bearer <token>

Returns paginated alert records for the caller’s organisation, ordered newest-first by triggered_at.

Query parameters

| Parameter | Default | Description |
| --- | --- | --- |
| limit | 20 | Records per page (1–100) |
| offset | 0 | Records to skip |

Response

{
  "items": [
    {
      "id": "a4f2c1e0-...",
      "org_id": "b3e2a1c0-...",
      "alert_type": "warning_80",
      "message": "Usage at 80% threshold: 834,200 / 1,000,000 requests (83.4%)",
      "created_at": "2026-03-10T14:22:00Z"
    }
  ],
  "total": 3
}

GET /api/orgs/{org_id}/usage
Authorization: Bearer <token>

Accessible to platform admins (any org_id) and org admins (own tenant_id only). Returns the same fields as the summary endpoint.


UsageThrottleMiddleware reads the organisation’s current usage percentage on each authenticated, non-M2M HTTP request and applies one of three tiered actions.

| Threshold (env var) | Default | Effect |
| --- | --- | --- |
| USAGE_THROTTLE_WARN | 80% | X-Usage-Warning: threshold=80,percentage=<pct> header added to responses |
| USAGE_THROTTLE_SLOW | 95% | Above header plus X-Usage-Throttle: active; rate limit middleware applies 50% RPM reduction |
| USAGE_THROTTLE_BLOCK | 100% | HTTP 429 returned immediately |

Usage percentages are cached per org. Configure the TTL with USAGE_THROTTLE_CACHE_TTL (default: 300 seconds).

Bypasses:

  • /health path — always exempt
  • Unauthenticated requests — no org to check
  • M2M tokens (token_type=m2m JWT claim) — M2M clients have separate quota enforcement

Fail-open: if the DB/cache lookup fails for any reason, the request is passed through without blocking.
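Taken together, the thresholds, bypasses, and fail-open rule suggest decision logic along these lines. This is a sketch, not the middleware itself: `throttle_action` and `lookup_pct` are hypothetical, thresholds are hard-coded to the documented defaults, the exact-match check on /health and the >= comparisons for the warn and slow tiers are assumptions, and the percentage formatting in the header is a guess.

```python
from typing import Callable, Optional

WARN_PCT, SLOW_PCT, BLOCK_PCT = 80, 95, 100  # documented defaults


def throttle_action(path: str, org_id: Optional[str], token_type: Optional[str],
                    lookup_pct: Callable[[str], float]) -> tuple:
    """Return (status_code, extra_headers); status 200 means the request passes through."""
    # Bypasses: health checks, unauthenticated requests, and M2M tokens.
    if path == "/health" or org_id is None or token_type == "m2m":
        return 200, {}
    try:
        pct = lookup_pct(org_id)
    except Exception:
        return 200, {}  # fail open: never block on a lookup error
    if pct >= BLOCK_PCT:
        return 429, {}
    headers = {}
    if pct >= WARN_PCT:
        headers["X-Usage-Warning"] = f"threshold={WARN_PCT},percentage={pct:g}"
    if pct >= SLOW_PCT:
        headers["X-Usage-Throttle"] = "active"
    return 200, headers
```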

When percentage_used >= 100:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: <seconds until end of billing period>

{
  "detail": "Monthly usage limit reached",
  "upgrade_url": "/billing/upgrade",
  "retry_after_seconds": 1728000
}

retry_after_seconds is calculated as the number of seconds from now until UTC midnight on the first day of next month.
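One way to compute that value is shown below; `seconds_until_next_month` is a hypothetical helper name, and truncating to whole seconds is an assumption.

```python
from datetime import datetime, timezone


def seconds_until_next_month(now: datetime) -> int:
    """Whole seconds from `now` until 00:00 UTC on the first day of the next month."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        next_start = datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        next_start = datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)
    return int((next_start - now).total_seconds())
```

For instance, a request blocked at 2026-03-12T00:00:00Z is 20 days from 1 April, which gives the 1728000 seconds shown in the sample response.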


UsageAlertService fires an alert record when an organisation crosses a configured threshold percentage for the first time in a billing period.

Default thresholds: 50%, 80%, 95%, 100%

Override via environment variable (comma-separated integers):

USAGE_ALERT_THRESHOLDS=50,80,95,100

De-duplication: At most one alert fires per (org_id, threshold_pct) per billing period. The period scope is determined by period_start — a new billing month resets the de-duplication state.
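The threshold parsing and de-duplication rule can be sketched as follows. `parse_thresholds` and `AlertLedger` are hypothetical names, the in-memory set stands in for whatever uniqueness constraint the alert table uses, and firing every newly crossed threshold in a single check (for example both 50 and 80 after a jump to 83.4%) is an assumption.

```python
def parse_thresholds(raw: str) -> list:
    """Parse a comma-separated USAGE_ALERT_THRESHOLDS value into sorted unique integers."""
    return sorted({int(part) for part in raw.split(",") if part.strip()})


class AlertLedger:
    """Tracks which (org_id, threshold_pct, period_start) combinations have already fired."""

    def __init__(self) -> None:
        self._fired: set = set()

    def check(self, org_id: str, period_start: str, current_pct: float,
              thresholds: list) -> list:
        """Return the thresholds that fire now, recording each so it cannot fire again."""
        fired = []
        for threshold in sorted(thresholds):
            key = (org_id, threshold, period_start)
            if current_pct >= threshold and key not in self._fired:
                self._fired.add(key)
                fired.append(threshold)
        return fired
```

Because period_start is part of the key, the first check in a new billing month starts from a clean slate without any explicit reset step.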

Each alert stores:

| Field | Description |
| --- | --- |
| threshold_pct | Threshold that was crossed (50/80/95/100) |
| current_pct | Actual usage percentage at trigger time |
| request_count | Request count at trigger time |
| limit | Plan limit at trigger time |
| triggered_at | UTC timestamp of the event |
| webhook_delivered | Whether the webhook callback succeeded |
| webhook_error | Last delivery error (if failed) |

For each fired alert, the service looks for enabled webhooks on the organisation that subscribe to the usage.threshold event. If found, the following payload is delivered:

{
  "event": "usage.threshold",
  "org_id": "b3e2a1c0-...",
  "threshold_pct": 80,
  "current_pct": 83.4,
  "request_count": 834200,
  "limit": 1000000,
  "triggered_at": "2026-03-10T14:22:00+00:00"
}

Webhook delivery failure does not suppress the alert record — the row is always persisted. Retry is not automatic; check webhook_error for delivery failure details.
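The persist-regardless behaviour could look roughly like this. `record_alert`, the `deliver` callable, and the list used as storage are all hypothetical stand-ins; the sketch makes a single delivery attempt and records the outcome on the row.

```python
from typing import Callable

PAYLOAD_FIELDS = ("org_id", "threshold_pct", "current_pct",
                  "request_count", "limit", "triggered_at")


def record_alert(alert: dict, deliver: Callable[[dict], None], store: list) -> dict:
    """Attempt one webhook delivery, then persist the alert row with the outcome."""
    payload = {"event": "usage.threshold", **{name: alert[name] for name in PAYLOAD_FIELDS}}
    row = {**alert, "webhook_delivered": False, "webhook_error": None}
    try:
        deliver(payload)  # single attempt; no automatic retry
        row["webhook_delivered"] = True
    except Exception as exc:
        row["webhook_error"] = str(exc)
    store.append(row)  # persisted whether or not delivery succeeded
    return row
```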

Configure webhooks at Settings → Webhooks in the admin portal or via the webhooks API.


Aggregation runs on a background asyncio loop. Configure with:

| Env var | Default | Description |
| --- | --- | --- |
| USAGE_AGGREGATION_INTERVAL_MINUTES | 60 | Minutes between aggregation runs |

The scheduler runs for the most recently completed period to avoid aggregating partial windows. Aggregation is idempotent: re-running for the same (org_id, period_type, period_start) performs an upsert.
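Selecting the completed window and writing it idempotently might be sketched as below. `previous_completed_period` and `upsert_rollup` are hypothetical names, the dict stands in for the org_usage_rollups table, and treating period_end as exclusive is an assumption based on the daily example in the history response.

```python
from datetime import datetime, timedelta, timezone


def previous_completed_period(now: datetime, period_type: str) -> tuple:
    """Return (period_start, period_end) in UTC for the last fully elapsed window."""
    now = now.astimezone(timezone.utc)
    if period_type == "hourly":
        end = now.replace(minute=0, second=0, microsecond=0)
        return end - timedelta(hours=1), end
    if period_type == "daily":
        end = now.replace(hour=0, minute=0, second=0, microsecond=0)
        return end - timedelta(days=1), end
    if period_type == "monthly":
        end = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        # Step back one day into the previous month, then snap to its first day.
        return (end - timedelta(days=1)).replace(day=1), end
    raise ValueError(f"unknown period_type: {period_type}")


def upsert_rollup(table: dict, org_id: str, period_type: str,
                  period_start: datetime, values: dict) -> None:
    """Insert or overwrite the row for this key, so re-runs never create duplicates."""
    table[(org_id, period_type, period_start)] = values
```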

The scheduler also fires usage threshold alerts after each successful aggregation run. Alert checks use the monthly plan limit for the percentage calculation.