Usage Metering Administration

Usage metering tracks request counts, token consumption, and cost estimates per organisation across billing periods. Metering runs at two layers: a real-time counter for quota enforcement, and background aggregation rollups for reporting.

How metering works

Every authenticated request increments the organisation’s counter in OrgUsageCounter. The counter is keyed to a calendar-month billing period, so it resets automatically at the start of each new month.

A Redis hot path (sub-millisecond) handles the increment for active traffic. PostgreSQL remains the authoritative record; a background sync task writes Redis counters to Postgres every 60 seconds. If Redis is unavailable, all operations fall back to Postgres transparently.
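The hot-path increment and its fallback can be sketched as follows. This is a minimal illustration, not the service's implementation: the key format, the `CounterStore` class, and `record_request` are hypothetical names, and both stores are in-memory stand-ins for Redis and Postgres.

```python
from datetime import datetime, timezone


def billing_period_key(org_id: str, now: datetime) -> str:
    """Build a per-org, per-calendar-month counter key (hypothetical format)."""
    return f"usage:{org_id}:{now.year:04d}-{now.month:02d}"


class CounterStore:
    """In-memory stand-in for Redis or Postgres, for illustration only."""

    def __init__(self, available: bool = True) -> None:
        self.available = available
        self.counts: dict[str, int] = {}

    def incr(self, key: str) -> int:
        if not self.available:
            raise ConnectionError("store unavailable")
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def record_request(org_id: str, now: datetime,
                   hot: CounterStore, durable: CounterStore) -> int:
    """Increment on the hot path; fall back to the durable store on failure."""
    key = billing_period_key(org_id, now)
    try:
        return hot.incr(key)
    except ConnectionError:
        # Assumed fallback behaviour: write straight to the durable store.
        return durable.incr(key)
```

Because the month is part of the key in this sketch, a request in a new month starts from a fresh counter without an explicit reset step.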

A background aggregation scheduler materialises hourly, daily, and monthly rollups into the org_usage_rollups table from raw message-level data. The scheduler interval is controlled by USAGE_AGGREGATION_INTERVAL_MINUTES (default: 60 minutes).

Aggregation chain

Message.conversation_id
  → Conversation.user_id
    → User.tenant_id (= org_id)
      → OrgUsageRollup (hourly | daily | monthly)

Each rollup row records request_count, input_tokens, output_tokens, cost_estimate, and a model_breakdown JSONB dict keyed by model_id.
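To make the chain concrete, the sketch below resolves each message to its organisation and sums token counts per org. The dictionaries stand in for the Message, Conversation, and User tables; their shape, the sample values, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the Conversation and User tables.
conversations = {"c1": {"user_id": "u1"}, "c2": {"user_id": "u2"}}
users = {"u1": {"tenant_id": "org-a"}, "u2": {"tenant_id": "org-b"}}

# Hypothetical message rows with the token fields a rollup would sum.
messages = [
    {"conversation_id": "c1", "input_tokens": 100, "output_tokens": 40},
    {"conversation_id": "c1", "input_tokens": 50, "output_tokens": 10},
    {"conversation_id": "c2", "input_tokens": 7, "output_tokens": 3},
]


def resolve_org_id(message: dict) -> str:
    """Follow Message -> Conversation -> User -> tenant_id (the org_id)."""
    user_id = conversations[message["conversation_id"]]["user_id"]
    return users[user_id]["tenant_id"]


def token_totals_by_org(rows: list[dict]) -> dict[str, dict[str, int]]:
    """Sum input and output tokens per organisation."""
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: {"input_tokens": 0, "output_tokens": 0}
    )
    for row in rows:
        bucket = totals[resolve_org_id(row)]
        bucket["input_tokens"] += row["input_tokens"]
        bucket["output_tokens"] += row["output_tokens"]
    return dict(totals)
```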

Plan tier limits

Monthly request limits are set per plan tier and are fixed by product owner (PO) decision.

Plan tier	Monthly limit	Notes
`devfree_saas`	100,000	Default for free-tier organisations
`devpro_saas`	1,000,000
`team_saas`	1,000,000
`enterprise_saas`	1,000,000	Overridable via `enterprise_entitlements.custom_request_limit`
`enterprise_outpost`	1,000,000	Overridable via `enterprise_entitlements.custom_request_limit`

Legacy tier keys free (100,000) and paid (1,000,000) are supported for backward compatibility until migration 045 runs.

Enterprise custom limits are stored in enterprise_entitlements.custom_request_limit. When set, the custom value takes precedence over the tier default.
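The precedence rule can be expressed as a small lookup. This is a sketch only: `resolve_monthly_limit` is a hypothetical helper, and it assumes that a custom limit is honoured only for the two tiers the table marks as overridable.

```python
from typing import Optional

# Tier defaults copied from the table above, including the legacy keys.
TIER_LIMITS = {
    "devfree_saas": 100_000,
    "devpro_saas": 1_000_000,
    "team_saas": 1_000_000,
    "enterprise_saas": 1_000_000,
    "enterprise_outpost": 1_000_000,
    "free": 100_000,      # legacy key
    "paid": 1_000_000,    # legacy key
}

# Assumption: only these tiers read enterprise_entitlements.custom_request_limit.
OVERRIDABLE_TIERS = {"enterprise_saas", "enterprise_outpost"}


def resolve_monthly_limit(plan_tier: str,
                          custom_request_limit: Optional[int] = None) -> int:
    """Return the custom limit when set for an overridable tier, else the default."""
    if plan_tier in OVERRIDABLE_TIERS and custom_request_limit is not None:
        return custom_request_limit
    return TIER_LIMITS[plan_tier]
```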

Admin API endpoints

All endpoints require UserRole.ADMIN and are tenant-scoped via the caller’s tenant_id.

Current-period summary

GET /api/admin/usage/summary
Authorization: Bearer <token>

Returns the current billing period’s request count, token totals, cost estimate, plan limit, and warning level.

Response

{
  "org_id": "b3e2a1c0-...",
  "plan_tier": "enterprise_saas",
  "request_count": 834200,
  "input_tokens": 12400000,
  "output_tokens": 3100000,
  "cost_estimate": 47.83,
  "limit": 1000000,
  "period_start": "2026-03-01",
  "period_end": "2026-03-31",
  "percentage_used": 83.4,
  "warning_level": "warning_80"
}

Field	Type	Description
`plan_tier`	string	Canonical tier key
`request_count`	int	Requests in the current billing period
`input_tokens`	int	Total input tokens (from monthly rollup)
`output_tokens`	int	Total output tokens (from monthly rollup)
`cost_estimate`	float	Estimated USD cost for the period
`limit`	int	Monthly request limit
`percentage_used`	float	Requests as percentage of limit (0–100+)
`warning_level`	string	`"none"`, `"warning_80"`, or `"warning_95"`
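The relationship between `request_count`, `limit`, `percentage_used`, and `warning_level` can be sketched as below. The 80 and 95 cut-offs are inferred from the level names, and rounding to one decimal place is inferred from the sample response; both are assumptions, as is the function name.

```python
def summarise_usage(request_count: int, limit: int) -> tuple[float, str]:
    """Return (percentage_used, warning_level) for the current period."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    percentage_used = round(request_count / limit * 100, 1)
    if percentage_used >= 95:
        warning_level = "warning_95"
    elif percentage_used >= 80:
        warning_level = "warning_80"
    else:
        warning_level = "none"
    return percentage_used, warning_level
```

With the sample response values (834,200 requests against a 1,000,000 limit), this yields 83.4 and `warning_80`.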

Usage history

GET /api/admin/usage/history
  ?granularity=daily
  &limit=30
  &cursor=2026-03-10T00:00:00+00:00
Authorization: Bearer <token>

Returns cursor-paginated rollup history. Ordered newest-first (period_start DESC).

Query parameters

Parameter	Default	Description
`granularity`	`daily`	`hourly`, `daily`, or `monthly`
`limit`	30	Items per page (1–90)
`cursor`	—	ISO 8601 `period_start` timestamp; pass the `next_cursor` value returned with the previous page

Response

{
  "items": [
    {
      "period_start": "2026-03-11T00:00:00+00:00",
      "period_end": "2026-03-12T00:00:00+00:00",
      "request_count": 38400,
      "input_tokens": 560000,
      "output_tokens": 140000,
      "cost_estimate": 2.18
    }
  ],
  "next_cursor": "2026-03-10T00:00:00+00:00",
  "total": 1
}

Pass next_cursor from a response as cursor on the next request to page backward through history.
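A client-side paging loop might look like the sketch below. The `fetch_page` callable is a hypothetical wrapper around `GET /api/admin/usage/history`, and the loop assumes that `next_cursor` is null or absent on the last page, which this document does not state.

```python
from typing import Callable, Iterator, Optional


def iter_history(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield rollup items newest-first, following next_cursor between pages."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:  # assumed end-of-history signal
            break


# Hypothetical two-page history used in place of real HTTP calls.
_fake_pages = {
    None: {"items": [{"period_start": "2026-03-11T00:00:00+00:00"}],
           "next_cursor": "2026-03-10T00:00:00+00:00"},
    "2026-03-10T00:00:00+00:00": {
        "items": [{"period_start": "2026-03-10T00:00:00+00:00"}],
        "next_cursor": None},
}

all_items = list(iter_history(lambda cursor: _fake_pages[cursor]))
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client.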

Per-model breakdown

GET /api/admin/usage/by-model
  ?granularity=monthly
  &period_start=2026-03-01
  &period_end=2026-03-31
Authorization: Bearer <token>

Aggregates the model_breakdown JSONB from rollup rows for the requested period and returns one item per unique model.

Query parameters

Parameter	Default	Description
`granularity`	`monthly`	`hourly`, `daily`, or `monthly`
`period_start`	—	Inclusive start date (YYYY-MM-DD)
`period_end`	—	Inclusive end date (YYYY-MM-DD)

Response

{
  "items": [
    {
      "model_id": "claude-sonnet-4-6",
      "provider": "anthropic",
      "request_count": 450000,
      "input_tokens": 6800000,
      "output_tokens": 1700000,
      "cost_estimate": 29.40
    },
    {
      "model_id": "gpt-4o",
      "provider": "openai",
      "request_count": 384200,
      "input_tokens": 5600000,
      "output_tokens": 1400000,
      "cost_estimate": 18.43
    }
  ]
}

Items are sorted alphabetically by model_id.
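The merge across rollup rows can be sketched as follows. The inner shape of each `model_breakdown` entry is not specified here, so the sketch assumes it carries the same four counters as the response items; `aggregate_by_model` is a hypothetical name.

```python
COUNTER_FIELDS = ("request_count", "input_tokens", "output_tokens", "cost_estimate")


def aggregate_by_model(rollup_rows: list[dict]) -> list[dict]:
    """Sum per-model counters across rows and sort the result by model_id."""
    merged: dict[str, dict] = {}
    for row in rollup_rows:
        for model_id, stats in row["model_breakdown"].items():
            item = merged.setdefault(
                model_id,
                {"model_id": model_id, "request_count": 0, "input_tokens": 0,
                 "output_tokens": 0, "cost_estimate": 0.0},
            )
            for name in COUNTER_FIELDS:
                # Missing counters are treated as zero (assumption).
                item[name] += stats.get(name, 0)
    return [merged[model_id] for model_id in sorted(merged)]
```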

Usage alerts

GET /api/admin/usage/alerts?limit=20&offset=0
Authorization: Bearer <token>

Returns paginated alert records for the caller’s organisation, ordered newest-first by triggered_at.

Query parameters

Parameter	Default	Description
`limit`	20	Records per page (1–100)
`offset`	0	Records to skip

Response

{
  "items": [
    {
      "id": "a4f2c1e0-...",
      "org_id": "b3e2a1c0-...",
      "alert_type": "warning_80",
      "message": "Usage at 80% threshold: 834,200 / 1,000,000 requests (83.4%)",
      "created_at": "2026-03-10T14:22:00Z"
    }
  ],
  "total": 3
}

Org-level usage (org admins)

GET /api/orgs/{org_id}/usage
Authorization: Bearer <token>

Accessible to platform admins (any org_id) and org admins (own tenant_id only). Returns the same fields as the summary endpoint.

Throttle tier enforcement

UsageThrottleMiddleware reads the organisation’s current usage percentage on each authenticated, non-M2M HTTP request and applies one of three tier actions.

Threshold (env var)	Default	Effect
`USAGE_THROTTLE_WARN`	80%	`X-Usage-Warning: threshold=80,percentage=<pct>` header added to responses
`USAGE_THROTTLE_SLOW`	95%	Header above plus `X-Usage-Throttle: active`; the rate limit middleware applies a 50% RPM reduction
`USAGE_THROTTLE_BLOCK`	100%	HTTP 429 returned immediately

Usage percentages are cached per org. Configure the TTL with USAGE_THROTTLE_CACHE_TTL (default: 300 seconds).

Bypasses:

/health path — always exempt
Unauthenticated requests — no org to check
M2M tokens (token_type=m2m JWT claim) — M2M clients have separate quota enforcement

Fail-open: if the DB or cache lookup fails for any reason, the request is passed through without blocking.
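The tier decision, including the fail-open case, can be sketched as a pure function. This is not the middleware itself: `throttle_action` is a hypothetical helper, a failed lookup is represented by `None`, and the formatting of the percentage in the header value is an assumption.

```python
from typing import Optional


def throttle_action(percentage: Optional[float],
                    warn: float = 80.0,
                    slow: float = 95.0,
                    block: float = 100.0) -> dict:
    """Map a usage percentage to an HTTP status and extra response headers."""
    if percentage is None:
        # Lookup failed: fail open and let the request through untouched.
        return {"status": 200, "headers": {}}
    if percentage >= block:
        return {"status": 429, "headers": {}}
    headers: dict[str, str] = {}
    if percentage >= warn:
        headers["X-Usage-Warning"] = (
            f"threshold={int(warn)},percentage={percentage}"
        )
    if percentage >= slow:
        # The rate limit middleware is expected to react to this header.
        headers["X-Usage-Throttle"] = "active"
    return {"status": 200, "headers": headers}
```

Keeping the decision separate from request handling makes each threshold easy to test in isolation.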

HTTP 429 response

When percentage_used >= 100:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: <seconds until end of billing period>

{
  "detail": "Monthly usage limit reached",
  "upgrade_url": "/billing/upgrade",
  "retry_after_seconds": 1728000
}

retry_after_seconds is calculated as the number of seconds from now until UTC midnight on the first day of next month.
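The calculation can be sketched as below. The function name is hypothetical, and the sketch assumes the caller passes a timezone-aware datetime.

```python
from datetime import datetime, timezone


def seconds_until_next_period(now: datetime) -> int:
    """Seconds from `now` until 00:00 UTC on the first day of next month."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        next_start = datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        next_start = datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)
    return int((next_start - now).total_seconds())
```

For a request at 2026-03-12T00:00:00Z, 20 days remain until 1 April, which gives the 1,728,000 seconds shown in the sample body.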

Alert thresholds

UsageAlertService fires an alert record when an organisation crosses a configured threshold percentage for the first time in a billing period.

Default thresholds: 50%, 80%, 95%, 100%

Override via environment variable (comma-separated integers):

USAGE_ALERT_THRESHOLDS=50,80,95,100

De-duplication: At most one alert fires per (org_id, threshold_pct) per billing period. The period scope is determined by period_start — a new billing month resets the de-duplication state.
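Threshold parsing and de-duplication can be sketched as follows. The function names and the in-memory `fired` set are hypothetical, and the sketch assumes that a jump past several thresholds fires one alert for each of them.

```python
def parse_thresholds(raw: str) -> list[int]:
    """Parse a comma-separated USAGE_ALERT_THRESHOLDS value into sorted integers."""
    return sorted({int(part) for part in raw.split(",") if part.strip()})


def thresholds_to_fire(org_id: str, period_start: str, current_pct: float,
                       thresholds: list[int], fired: set) -> list[int]:
    """Return thresholds crossed for the first time in this billing period."""
    due = []
    for threshold_pct in thresholds:
        # period_start is part of the key, so a new month starts clean.
        key = (org_id, period_start, threshold_pct)
        if current_pct >= threshold_pct and key not in fired:
            fired.add(key)
            due.append(threshold_pct)
    return due
```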

Alert records

Each alert stores:

Field	Description
`threshold_pct`	Threshold that was crossed (50/80/95/100)
`current_pct`	Actual usage percentage at trigger time
`request_count`	Request count at trigger time
`limit`	Plan limit at trigger time
`triggered_at`	UTC timestamp of the event
`webhook_delivered`	Whether the webhook callback succeeded
`webhook_error`	Last delivery error (if failed)

Webhook delivery

For each fired alert, the service looks for enabled webhooks on the organisation that subscribe to the usage.threshold event. If found, the following payload is delivered:

{
  "event": "usage.threshold",
  "org_id": "b3e2a1c0-...",
  "threshold_pct": 80,
  "current_pct": 83.4,
  "request_count": 834200,
  "limit": 1000000,
  "triggered_at": "2026-03-10T14:22:00+00:00"
}

Webhook delivery failure does not suppress the alert record — the row is always persisted. Retry is not automatic; check webhook_error for delivery failure details.
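The delivery bookkeeping can be sketched as below: the alert row is kept regardless of the outcome, and only the two webhook fields change. The `send` callable stands in for whatever HTTP client is used, and both function names are hypothetical.

```python
from typing import Callable

PAYLOAD_FIELDS = ("org_id", "threshold_pct", "current_pct",
                  "request_count", "limit", "triggered_at")


def build_payload(alert: dict) -> dict:
    """Build the usage.threshold payload from a stored alert record."""
    payload = {"event": "usage.threshold"}
    payload.update({name: alert[name] for name in PAYLOAD_FIELDS})
    return payload


def deliver_alert(alert: dict, send: Callable[[dict], None]) -> dict:
    """Attempt a single delivery and record the outcome on the alert."""
    try:
        send(build_payload(alert))
    except Exception as exc:  # no automatic retry: record the error only
        alert["webhook_delivered"] = False
        alert["webhook_error"] = str(exc)
    else:
        alert["webhook_delivered"] = True
        alert["webhook_error"] = None
    return alert
```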

Configure webhooks at Settings → Webhooks in the admin portal or via the webhooks API.

Background aggregation scheduler

Aggregation runs on a background asyncio loop. Configure with:

Env var	Default	Description
`USAGE_AGGREGATION_INTERVAL_MINUTES`	60	Minutes between aggregation runs

The scheduler fires for the previous completed period to avoid aggregating partial windows. Aggregation is idempotent — re-running for the same (org_id, period_type, period_start) performs an upsert.

The scheduler also fires usage threshold alerts after each successful aggregation run. Alert checks use the monthly plan limit for the percentage calculation.
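Selecting the previous completed period can be sketched as follows. The sketch assumes UTC-aligned, half-open windows (start inclusive, end exclusive) and a timezone-aware input; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def previous_completed_period(now: datetime,
                              period_type: str) -> tuple[datetime, datetime]:
    """Return (period_start, period_end) of the last fully elapsed window."""
    now = now.astimezone(timezone.utc)
    if period_type == "hourly":
        end = now.replace(minute=0, second=0, microsecond=0)
        start = end - timedelta(hours=1)
    elif period_type == "daily":
        end = now.replace(hour=0, minute=0, second=0, microsecond=0)
        start = end - timedelta(days=1)
    elif period_type == "monthly":
        end = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        # Step back one day into the previous month, then snap to its first day.
        start = (end - timedelta(days=1)).replace(day=1)
    else:
        raise ValueError(f"unknown period_type: {period_type}")
    return start, end
```

Because the window depends only on the clock and the period type, re-running the job for the same window produces the same (period_type, period_start) key, which is what lets the upsert stay idempotent.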
