Usage Metering Administration
Usage metering tracks request counts, token consumption, and cost estimates per organisation across billing periods. Metering runs at two layers: a real-time counter for quota enforcement, and background aggregation rollups for reporting.
How metering works
Every authenticated request increments the organisation’s counter in OrgUsageCounter. The counter is keyed to a calendar month billing period and resets automatically at the start of each new month.
A Redis hot path (sub-millisecond) handles the increment for active traffic. PostgreSQL remains the authoritative record; a background sync task writes Redis counters to Postgres every 60 seconds. If Redis is unavailable, all operations fall back to Postgres transparently.
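The Redis-first increment with a Postgres fallback can be sketched as follows. This is a minimal illustration, not the service's code: FakeRedis, the usage:{org_id}:{period} key format, and the drain-and-add sync step are all assumptions, and plain dicts stand in for Redis and the OrgUsageCounter table.

```python
from collections import defaultdict


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client."""

    def __init__(self, available: bool = True) -> None:
        self.available = available
        self.counters: dict[str, int] = defaultdict(int)

    def _check(self) -> None:
        if not self.available:
            raise ConnectionError("Redis unavailable")

    def incr(self, key: str) -> int:
        self._check()
        self.counters[key] += 1
        return self.counters[key]

    def drain(self) -> dict[str, int]:
        """Read and reset every counter (a real client might GETDEL each key)."""
        self._check()
        drained = dict(self.counters)
        self.counters.clear()
        return drained


class UsageCounter:
    """Redis hot path with a transparent Postgres fallback (sketch)."""

    def __init__(self, redis: FakeRedis) -> None:
        self.redis = redis
        # Stand-in for OrgUsageCounter rows, the authoritative record.
        self.postgres: dict[str, int] = defaultdict(int)

    @staticmethod
    def key(org_id: str, period: str) -> str:
        return f"usage:{org_id}:{period}"  # hypothetical key format

    def increment(self, org_id: str, period: str) -> None:
        k = self.key(org_id, period)
        try:
            self.redis.incr(k)
        except ConnectionError:
            self.postgres[k] += 1  # Redis down: write straight to Postgres

    def sync(self) -> None:
        """Background task (every 60 s): add pending Redis deltas to Postgres."""
        try:
            deltas = self.redis.drain()
        except ConnectionError:
            return  # fallback writes already went to Postgres
        for k, delta in deltas.items():
            self.postgres[k] += delta
```

The sync step adds Redis deltas to Postgres rather than overwriting the stored value, so increments written directly to Postgres during a Redis outage survive once Redis returns.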
A background aggregation scheduler materialises hourly, daily, and monthly rollups into the org_usage_rollups table from raw message-level data. The scheduler interval is controlled by USAGE_AGGREGATION_INTERVAL_MINUTES (default: 60 minutes).
Aggregation chain
Message.conversation_id → Conversation.user_id → User.tenant_id (= org_id) → OrgUsageRollup (hourly | daily | monthly)
Each rollup row records request_count, input_tokens, output_tokens, cost_estimate, and a model_breakdown JSONB dict keyed by model_id.
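A sketch of how message-level data might be folded into hourly rollup rows, including the per-model model_breakdown dict. It assumes the join chain has already produced one record per message carrying org_id, that each message counts as one request, and that hours are bucketed in UTC; MessageUsage, Rollup, and hourly_rollups are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MessageUsage:
    """One message after the Message -> Conversation -> User join (hypothetical shape)."""
    org_id: str  # User.tenant_id
    created_at: datetime
    model_id: str
    input_tokens: int
    output_tokens: int
    cost: float


@dataclass
class Rollup:
    request_count: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    cost_estimate: float = 0.0
    # Mirrors the model_breakdown JSONB column: {model_id: {counter: value}}
    model_breakdown: dict = field(default_factory=dict)


def hourly_rollups(messages: list[MessageUsage]) -> dict[tuple[str, datetime], Rollup]:
    """Group messages by (org_id, UTC hour) and sum their counters."""
    rollups: dict[tuple[str, datetime], Rollup] = defaultdict(Rollup)
    for m in messages:
        hour = m.created_at.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
        r = rollups[(m.org_id, hour)]
        r.request_count += 1  # assumption: one message == one request
        r.input_tokens += m.input_tokens
        r.output_tokens += m.output_tokens
        r.cost_estimate += m.cost
        per_model = r.model_breakdown.setdefault(
            m.model_id,
            {"request_count": 0, "input_tokens": 0, "output_tokens": 0, "cost_estimate": 0.0},
        )
        per_model["request_count"] += 1
        per_model["input_tokens"] += m.input_tokens
        per_model["output_tokens"] += m.output_tokens
        per_model["cost_estimate"] += m.cost
    return dict(rollups)
```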
Plan tier limits
Monthly request limits are set per plan tier; the values below are fixed by product owner (PO) decision.
| Plan tier | Monthly limit | Notes |
|---|---|---|
| devfree_saas | 100,000 | Default for free-tier organisations |
| devpro_saas | 1,000,000 | |
| team_saas | 1,000,000 | |
| enterprise_saas | 1,000,000 | Overridable via enterprise_entitlements.custom_request_limit |
| enterprise_outpost | 1,000,000 | Overridable via enterprise_entitlements.custom_request_limit |
Legacy tier keys free (100,000) and paid (1,000,000) are supported for backward compatibility until migration 045 runs.
Enterprise custom limits are stored in enterprise_entitlements.custom_request_limit. When set, the custom value takes precedence over the tier default.
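Limit resolution, as described by the tier table and the custom-limit rule, might look like this. The sketch assumes the custom limit applies only to the two enterprise tiers (the ones marked overridable) and that an unknown tier key is an error; resolve_limit, TIER_LIMITS, and OVERRIDABLE_TIERS are illustrative names, not the service's API.

```python
from typing import Optional

# Monthly request limits per tier key, including legacy keys.
TIER_LIMITS: dict[str, int] = {
    "devfree_saas": 100_000,
    "devpro_saas": 1_000_000,
    "team_saas": 1_000_000,
    "enterprise_saas": 1_000_000,
    "enterprise_outpost": 1_000_000,
    # Legacy keys, accepted until migration 045 runs.
    "free": 100_000,
    "paid": 1_000_000,
}

# Tiers whose limit custom_request_limit may override (assumption).
OVERRIDABLE_TIERS = frozenset({"enterprise_saas", "enterprise_outpost"})


def resolve_limit(plan_tier: str, custom_request_limit: Optional[int] = None) -> int:
    """Return the effective monthly request limit for an organisation."""
    if custom_request_limit is not None and plan_tier in OVERRIDABLE_TIERS:
        return custom_request_limit  # custom value takes precedence when set
    if plan_tier not in TIER_LIMITS:
        raise ValueError(f"unknown plan tier: {plan_tier!r}")
    return TIER_LIMITS[plan_tier]
```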
Admin API endpoints
All /api/admin/usage/* endpoints require UserRole.ADMIN and are tenant-scoped via the caller’s tenant_id. The org-level endpoint has its own access rules, described in its section.
Current-period summary
GET /api/admin/usage/summary
Authorization: Bearer <token>
Returns the current billing period’s request count, token totals, cost estimate, plan limit, and warning level.
Response
{
  "org_id": "b3e2a1c0-...",
  "plan_tier": "enterprise_saas",
  "request_count": 834200,
  "input_tokens": 12400000,
  "output_tokens": 3100000,
  "cost_estimate": 47.83,
  "limit": 1000000,
  "period_start": "2026-03-01",
  "period_end": "2026-03-31",
  "percentage_used": 83.4,
  "warning_level": "warning_80"
}
| Field | Type | Description |
|---|---|---|
| org_id | string | Organisation ID (the caller’s tenant_id) |
| plan_tier | string | Canonical tier key |
| request_count | int | Requests in the current billing period |
| input_tokens | int | Total input tokens (from monthly rollup) |
| output_tokens | int | Total output tokens (from monthly rollup) |
| cost_estimate | float | Estimated USD cost for the period |
| limit | int | Monthly request limit |
| period_start | string | First day of the billing period (YYYY-MM-DD) |
| period_end | string | Last day of the billing period (YYYY-MM-DD) |
| percentage_used | float | Requests as percentage of limit (0–100+) |
| warning_level | string | "none", "warning_80", or "warning_95" |
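percentage_used and warning_level can be derived from request_count and limit roughly as follows. The 80 and 95 cut-offs come from the warning_level values; rounding to one decimal place matches the example response (834,200 / 1,000,000 gives 83.4) but is otherwise an assumption, as is reporting "warning_95" at or above 100%. usage_status is a hypothetical helper.

```python
def usage_status(request_count: int, limit: int) -> tuple[float, str]:
    """Return (percentage_used, warning_level) for the summary response."""
    # Multiply before dividing so integer arithmetic is kept until the final division.
    pct = request_count * 100 / limit if limit > 0 else 0.0
    if pct >= 95:
        level = "warning_95"
    elif pct >= 80:
        level = "warning_80"
    else:
        level = "none"
    # Compare on the unrounded value; round only for display (assumption).
    return round(pct, 1), level
```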
Usage history
GET /api/admin/usage/history?granularity=daily&limit=30&cursor=2026-03-10T00:00:00%2B00:00
Authorization: Bearer <token>
Returns cursor-paginated rollup history, ordered newest-first (period_start DESC). URL-encode the cursor value: most servers decode a literal + in a query string as a space, so the +00:00 offset must be sent as %2B00:00.
Query parameters
| Parameter | Default | Description |
|---|---|---|
| granularity | daily | hourly, daily, or monthly |
| limit | 30 | Items per page (1–90) |
| cursor | — | ISO 8601 period_start of the oldest item on the previous page |
Response
{
  "items": [
    {
      "period_start": "2026-03-11T00:00:00+00:00",
      "period_end": "2026-03-12T00:00:00+00:00",
      "request_count": 38400,
      "input_tokens": 560000,
      "output_tokens": 140000,
      "cost_estimate": 2.18
    }
  ],
  "next_cursor": "2026-03-10T00:00:00+00:00",
  "total": 1
}
Pass next_cursor from a response as cursor on the next request to page backward through history.
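A client-side sketch of walking the history backward with next_cursor. history_url uses urllib.parse.urlencode, which percent-encodes the + in a timezone offset; iter_history takes a caller-supplied fetch_page function (hypothetical) so any HTTP client can be plugged in. It assumes the final page returns a null or missing next_cursor.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def history_url(base: str, granularity: str = "daily", limit: int = 30,
                cursor: Optional[str] = None) -> str:
    """Build a history URL; urlencode turns '+' in the offset into %2B."""
    params: dict[str, object] = {"granularity": granularity, "limit": limit}
    if cursor:
        params["cursor"] = cursor
    return f"{base}/api/admin/usage/history?{urlencode(params)}"


def iter_history(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield rollup items newest-first, following next_cursor until exhausted."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)  # e.g. GET history_url(..., cursor=cursor) with a bearer token
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor or not page["items"]:
            break  # assumption: the last page carries no next_cursor
```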
Per-model breakdown
GET /api/admin/usage/by-model?granularity=monthly&period_start=2026-03-01&period_end=2026-03-31
Authorization: Bearer <token>
Aggregates the model_breakdown JSONB from rollup rows for the requested period and returns one item per unique model.
Query parameters
| Parameter | Default | Description |
|---|---|---|
| granularity | monthly | hourly, daily, or monthly |
| period_start | — | Inclusive start date (YYYY-MM-DD) |
| period_end | — | Inclusive end date (YYYY-MM-DD) |
Response
{
  "items": [
    {
      "model_id": "claude-sonnet-4-6",
      "provider": "anthropic",
      "request_count": 450000,
      "input_tokens": 6800000,
      "output_tokens": 1700000,
      "cost_estimate": 29.40
    },
    {
      "model_id": "gpt-4o",
      "provider": "openai",
      "request_count": 384200,
      "input_tokens": 5600000,
      "output_tokens": 1400000,
      "cost_estimate": 18.43
    }
  ]
}
Items are sorted alphabetically by model_id.
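The per-model merge can be sketched as a fold over the model_breakdown dicts of the selected rollup rows, followed by an alphabetical sort on model_id. This assumes each breakdown entry already carries provider and the same four counters as a rollup row; merge_breakdowns is a hypothetical helper.

```python
COUNTERS = ("request_count", "input_tokens", "output_tokens", "cost_estimate")


def merge_breakdowns(breakdowns: list[dict[str, dict]]) -> list[dict]:
    """Sum model_breakdown dicts from several rollup rows into one item per model."""
    totals: dict[str, dict] = {}
    for breakdown in breakdowns:
        for model_id, stats in breakdown.items():
            item = totals.setdefault(model_id, {
                "model_id": model_id,
                "provider": stats.get("provider"),  # assumed to be stored per entry
                **{name: 0 for name in COUNTERS},
            })
            for name in COUNTERS:
                item[name] += stats.get(name, 0)
    # The endpoint returns items sorted alphabetically by model_id.
    return [totals[model_id] for model_id in sorted(totals)]
```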
Usage alerts
GET /api/admin/usage/alerts?limit=20&offset=0
Authorization: Bearer <token>
Returns paginated alert records for the caller’s organisation, ordered newest-first by triggered_at.
Query parameters
| Parameter | Default | Description |
|---|---|---|
| limit | 20 | Records per page (1–100) |
| offset | 0 | Records to skip |
Response
{
  "items": [
    {
      "id": "a4f2c1e0-...",
      "org_id": "b3e2a1c0-...",
      "alert_type": "warning_80",
      "message": "Usage at 80% threshold: 834,200 / 1,000,000 requests (83.4%)",
      "created_at": "2026-03-10T14:22:00Z"
    }
  ],
  "total": 3
}
Org-level usage (org admins)
GET /api/orgs/{org_id}/usage
Authorization: Bearer <token>
Accessible to platform admins (any org_id) and org admins (own tenant_id only). Returns the same fields as the summary endpoint.
Throttle tier enforcement
UsageThrottleMiddleware reads the organisation’s current usage percentage on each authenticated, non-M2M HTTP request and applies one of three tier actions.
| Threshold (env var) | Default | Effect |
|---|---|---|
| USAGE_THROTTLE_WARN | 80% | X-Usage-Warning: threshold=80,percentage=<pct> header added to responses |
| USAGE_THROTTLE_SLOW | 95% | Warning header above plus X-Usage-Throttle: active; the rate limit middleware applies a 50% RPM reduction |
| USAGE_THROTTLE_BLOCK | 100% | HTTP 429 returned immediately |
Usage percentages are cached per org. Configure the TTL with USAGE_THROTTLE_CACHE_TTL (default: 300 seconds).
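A minimal per-org TTL cache for the usage percentage might look like this. UsagePctCache, its loader callback, and ttl_from_env are hypothetical, and ttl_from_env assumes the variable holds a plain number of seconds; the injectable clock exists only to make expiry testable. One consequence of caching is that an org's tier can lag its true usage by up to one TTL.

```python
import os
import time
from typing import Callable


def ttl_from_env() -> float:
    """USAGE_THROTTLE_CACHE_TTL in seconds (default 300)."""
    return float(os.environ.get("USAGE_THROTTLE_CACHE_TTL", "300"))


class UsagePctCache:
    """Per-org cache of usage percentages with a fixed time-to-live (sketch)."""

    def __init__(self, loader: Callable[[str], float], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader  # e.g. a DB query for the org's percentage
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, float]] = {}  # org_id -> (expires_at, pct)

    def get(self, org_id: str) -> float:
        now = self._clock()
        entry = self._entries.get(org_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh
        pct = self._loader(org_id)  # miss or expired: reload
        self._entries[org_id] = (now + self._ttl, pct)
        return pct
```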
Bypasses:
- /health path: always exempt
- Unauthenticated requests: no org to check
- M2M tokens (token_type=m2m JWT claim): M2M clients have separate quota enforcement
Fail-open: if the DB or cache lookup fails for any reason, the request passes through without blocking.
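The tier selection, bypasses, and fail-open rule can be condensed into one pure function. This is a sketch under assumptions: exemption is an exact match on /health, thresholds are passed in with the documented defaults rather than read from the USAGE_THROTTLE_* variables (whose value format is not documented here), the slow tier keeps threshold=80 in X-Usage-Warning, and the percentage is formatted to one decimal place. throttle_tier is a hypothetical name.

```python
from typing import Callable, Optional


def throttle_tier(
    path: str,
    org_id: Optional[str],
    token_type: Optional[str],
    get_pct: Callable[[str], float],
    warn: float = 80.0,
    slow: float = 95.0,
    block: float = 100.0,
) -> tuple[str, dict[str, str]]:
    """Return (tier, extra_headers); tier is 'pass', 'warn', 'slow' or 'block'."""
    # Bypasses: health checks, unauthenticated requests, M2M tokens.
    if path == "/health" or org_id is None or token_type == "m2m":
        return "pass", {}
    try:
        pct = get_pct(org_id)  # e.g. a cached DB lookup
    except Exception:
        return "pass", {}  # fail open: never block on a lookup error
    if pct >= block:
        return "block", {}  # caller responds with HTTP 429
    headers: dict[str, str] = {}
    if pct >= warn:
        headers["X-Usage-Warning"] = f"threshold={warn:g},percentage={pct:.1f}"
    if pct >= slow:
        headers["X-Usage-Throttle"] = "active"  # rate limiter reduces RPM by 50%
        return "slow", headers
    return ("warn", headers) if headers else ("pass", {})
```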
HTTP 429 response
When percentage_used >= 100:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: <seconds until end of billing period>
{
  "detail": "Monthly usage limit reached",
  "upgrade_url": "/billing/upgrade",
  "retry_after_seconds": 1728000
}
retry_after_seconds is calculated as the number of seconds from now until UTC midnight on the first day of next month.
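The Retry-After value and the 429 body can be computed as below. The sketch rounds up to whole seconds so a client never retries before the period rolls over; that rounding choice and the helper names are assumptions. For 2026-03-12T00:00:00Z it yields 1,728,000 seconds (20 days), the value in the example.

```python
import math
from datetime import datetime, timezone


def seconds_until_next_period(now: datetime) -> int:
    """Seconds from `now` until 00:00 UTC on the first day of next month."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        boundary = datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        boundary = datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)
    return math.ceil((boundary - now).total_seconds())  # round up (assumption)


def limit_reached_response(now: datetime) -> tuple[int, dict[str, str], dict]:
    """Status, headers, and JSON body for a blocked request."""
    seconds = seconds_until_next_period(now)
    headers = {"Content-Type": "application/json", "Retry-After": str(seconds)}
    body = {
        "detail": "Monthly usage limit reached",
        "upgrade_url": "/billing/upgrade",
        "retry_after_seconds": seconds,
    }
    return 429, headers, body
```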
Alert thresholds
UsageAlertService creates an alert record when an organisation crosses a configured threshold percentage for the first time in a billing period.
Default thresholds: 50%, 80%, 95%, 100%
Override via environment variable (comma-separated integers):
USAGE_ALERT_THRESHOLDS=50,80,95,100
De-duplication: At most one alert fires per (org_id, threshold_pct) per billing period. The period scope is determined by period_start, so a new billing month resets the de-duplication state.
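Threshold parsing and per-period de-duplication can be sketched with a set of (org_id, threshold_pct, period_start) keys. In the service this state lives in persisted alert records; the in-memory set, and firing every threshold passed in a single check, are simplifying assumptions. load_thresholds and fire_alerts are hypothetical names.

```python
import os
from datetime import date
from typing import Optional

DEFAULT_THRESHOLDS = [50, 80, 95, 100]


def load_thresholds(raw: Optional[str] = None) -> list[int]:
    """Parse USAGE_ALERT_THRESHOLDS (comma-separated integers) or use defaults."""
    if raw is None:
        raw = os.environ.get("USAGE_ALERT_THRESHOLDS", "")
    parts = [p for p in raw.split(",") if p.strip()]
    return sorted({int(p) for p in parts}) if parts else list(DEFAULT_THRESHOLDS)


def fire_alerts(org_id: str, period_start: date, current_pct: float,
                thresholds: list[int], fired: set[tuple[str, int, date]]) -> list[int]:
    """Return thresholds crossed for the first time this period, recording them."""
    new: list[int] = []
    for threshold in thresholds:
        key = (org_id, threshold, period_start)  # de-duplication key
        if current_pct >= threshold and key not in fired:
            fired.add(key)
            new.append(threshold)
    return new
```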
Alert records
Each alert stores:
| Field | Description |
|---|---|
| threshold_pct | Threshold that was crossed (50/80/95/100) |
| current_pct | Actual usage percentage at trigger time |
| request_count | Request count at trigger time |
| limit | Plan limit at trigger time |
| triggered_at | UTC timestamp of the event |
| webhook_delivered | Whether the webhook callback succeeded |
| webhook_error | Last delivery error (if failed) |
Webhook delivery
For each fired alert, the service looks for enabled webhooks on the organisation that subscribe to the usage.threshold event. If found, the following payload is delivered:
{
  "event": "usage.threshold",
  "org_id": "b3e2a1c0-...",
  "threshold_pct": 80,
  "current_pct": 83.4,
  "request_count": 834200,
  "limit": 1000000,
  "triggered_at": "2026-03-10T14:22:00+00:00"
}
Webhook delivery failure does not suppress the alert record; the row is always persisted. Retry is not automatic: check webhook_error for delivery failure details.
Configure webhooks at Settings → Webhooks in the admin portal or via the webhooks API.
Background aggregation scheduler
Aggregation runs on a background asyncio loop. Configure with:
| Env var | Default | Description |
|---|---|---|
| USAGE_AGGREGATION_INTERVAL_MINUTES | 60 | Minutes between aggregation runs |
Each run aggregates the most recently completed period, so partial windows are never rolled up. Aggregation is idempotent: re-running for the same (org_id, period_type, period_start) performs an upsert.
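Both behaviours, choosing the previous completed period and upserting by key, are sketched below with a dict standing in for org_usage_rollups. Period boundaries are assumed to be UTC. In Postgres the equivalent write would be INSERT ... ON CONFLICT ... DO UPDATE, which assumes a unique constraint on (org_id, period_type, period_start); previous_period_start and upsert_rollup are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone


def previous_period_start(now: datetime, period_type: str) -> datetime:
    """Start of the most recent fully completed hourly/daily/monthly period (UTC)."""
    now = now.astimezone(timezone.utc)
    if period_type == "hourly":
        return now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    if period_type == "daily":
        return now.replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=1)
    if period_type == "monthly":
        this_month = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        # Step back one day into last month, then snap to its first day.
        return (this_month - timedelta(days=1)).replace(day=1)
    raise ValueError(f"unknown period_type: {period_type!r}")


def upsert_rollup(table: dict, org_id: str, period_type: str,
                  period_start: datetime, values: dict) -> None:
    """Insert or overwrite one rollup row; a re-run replaces it instead of adding another."""
    table[(org_id, period_type, period_start)] = dict(values)
```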
The scheduler also fires usage threshold alerts after each successful aggregation run. Alert checks use the monthly plan limit for the percentage calculation.
See also
- Usage Dashboard: visual overview of usage metrics in the admin portal
- Billing and metering admin guide: plan configuration and billing setup
- Rate limiting guide: per-request rate limits and RPM tiers
- Webhooks guide: configuring usage threshold webhooks