Request Lifecycle
Every AI request that passes through the Arbitex gateway traverses a fixed, ordered pipeline before reaching an AI provider. The pipeline is the single enforcement point for authentication, quota, policy, data loss prevention, routing, and compliance logging. No request bypasses any stage. This page describes each stage, what component is responsible for it, and how failures surface.
Pipeline Overview
The pipeline has seven stages. Stages 1 through 3 run before the provider call. Stages 5 through 7 run after it. Stage 4 (routing) is the decision point that determines which provider path the request takes.
```mermaid
flowchart TD
    A([Client]) -->|Authorization: Bearer| B[Stage 1 — Intake\nAPI Gateway]
    B -->|401 invalid token\n429 rate limit| ERR1([Error])
    B -->|tenant_id extracted| C[Stage 2 — Policy Evaluation\nPolicy Engine]
    C -->|BLOCK| ERR2([403 — policy reason])
    C -->|ALLOW / REDACT / ROUTE_TO| D[Stage 3 — DLP Scan — Input\n3-Tier Pipeline]
    D -->|CANCEL| ERR3([403 / 400 — DLP block])
    D -->|REDACT — prompt sanitised| E[Stage 4 — Routing\nDeployment Mode]
    D -->|ALLOW — prompt unchanged| E
    E -->|SaaS| F1[Stage 5 — Provider Call\nSaaS direct to provider API]
    E -->|Hybrid Outpost| F2[Stage 5 — Provider Call\nOutpost via mTLS]
    F1 -->|502 / 503 provider error| ERR4([Error])
    F2 -->|502 / 503 provider error| ERR4
    F1 --> G[Stage 6 — Output Scan\n3-Tier Pipeline]
    F2 --> G
    G -->|BLOCK — output suppressed| ERR5([output_blocked SSE event])
    G -->|REDACT — inline replacement| H[Stage 7 — Audit Write\nHMAC-chained log entry]
    G -->|ALLOW| H
    H --> Z([Response to client])
```

Stage-by-Stage Breakdown
Stage 1 — Intake
Component: API Gateway (backend.app.api.chat, FastAPI dependencies)
The gateway receives the HTTP request and performs three checks before any application logic runs:
- Authentication — Validates the Authorization: Bearer token. Accepted token types are JWT (session tokens issued by the platform) and API keys (long-lived tokens issued to service accounts). The JWT blacklist bloom filter is checked on every request; if the filter has not completed its first database sync at startup, the gateway returns 503 rather than risk processing a revoked token.
- Tenant extraction — Resolves tenant_id from the authenticated token. All downstream enforcement is scoped to this tenant.
- Rate limiting — Per-org and per-user rate limits are evaluated. Quota checks (plan tier request count, per-user/group quotas, and dollar budget caps) run in parallel with DLP during Stage 2 and can short-circuit with a 429 before the DLP scan completes.
Failure surface: 401 Unauthorized (invalid or missing token), 429 Too Many Requests (rate or quota exceeded), 503 Service Unavailable (blacklist filter not ready).
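The intake checks can be sketched as a single guard that returns the first failing status code. This is an illustrative sketch only, not the gateway's actual FastAPI dependency; names such as intake_check and blacklist_synced are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IntakeResult:
    status: int                      # HTTP status to return; 0 means proceed
    tenant_id: Optional[str] = None

def intake_check(auth_header: Optional[str],
                 blacklist_synced: bool,
                 revoked_tokens: set,
                 token_tenants: dict,
                 under_rate_limit: bool) -> IntakeResult:
    # 503 until the JWT blacklist bloom filter completes its first DB sync
    if not blacklist_synced:
        return IntakeResult(status=503)
    # 401 for a missing or malformed Authorization header
    if not auth_header or not auth_header.startswith("Bearer "):
        return IntakeResult(status=401)
    token = auth_header.removeprefix("Bearer ")
    # 401 for a revoked token (blacklist hit)
    if token in revoked_tokens:
        return IntakeResult(status=401)
    # Tenant extraction: all downstream enforcement is scoped to this tenant
    tenant_id = token_tenants.get(token)
    if tenant_id is None:
        return IntakeResult(status=401)
    # 429 when per-org / per-user rate limits are exceeded
    if not under_rate_limit:
        return IntakeResult(status=429)
    return IntakeResult(status=0, tenant_id=tenant_id)
```

The ordering matters: the 503 readiness check runs first so that a revoked token can never slip through while the blacklist is cold.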
Stage 2 — Policy Evaluation
Component: Policy Engine (backend.app.services.policy_engine.evaluate_policy_chain)
The Policy Engine evaluates the request against the active policy packs for the tenant. Policy evaluation is the single enforcement point; it replaces all per-endpoint scattered checks that existed before the Epic J2 refactor.
Policy packs contain rules of three types:
| Rule type | What it evaluates |
|---|---|
| PROMPT | Content-based analysis of the prompt text. Governance challenge flow applies: API callers receive HTTP 449 with a challenge payload; interactive (SSE) callers receive a governance_prompt SSE event. |
| channel | Restricts request origin by user or group membership. Used to enforce model access policies per team. |
| intent_complexity | Evaluated after intent classification when auto_route is enabled. |
Possible decisions:
- ALLOW — proceed to DLP scan.
- BLOCK / CANCEL — request terminated; audit entry written.
- REDACT — policy-specified redactions applied to the prompt before DLP.
- ROUTE_TO — allow, but override the target model and provider. ROUTE_TO takes precedence over both auto_route intent classification and the model specified by the client.
Failure surface: 403 Forbidden (policy block), 449 (governance challenge requiring user justification).
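Folding a chain of rule results into one of these four decisions might look like the sketch below. This is a hedged illustration of the technique, not the actual evaluate_policy_chain implementation; the rule-result dict shape is assumed:

```python
def evaluate_chain(rule_results):
    """Fold per-rule results into a single decision.

    Precedence (assumed for illustration): any BLOCK/CANCEL terminates
    immediately; ROUTE_TO overrides the target model; REDACTs accumulate;
    otherwise the request is allowed.
    """
    route_to = None
    redactions = []
    for r in rule_results:
        action = r["action"]
        if action in ("BLOCK", "CANCEL"):
            # Terminal: request is rejected, audit entry records the reason
            return {"decision": "BLOCK", "reason": r.get("reason")}
        if action == "ROUTE_TO":
            # Overrides both auto_route and the client-specified model
            route_to = (r["provider"], r["model"])
        elif action == "REDACT":
            redactions.extend(r.get("patterns", []))
    if route_to is not None:
        decision = "ROUTE_TO"
    elif redactions:
        decision = "REDACT"
    else:
        decision = "ALLOW"
    return {"decision": decision, "route_to": route_to, "redactions": redactions}
```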
Stage 3 — DLP Scan (Input)
Component: 3-Tier DLP Pipeline (backend.app.services.intake_pipeline, DLP microservice)
The DLP pipeline inspects the prompt text for sensitive data before it is forwarded to the provider. It runs as three sequential tiers, each only executing if the previous tier did not terminate the request:
| Tier | Method | When active |
|---|---|---|
| Tier 1 | Regex and pattern detectors — PII patterns, credential formats, keyword lists | Always. Fast path; runs synchronously in the gateway process. |
| Tier 2 | ML NER classifier (DeBERTa-based) — confidence-scored entity detection | Runs if Tier 1 does not terminate. Requires GPU pool availability. |
| Tier 3 | LLM-based contextual analysis | Hybrid Outpost deployments only (tier3_active flag in Outpost heartbeat). |
Possible DLP actions:
- ALLOW — prompt passes clean.
- REDACT — detected entities replaced inline. The sanitised prompt is passed to the provider; original text is preserved in the audit record.
- CANCEL / BLOCK — request terminated. Failure behavior is configurable per detector: block (default) or pass-through.
Failure surface: If the GPU pool is unavailable (parked or restarting), inference requests fail closed — the request is blocked rather than passing through without scanning. 403 or 400 returned depending on the rule configuration.
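A minimal Tier 1 pass, with illustrative detector patterns and a per-detector action, could look like the following. The detector set and configuration here are hypothetical; real detectors live in the DLP service:

```python
import re

# Illustrative detectors only: (name, pattern, action)
DETECTORS = [
    ("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "REDACT"),
    ("aws_key", re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "BLOCK"),
]

def tier1_scan(prompt: str):
    """Fast synchronous regex pass. Returns (verdict, sanitised_prompt, findings)."""
    findings = []
    for name, pattern, action in DETECTORS:
        for m in pattern.finditer(prompt):
            findings.append({"detector": name, "span": m.span(), "action": action})
    # A BLOCK detector terminates the request outright
    if any(f["action"] == "BLOCK" for f in findings):
        return "BLOCK", prompt, findings
    # REDACT detectors replace entities inline; original text goes to the audit record
    sanitised = prompt
    for name, pattern, action in DETECTORS:
        if action == "REDACT":
            sanitised = pattern.sub(f"[{name.upper()}]", sanitised)
    verdict = "REDACT" if findings else "ALLOW"
    return verdict, sanitised, findings
```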
Stage 4 — Routing
Component: Chat handler (backend.app.api.chat.send_message), intent auto-router (backend.app.services.intent_auto_route)
After the prompt clears policy and DLP, the routing stage determines which provider path the request takes:
- SaaS path — Request is forwarded directly from the gateway to the AI provider API over HTTPS. Used when the org’s deployment mode is SaaS.
- Hybrid Outpost path — Request is forwarded to a customer-deployed Outpost container over mTLS. The Outpost authenticates to the gateway using a client certificate issued at registration time. NGINX enforces ssl_verify_client on for all /outpost/* paths.
Model selection priority (highest to lowest):
- ROUTE_TO action from policy evaluation
- Intent-based auto_route classification (if auto_route: true in the request)
- Model specified by the client in body.models[0]
- Fallback model from fallback_model / fallback_provider request fields or DB-configured fallback chain
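The priority order can be expressed as a simple resolver. A sketch only; parameter names are hypothetical, and the real routing logic also walks a multi-step fallback chain:

```python
def select_model(policy_route, auto_route_result, client_models, fallback):
    """Resolve the routing target by descending priority.

    policy_route      -- ROUTE_TO target from policy evaluation, or None
    auto_route_result -- intent classifier's pick when auto_route is on, or None
    client_models     -- the request's body.models list (may be empty)
    fallback          -- configured fallback target, or None
    """
    if policy_route is not None:       # 1. policy ROUTE_TO overrides everything
        return policy_route
    if auto_route_result is not None:  # 2. intent-based auto_route classification
        return auto_route_result
    if client_models:                  # 3. client-specified body.models[0]
        return client_models[0]
    return fallback                    # 4. configured fallback
```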
Outpost offline behavior: If the Cloud control plane is unreachable, the Outpost continues enforcing the last-cached policy pack. Policy version is confirmed on every 120-second heartbeat.
Stage 5 — Provider Call
Component: Provider adapters (backend.app.services.chat, backend.app.providers.registry)
The (possibly redacted) prompt is forwarded to the selected AI provider. The gateway supports:
- OpenAI
- Anthropic
- Azure OpenAI
- Amazon Bedrock
- Google (Gemini)
- Mistral
- Groq
- Cohere
The provider call supports SSE streaming. The gateway begins yielding SSE content chunks to the client as soon as the provider starts responding. Token counts, model ID, and provider name are captured from the provider response for use in the audit record.
Failure surface: 502 Bad Gateway (provider returned an error), 503 Service Unavailable (provider unreachable or timeout).
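A minimal relay generator illustrates the streaming behavior: SSE frames are yielded as chunks arrive, while token usage is accumulated for the Stage 7 audit record. This is a sketch under assumed chunk and stats shapes, not the actual provider adapter:

```python
def relay_stream(provider_chunks, stats):
    """Yield SSE-framed content as the provider produces it.

    provider_chunks -- iterable of {"text": str, "tokens": int} dicts (assumed shape)
    stats           -- mutable dict that collects usage for the audit record
    """
    for chunk in provider_chunks:
        # Capture usage for cost_estimate / token_count_output in the audit entry
        stats["tokens_out"] += chunk.get("tokens", 0)
        # Frame and forward immediately; the client sees output as it streams
        yield f"data: {chunk['text']}\n\n"
    yield "data: [DONE]\n\n"
```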
Stage 6 — Output Scan
Component: Output DLP scanner (backend.app.services.intake_pipeline.finalize_pipeline, run_output_regex_scan)
The full provider response is scanned by the same 3-tier DLP pipeline before delivery to the client. The output scan has two modes:
- Streaming scan (J3 mode) — When policy_output_streaming_scan is enabled and policy_output_buffer_ms > 0, Tier 1 (regex) runs on each accumulated buffer window as chunks arrive. Detected entities are redacted before the chunk is emitted. After the stream ends, the full 3-tier scan runs on the complete response. Any additional entities caught by the full scan that were missed during streaming are emitted as dlp_correction SSE events.
- Buffer-all mode (J2 compatibility) — The full response is buffered, then scanned in one pass before delivery.
Possible output actions:
- ALLOW — response delivered as-is.
- REDACT — entities replaced inline before delivery to client. Originals preserved in audit.
- BLOCK — entire response suppressed. Client receives an output_blocked SSE event. The blocked response text is retained in the audit record.
Failure surface: output_blocked SSE event (output DLP block). No HTTP error code — the stream has already started at this point.
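The full-response pass exists because an entity split across two buffer windows escapes the per-window regex. The two-pass idea can be sketched as follows, with an illustrative detector pattern (the actual scanner and event payloads differ):

```python
import re

# Illustrative detector: US SSN format
PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def streaming_output_scan(windows):
    """Per-window Tier 1 redaction, then a full-response correction pass.

    Returns (emitted_windows, correction_events). If the full-response scan
    finds something the windowed scan missed, a dlp_correction event is emitted.
    """
    # Pass 1: redact within each buffer window as it is emitted to the client
    emitted = [PATTERN.sub("[REDACTED]", w) for w in windows]
    # Pass 2: scan the complete response; entities split across window
    # boundaries only become visible here
    full = PATTERN.sub("[REDACTED]", "".join(windows))
    events = []
    if full != "".join(emitted):
        events.append({"event": "dlp_correction", "corrected_text": full})
    return emitted, events
```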
Stage 7 — Audit Write
Component: Audit service (backend.app.services.audit.log_event)
An HMAC-chained audit log entry is written asynchronously after the provider response is finalized. The audit record includes:
| Field | Content |
|---|---|
| user_id | UUID of the requesting user |
| action | Pipeline action taken (ALLOW, REDACT, CANCEL, BLOCK, ROUTE_TO) |
| model_id | Model that handled the request |
| provider | Provider that handled the request |
| prompt_text | Original prompt (configurable — can be omitted for privacy) |
| response_text | Provider response text |
| token_count_input | Input tokens consumed |
| token_count_output | Output tokens consumed |
| cost_estimate | Estimated cost in USD |
| latency_ms | End-to-end provider latency |
| dlp_results | Entities detected by input and output DLP scans |
| previous_hmac | HMAC of the previous entry (chain integrity) |
The HMAC is computed over the event content concatenated with the previous entry’s HMAC. This makes the audit chain tamper-evident: any modification to a historical entry invalidates all subsequent HMACs.
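The chaining scheme can be sketched in a few lines, assuming a JSON-serialised event and SHA-256 (the actual field layout, serialisation, and digest are implementation details):

```python
import hashlib
import hmac
import json

def append_entry(chain, key: bytes, event: dict):
    """Append an audit entry whose HMAC covers the event content
    concatenated with the previous entry's HMAC."""
    prev = chain[-1]["hmac"] if chain else "genesis"
    payload = json.dumps(event, sort_keys=True) + prev
    digest = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    chain.append({"event": event, "previous_hmac": prev, "hmac": digest})

def verify_chain(chain, key: bytes) -> bool:
    """Recompute every HMAC in order; any tampered entry breaks the chain."""
    prev = "genesis"
    for entry in chain:
        payload = json.dumps(entry["event"], sort_keys=True) + prev
        expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
        if entry["previous_hmac"] != prev or not hmac.compare_digest(expected, entry["hmac"]):
            return False
        prev = entry["hmac"]
    return True
```

Editing any historical event changes its recomputed HMAC, which no longer matches the previous_hmac embedded in the next entry, so the whole suffix of the chain fails verification.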
A partial audit entry (status received) is written at Stage 1 before any processing occurs. This pre-log satisfies compliance requirements that mandate logging at intake, independent of whether the request later succeeds or fails. The entry is updated with full results after Stage 6 completes.
Failure surface: Audit write failures are logged and monitored but do not cause the response to fail. The client receives the response regardless of audit write status.
SaaS vs Hybrid Outpost Paths
Both deployment modes share the same Stage 1–3 and Stage 6–7 pipeline. The difference is in Stages 4–5.
SaaS

Client → api.arbitex.ai/v1 → Gateway (AKS) → Provider API

- Policy evaluation runs in the Arbitex-managed AKS cluster.
- DLP scanning runs on the GPU node pool within the same cluster.
- The gateway forwards the request directly to the provider over HTTPS using provider API keys managed by Arbitex.
- All audit records are written to the Arbitex-managed database.
Hybrid Outpost
Client → Outpost (customer environment) → api.arbitex.ai/outpost → Gateway relay → Provider API

- The Outpost container runs in the customer’s environment (on-premise, private cloud, or VPC).
- DLP scanning (Tiers 1–3, including Tier 3 LLM contextual analysis) runs inside the Outpost.
- The Outpost pulls its policy pack from the Cloud control plane at startup and on a periodic interval. Each pull is confirmed via a 120-second heartbeat that reports policy_version.
- If the Cloud is unreachable, the Outpost continues enforcing the last-cached policy pack. Requests are not dropped — they are evaluated against the cached policy.
- mTLS is enforced on all Outpost-to-Cloud traffic. The Outpost presents a client certificate issued by the Cloud CA at registration time. NGINX verifies the full certificate chain.
- Tier 3 DLP availability is reported in the Outpost heartbeat (tier3_active). If the model serving Tier 3 is unavailable, the pipeline degrades to Tier 1+2 only.
Failure Modes Summary
| Stage | Component | Failure | Surface |
|---|---|---|---|
| 1 — Intake | API Gateway | Invalid or missing Bearer token | 401 Unauthorized |
| 1 — Intake | API Gateway | JWT blacklist filter not ready at startup | 503 Service Unavailable |
| 1 — Intake | Quota service | Plan tier, per-user/group quota, or budget cap exceeded | 429 Too Many Requests |
| 2 — Policy evaluation | Policy Engine | Policy rule BLOCK decision | 403 Forbidden |
| 2 — Policy evaluation | Policy Engine | PROMPT governance challenge | 449 (API) or governance_prompt SSE event (interactive) |
| 3 — DLP scan (input) | DLP pipeline | Detector action: CANCEL / BLOCK | 403 or 400 (configurable per detector) |
| 3 — DLP scan (input) | GPU node pool | Inference unavailable (pool parked/restarting) | Request fails closed — 503 or DLP block |
| 5 — Provider call | Provider adapter | Provider API error | 502 Bad Gateway |
| 5 — Provider call | Provider adapter | Provider unreachable or timeout | 503 Service Unavailable |
| 6 — Output scan | Output DLP | Response blocked | output_blocked SSE event (stream already open) |
| 7 — Audit write | Audit service | Write failure | Non-fatal — logged, monitored; response still delivered |
See also
- SaaS Infrastructure Architecture — AKS layers, GPU pool, Cloudflare edge, and network topology
- Policy Engine — Rule types, policy packs, and enforcement actions
- DLP Deep Dive — 3-tier pipeline configuration, detector tuning, and failure modes
- Hybrid Outpost deployment — Outpost setup, mTLS registration, and policy sync
- Audit Logs — Searching and exporting the HMAC-chained audit trail