
Request Lifecycle

Every AI request that passes through the Arbitex gateway traverses a fixed, ordered pipeline before reaching an AI provider. The pipeline is the single enforcement point for authentication, quota, policy, data loss prevention, routing, and compliance logging. No request bypasses any stage. This page describes each stage, what component is responsible for it, and how failures surface.


The pipeline has seven stages. Stages 1 through 3 run before the provider call. Stages 5 through 7 run after it. Stage 4 (routing) is the decision point that determines which provider path the request takes.

```mermaid
flowchart TD
A([Client]) -->|Authorization: Bearer| B[Stage 1 — Intake\nAPI Gateway]
B -->|401 invalid token\n429 rate limit| ERR1([Error])
B -->|tenant_id extracted| C[Stage 2 — Policy Evaluation\nPolicy Engine]
C -->|BLOCK| ERR2([403 — policy reason])
C -->|ALLOW / REDACT / ROUTE_TO| D[Stage 3 — DLP Scan — Input\n3-Tier Pipeline]
D -->|CANCEL| ERR3([403 / 400 — DLP block])
D -->|REDACT — prompt sanitised| E[Stage 4 — Routing\nDeployment Mode]
D -->|ALLOW — prompt unchanged| E
E -->|SaaS| F1[Stage 5 — Provider Call\nSaaS direct to provider API]
E -->|Hybrid Outpost| F2[Stage 5 — Provider Call\nOutpost via mTLS]
F1 -->|502 / 503 provider error| ERR4([Error])
F2 -->|502 / 503 provider error| ERR4
F1 --> G[Stage 6 — Output Scan\n3-Tier Pipeline]
F2 --> G
G -->|BLOCK — output suppressed| ERR5([output_blocked SSE event])
G -->|REDACT — inline replacement| H[Stage 7 — Audit Write\nHMAC-chained log entry]
G -->|ALLOW| H
H --> Z([Response to client])
```

Stage 1 — Intake

Component: API Gateway (backend.app.api.chat, FastAPI dependencies)

The gateway receives the HTTP request and performs three checks before any application logic runs:

  • Authentication — Validates the Authorization: Bearer token. Accepted token types are JWT (session tokens issued by the platform) and API keys (long-lived tokens issued to service accounts). The JWT blacklist bloom filter is checked on every request; if the filter has not completed its first database sync at startup, the gateway returns 503 rather than risk processing a revoked token.
  • Tenant extraction — Resolves tenant_id from the authenticated token. All downstream enforcement is scoped to this tenant.
  • Rate limiting — Per-org and per-user rate limits are evaluated. Quota checks (plan tier request count, per-user/group quotas, and dollar budget caps) run in parallel with DLP during Stage 2 and can short-circuit with a 429 before the DLP scan completes.

Failure surface: 401 Unauthorized (invalid or missing token), 429 Too Many Requests (rate or quota exceeded), 503 Service Unavailable (blacklist filter not ready).
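The fail-closed blacklist behaviour described above can be sketched as follows. This is an illustrative simplification, not the actual gateway code: the function name is hypothetical, and a plain set stands in for the bloom filter.

```python
def check_token(token, filter_synced, revoked):
    """Return the HTTP status the gateway would emit for this token.

    `revoked` stands in for the JWT blacklist bloom filter;
    `filter_synced` models whether the filter has completed its first
    database sync at startup.
    """
    if not filter_synced:
        return 503  # fail closed: never risk honouring a revoked token
    if not token or token in revoked:
        return 401
    return 200
```

Note that an unsynced filter rejects even valid tokens; availability is sacrificed rather than correctness of revocation.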


Stage 2 — Policy Evaluation

Component: Policy Engine (backend.app.services.policy_engine.evaluate_policy_chain)

The Policy Engine evaluates the request against the tenant's active policy packs. It is the single point of policy enforcement, replacing the scattered per-endpoint checks that existed before the Epic J2 refactor.

Policy packs contain rules of three types:

| Rule type | What it evaluates |
|---|---|
| PROMPT | Content-based analysis of the prompt text. Governance challenge flow applies: API callers receive HTTP 449 with a challenge payload; interactive (SSE) callers receive a governance_prompt SSE event. |
| channel | Restricts request origin by user or group membership. Used to enforce model access policies per team. |
| intent_complexity | Evaluated after intent classification when auto_route is enabled. |

Possible decisions:

  • ALLOW — proceed to DLP scan.
  • BLOCK / CANCEL — request terminated; audit entry written.
  • REDACT — policy-specified redactions applied to the prompt before DLP.
  • ROUTE_TO — allow, but override the target model and provider. ROUTE_TO takes precedence over both auto_route intent classification and the model specified by the client.

Failure surface: 403 Forbidden (policy block), 449 (governance challenge requiring user justification).
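One plausible way to combine decisions when several rules fire is a precedence fold, where the most restrictive outcome wins. The ordering below is an assumption for illustration; the real evaluate_policy_chain may resolve conflicts differently.

```python
from enum import IntEnum

class Decision(IntEnum):
    # Higher value wins when decisions from multiple rules are combined.
    ALLOW = 0
    ROUTE_TO = 1
    REDACT = 2
    BLOCK = 3

def combine(decisions):
    """Collapse per-rule decisions into a single outcome: any BLOCK
    wins, then REDACT, then ROUTE_TO; ALLOW only if no rule fired."""
    return max(decisions, default=Decision.ALLOW)
```

For example, a pack whose rules return [ALLOW, REDACT] resolves to REDACT, so the redacted prompt is what reaches the Stage 3 DLP scan.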


Stage 3 — DLP Scan (Input)

Component: 3-Tier DLP Pipeline (backend.app.services.intake_pipeline, DLP microservice)

The DLP pipeline inspects the prompt text for sensitive data before it is forwarded to the provider. It runs three tiers in sequence; each tier executes only if the previous tier did not terminate the request:

| Tier | Method | When active |
|---|---|---|
| Tier 1 | Regex and pattern detectors — PII patterns, credential formats, keyword lists | Always. Fast path; runs synchronously in the gateway process. |
| Tier 2 | ML NER classifier (DeBERTa-based) — confidence-scored entity detection | Runs if Tier 1 does not terminate. Requires GPU pool availability. |
| Tier 3 | LLM-based contextual analysis | Hybrid Outpost deployments only (tier3_active flag in Outpost heartbeat). |

Possible DLP actions:

  • ALLOW — prompt passes clean.
  • REDACT — detected entities replaced inline. The sanitised prompt is passed to the provider; original text is preserved in the audit record.
  • CANCEL / BLOCK — request terminated. Failure behavior is configurable per detector: block (default) or pass-through.

Failure surface: If the GPU pool is unavailable (parked or restarting), inference requests fail closed: the request is blocked rather than passed through unscanned. A 403 or 400 is returned, depending on the rule configuration.
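The sequential, fail-closed tier execution can be sketched as below. This is a simplified model, not the intake_pipeline implementation: each tier is modelled as a callable returning (action, text), and a raised error stands in for GPU pool unavailability.

```python
def run_tiers(prompt, tiers):
    """Run the DLP tiers in order. REDACT passes the sanitised text to
    the next tier, BLOCK terminates, and a tier failure (e.g. GPU pool
    parked) fails closed rather than letting the prompt through
    unscanned."""
    final = "ALLOW"
    for tier in tiers:
        try:
            action, prompt = tier(prompt)
        except RuntimeError:
            return "BLOCK", prompt  # fail closed on tier unavailability
        if action == "BLOCK":
            return "BLOCK", prompt
        if action == "REDACT":
            final = "REDACT"
    return final, prompt
```

A Tier 1 redaction thus propagates to Tier 2, which scans the already-sanitised text.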


Stage 4 — Routing

Component: Chat handler (backend.app.api.chat.send_message), intent auto-router (backend.app.services.intent_auto_route)

After the prompt clears policy and DLP, the routing stage determines which provider path the request takes:

  • SaaS path — Request is forwarded directly from the gateway to the AI provider API over HTTPS. Used when the org’s deployment mode is SaaS.
  • Hybrid Outpost path — Request is forwarded to a customer-deployed Outpost container over mTLS. The Outpost authenticates to the gateway using a client certificate issued at registration time. NGINX enforces ssl_verify_client on for all /outpost/* paths.

Model selection priority (highest to lowest):

  1. ROUTE_TO action from policy evaluation
  2. Intent-based auto_route classification (if auto_route: true in the request)
  3. Model specified by the client in body.models[0]
  4. Fallback model from fallback_model / fallback_provider request fields or DB-configured fallback chain
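The priority order above amounts to a first-match resolver. A minimal sketch, with hypothetical parameter names standing in for the actual handler's inputs:

```python
def select_model(route_to=None, auto_route_model=None,
                 client_models=None, fallback=None):
    """Resolve the target model using the priority order above. Each
    argument is None/empty when that source did not produce a choice."""
    if route_to:            # 1. policy ROUTE_TO override
        return route_to
    if auto_route_model:    # 2. intent-based auto_route classification
        return auto_route_model
    if client_models:       # 3. client-specified body.models[0]
        return client_models[0]
    return fallback         # 4. configured fallback model / chain
```

The key property is that a policy ROUTE_TO silently overrides whatever the client asked for, which is why audit records capture the model that actually handled the request.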

Outpost offline behavior: If the Cloud control plane is unreachable, the Outpost continues enforcing the last-cached policy pack. Policy version is confirmed on every 120-second heartbeat.


Stage 5 — Provider Call

Component: Provider adapters (backend.app.services.chat, backend.app.providers.registry)

The (possibly redacted) prompt is forwarded to the selected AI provider. The gateway supports:

  • OpenAI
  • Anthropic
  • Azure OpenAI
  • Amazon Bedrock
  • Google (Gemini)
  • Mistral
  • Groq
  • Cohere

The provider call supports SSE streaming. The gateway begins yielding SSE content chunks to the client as soon as the provider starts responding. Token counts, model ID, and provider name are captured from the provider response for use in the audit record.

Failure surface: 502 Bad Gateway (provider returned an error), 503 Service Unavailable (provider unreachable or timeout).
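A simplified sketch of the streaming relay: chunks are framed for SSE and yielded immediately, while the full text is accumulated for the Stage 6 output scan and the audit record. The function and its signature are illustrative; real SSE framing and adapter interfaces differ per provider.

```python
def relay_stream(provider_chunks, collected):
    """Yield SSE-framed chunks to the client as the provider streams,
    while accumulating the full response text in `collected` for the
    output scan and audit write."""
    for chunk in provider_chunks:
        collected.append(chunk)
        yield "data: " + chunk + "\n\n"
```

This is why output-scan failures cannot surface as HTTP errors: by the time the scan runs, part of the stream may already have been delivered.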


Stage 6 — Output Scan

Component: Output DLP scanner (backend.app.services.intake_pipeline.finalize_pipeline, run_output_regex_scan)

The full provider response is scanned by the same 3-tier DLP pipeline before delivery to the client. The output scan has two modes:

  • Streaming scan (J3 mode) — When policy_output_streaming_scan is enabled and policy_output_buffer_ms > 0, Tier 1 (regex) runs on each accumulated buffer window as chunks arrive. Detected entities are redacted before the chunk is emitted. After the stream ends, the full 3-tier scan runs on the complete response. Any additional entities caught by the full scan that were missed during streaming are emitted as dlp_correction SSE events.
  • Buffer-all mode (J2 compatibility) — The full response is buffered, then scanned in one pass before delivery.

Possible output actions:

  • ALLOW — response delivered as-is.
  • REDACT — entities replaced inline before delivery to client. Originals preserved in audit.
  • BLOCK — entire response suppressed. Client receives an output_blocked SSE event. The blocked response text is retained in the audit record.

Failure surface: output_blocked SSE event (output DLP block). No HTTP error code — the stream has already started at this point.
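The per-window Tier 1 pass in streaming mode reduces to a regex substitution over each accumulated buffer. The pattern below is a single illustrative detector; the real detector set is configurable and much larger.

```python
import re

# Illustrative Tier 1 detector (US SSN format); not the production set.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scan_window(buffer):
    """Tier 1 pass over one accumulated buffer window: matches are
    redacted inline before the chunk is emitted to the client."""
    return SSN.sub("[REDACTED]", buffer)
```

An entity split across two windows can escape this pass, which is exactly the gap the post-stream full scan and dlp_correction events exist to close.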


Stage 7 — Audit Write

Component: Audit service (backend.app.services.audit.log_event)

An HMAC-chained audit log entry is written asynchronously after the provider response is finalized. The audit record includes:

| Field | Content |
|---|---|
| user_id | UUID of the requesting user |
| action | Pipeline action taken (ALLOW, REDACT, CANCEL, BLOCK, ROUTE_TO) |
| model_id | Model that handled the request |
| provider | Provider that handled the request |
| prompt_text | Original prompt (configurable — can be omitted for privacy) |
| response_text | Provider response text |
| token_count_input | Input tokens consumed |
| token_count_output | Output tokens consumed |
| cost_estimate | Estimated cost in USD |
| latency_ms | End-to-end provider latency |
| dlp_results | Entities detected by input and output DLP scans |
| previous_hmac | HMAC of the previous entry (chain integrity) |

The HMAC is computed over the event content concatenated with the previous entry’s HMAC. This makes the audit chain tamper-evident: any modification to a historical entry invalidates all subsequent HMACs.
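The chaining scheme can be demonstrated in a few lines. The key, canonicalisation, and digest choice below are assumptions for illustration; the production service manages its own key and serialisation.

```python
import hashlib
import hmac
import json

KEY = b"audit-signing-key"  # illustrative; the real key is a managed secret

def chain_hmac(event, previous_hmac):
    """HMAC over the canonicalised event content concatenated with the
    previous entry's HMAC, as described above."""
    payload = json.dumps(event, sort_keys=True).encode() + previous_hmac.encode()
    return hmac.new(KEY, payload, hashlib.sha256).hexdigest()
```

Tampering with a historical entry changes its HMAC, so the previous_hmac stored in every later entry no longer matches; verification fails from the tampered entry onward.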

A partial audit entry (status: received) is written at Stage 1 before any processing occurs. This pre-log satisfies compliance requirements that mandate logging at intake, independent of whether the request later succeeds or fails. The entry is updated with full results after Stage 6 completes.

Failure surface: Audit write failures are logged and monitored but do not cause the response to fail. The client receives the response regardless of audit write status.


Deployment modes

Both deployment modes share the same Stage 1–3 and Stage 6–7 pipeline. The difference is in Stages 4–5.

SaaS mode

Client → api.arbitex.ai/v1 → Gateway (AKS) → Provider API
  • Policy evaluation runs in the Arbitex-managed AKS cluster.
  • DLP scanning runs on the GPU node pool within the same cluster.
  • The gateway forwards the request directly to the provider over HTTPS using provider API keys managed by Arbitex.
  • All audit records are written to the Arbitex-managed database.
Hybrid Outpost mode

Client → Outpost (customer environment) → api.arbitex.ai/outpost → Gateway relay → Provider API
  • The Outpost container runs in the customer’s environment (on-premise, private cloud, or VPC).
  • DLP scanning (Tiers 1–3, including Tier 3 LLM contextual analysis) runs inside the Outpost.
  • The Outpost pulls its policy pack from the Cloud control plane at startup and on a periodic interval. Each pull is confirmed via a 120-second heartbeat that reports policy_version.
  • If the Cloud is unreachable, the Outpost continues enforcing the last-cached policy pack. Requests are not dropped — they are evaluated against the cached policy.
  • mTLS is enforced on all Outpost-to-Cloud traffic. The Outpost presents a client certificate issued by the Cloud CA at registration time. NGINX verifies the full certificate chain.
  • Tier 3 DLP availability is reported in the Outpost heartbeat (tier3_active). If the model serving Tier 3 is unavailable, the pipeline degrades to Tier 1+2 only.
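The offline-tolerant policy pull described above reduces to a simple fallback. The function and exception type are illustrative, not the Outpost's actual client code:

```python
def active_policy(fetch_latest, cached_pack):
    """Pull the policy pack from the control plane; if the Cloud is
    unreachable, keep enforcing the last-cached pack (requests are
    evaluated against it rather than dropped)."""
    try:
        return fetch_latest()
    except ConnectionError:
        return cached_pack
```

On the next successful 120-second heartbeat, the reported policy_version lets the control plane detect and correct any drift.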

Failure surface summary

| Stage | Component | Failure | Surface |
|---|---|---|---|
| 1 — Intake | API Gateway | Invalid or missing Bearer token | 401 Unauthorized |
| 1 — Intake | API Gateway | JWT blacklist filter not ready at startup | 503 Service Unavailable |
| 1 — Intake | Quota service | Plan tier, per-user/group quota, or budget cap exceeded | 429 Too Many Requests |
| 2 — Policy evaluation | Policy Engine | Policy rule BLOCK decision | 403 Forbidden |
| 2 — Policy evaluation | Policy Engine | PROMPT governance challenge | 449 (API) or governance_prompt SSE event (interactive) |
| 3 — DLP scan (input) | DLP pipeline | Detector action: CANCEL / BLOCK | 403 or 400 (configurable per detector) |
| 3 — DLP scan (input) | GPU node pool | Inference unavailable (pool parked/restarting) | Request fails closed — 503 or DLP block |
| 5 — Provider call | Provider adapter | Provider API error | 502 Bad Gateway |
| 5 — Provider call | Provider adapter | Provider unreachable or timeout | 503 Service Unavailable |
| 6 — Output scan | Output DLP | Response blocked | output_blocked SSE event (stream already open) |
| 7 — Audit write | Audit service | Write failure | Non-fatal — logged, monitored; response still delivered |