
Request Lifecycle

Every AI request that passes through the Arbitex gateway traverses a fixed, ordered pipeline before reaching an AI provider. The pipeline is the single enforcement point for authentication, quota, policy, data loss prevention, routing, and compliance logging. No request bypasses any stage. This page describes each stage, what component is responsible for it, and how failures surface.


The pipeline has seven stages. Stages 1 through 3 run before the provider call. Stages 5 through 7 run after it. Stage 4 (routing) is the decision point that determines which provider path the request takes.

```mermaid
flowchart TD
A([Client]) -->|Authorization: Bearer| B[Stage 1 — Intake\nAPI Gateway]
B -->|401 invalid token\n429 rate limit| ERR1([Error])
B -->|tenant_id extracted| C[Stage 2 — Policy Evaluation\nPolicy Engine]
C -->|BLOCK| ERR2([403 — policy reason])
C -->|ALLOW / REDACT / ROUTE_TO| D[Stage 3 — DLP Scan — Input\n3-Tier Pipeline]
D -->|CANCEL| ERR3([403 / 400 — DLP block])
D -->|REDACT — prompt sanitised| E[Stage 4 — Routing\nDeployment Mode]
D -->|ALLOW — prompt unchanged| E
E -->|SaaS| F1[Stage 5 — Provider Call\nSaaS direct to provider API]
E -->|Hybrid Outpost| F2[Stage 5 — Provider Call\nOutpost via mTLS]
F1 -->|502 / 503 provider error| ERR4([Error])
F2 -->|502 / 503 provider error| ERR4
F1 --> G[Stage 6 — Output Scan\n3-Tier Pipeline]
F2 --> G
G -->|BLOCK — output suppressed| ERR5([output_blocked SSE event])
G -->|REDACT — inline replacement| H[Stage 7 — Audit Write\nHMAC-chained log entry]
G -->|ALLOW| H
H --> Z([Response to client])
```

Stage 1 — Intake

Component: API Gateway (backend.app.api.chat, FastAPI dependencies)

The gateway receives the HTTP request and performs three checks before any application logic runs:

  • Authentication — Validates the Authorization: Bearer token. Accepted token types are JWT (session tokens issued by the platform) and API keys (long-lived tokens issued to service accounts). The JWT blacklist bloom filter is checked on every request; if the filter has not completed its first database sync at startup, the gateway returns 503 rather than risk processing a revoked token.
  • Tenant extraction — Resolves tenant_id from the authenticated token. All downstream enforcement is scoped to this tenant.
  • Rate limiting — Per-org and per-user rate limits are evaluated. Quota checks (plan tier request count, per-user/group quotas, and dollar budget caps) run in parallel with DLP during Stage 2 and can short-circuit with a 429 before the DLP scan completes.

Failure surface: 401 Unauthorized (invalid or missing token), 429 Too Many Requests (rate or quota exceeded), 503 Service Unavailable (blacklist filter not ready).
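The fail-closed blacklist behaviour described above can be sketched as follows. This is an illustrative simplification, not the actual gateway code: the function name is hypothetical, and a plain set stands in for the bloom filter.

```python
def check_token(token, filter_synced, revoked):
    """Return the HTTP status the gateway would emit for this token.

    `revoked` stands in for the JWT blacklist bloom filter;
    `filter_synced` models whether the filter has completed its first
    database sync at startup.
    """
    if not filter_synced:
        return 503  # fail closed: never risk honouring a revoked token
    if not token or token in revoked:
        return 401
    return 200
```

Note that an unsynced filter rejects even valid tokens; availability is sacrificed rather than correctness of revocation.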


Stage 2 — Policy Evaluation

Component: Policy Engine (backend.app.services.policy_engine.evaluate_policy_chain)

The Policy Engine evaluates the request against the tenant's active policy packs. It is the single point of policy enforcement, replacing the scattered per-endpoint checks that existed before the Epic J2 refactor.

Policy packs contain rules of three types:

| Rule type | What it evaluates |
|---|---|
| PROMPT | Content-based analysis of the prompt text. Governance challenge flow applies: API callers receive HTTP 449 with a challenge payload; interactive (SSE) callers receive a governance_prompt SSE event. |
| channel | Restricts request origin by user or group membership. Used to enforce model access policies per team. |
| intent_complexity | Evaluated after intent classification when auto_route is enabled. |

Possible decisions:

  • ALLOW — proceed to DLP scan.
  • BLOCK / CANCEL — request terminated; audit entry written.
  • REDACT — policy-specified redactions applied to the prompt before DLP.
  • ROUTE_TO — allow, but override the target model and provider. ROUTE_TO takes precedence over both auto_route intent classification and the model specified by the client.

Failure surface: 403 Forbidden (policy block), 449 (governance challenge requiring user justification).
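One plausible way to combine decisions when several rules fire is a precedence fold, where the most restrictive outcome wins. The ordering below is an assumption for illustration; the real evaluate_policy_chain may resolve conflicts differently.

```python
from enum import IntEnum

class Decision(IntEnum):
    # Higher value wins when decisions from multiple rules are combined.
    ALLOW = 0
    ROUTE_TO = 1
    REDACT = 2
    BLOCK = 3

def combine(decisions):
    """Collapse per-rule decisions into a single outcome: any BLOCK
    wins, then REDACT, then ROUTE_TO; ALLOW only if no rule fired."""
    return max(decisions, default=Decision.ALLOW)
```

For example, a pack whose rules return [ALLOW, REDACT] resolves to REDACT, so the redacted prompt is what reaches the Stage 3 DLP scan.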


Stage 3 — DLP Scan (Input)

Component: 3-Tier DLP Pipeline (backend.app.services.intake_pipeline, DLP microservice)

The DLP pipeline inspects the prompt text for sensitive data before it is forwarded to the provider. It runs three tiers in sequence; each tier executes only if the previous tier did not terminate the request:

| Tier | Method | When active |
|---|---|---|
| Tier 1 | Regex and pattern detectors — PII patterns, credential formats, keyword lists | Always. Fast path; runs synchronously in the gateway process. |
| Tier 2 | ML NER classifier (DeBERTa-based) — confidence-scored entity detection | Runs if Tier 1 does not terminate. Requires GPU pool availability. |
| Tier 3 | LLM-based contextual analysis | Hybrid Outpost deployments only (tier3_active flag in Outpost heartbeat). |

Possible DLP actions:

  • ALLOW — prompt passes clean.
  • REDACT — detected entities replaced inline. The sanitised prompt is passed to the provider; original text is preserved in the audit record.
  • CANCEL / BLOCK — request terminated. Failure behavior is configurable per detector: block (default) or pass-through.

Failure surface: If the GPU pool is unavailable (parked or restarting), inference requests fail closed: the request is blocked rather than passed through unscanned. A 403 or 400 is returned, depending on the rule configuration.
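The sequential, fail-closed tier execution can be sketched as below. This is a simplified model, not the intake_pipeline implementation: each tier is modelled as a callable returning (action, text), and a raised error stands in for GPU pool unavailability.

```python
def run_tiers(prompt, tiers):
    """Run the DLP tiers in order. REDACT passes the sanitised text to
    the next tier, BLOCK terminates, and a tier failure (e.g. GPU pool
    parked) fails closed rather than letting the prompt through
    unscanned."""
    final = "ALLOW"
    for tier in tiers:
        try:
            action, prompt = tier(prompt)
        except RuntimeError:
            return "BLOCK", prompt  # fail closed on tier unavailability
        if action == "BLOCK":
            return "BLOCK", prompt
        if action == "REDACT":
            final = "REDACT"
    return final, prompt
```

A Tier 1 redaction thus propagates to Tier 2, which scans the already-sanitised text.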


Stage 4 — Routing

Component: Chat handler (backend.app.api.chat.send_message), intent auto-router (backend.app.services.intent_auto_route)

After the prompt clears policy and DLP, the routing stage determines which provider path the request takes:

  • SaaS path — Request is forwarded directly from the gateway to the AI provider API over HTTPS. Used when the org’s deployment mode is SaaS.
  • Hybrid Outpost path — Request is forwarded to a customer-deployed Outpost container over mTLS. The Outpost authenticates to the gateway using a client certificate issued at registration time. NGINX enforces ssl_verify_client on for all /outpost/* paths.

Model selection priority (highest to lowest):

  1. ROUTE_TO action from policy evaluation
  2. Intent-based auto_route classification (if auto_route: true in the request)
  3. Model specified by the client in body.models[0]
  4. Fallback model from fallback_model / fallback_provider request fields or DB-configured fallback chain
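The priority order above amounts to a first-match resolver. A minimal sketch, with hypothetical parameter names standing in for the actual handler's inputs:

```python
def select_model(route_to=None, auto_route_model=None,
                 client_models=None, fallback=None):
    """Resolve the target model using the priority order above. Each
    argument is None/empty when that source did not produce a choice."""
    if route_to:            # 1. policy ROUTE_TO override
        return route_to
    if auto_route_model:    # 2. intent-based auto_route classification
        return auto_route_model
    if client_models:       # 3. client-specified body.models[0]
        return client_models[0]
    return fallback         # 4. configured fallback model / chain
```

The key property is that a policy ROUTE_TO silently overrides whatever the client asked for, which is why audit records capture the model that actually handled the request.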

Outpost offline behavior: If the Cloud control plane is unreachable, the Outpost continues enforcing the last-cached policy pack. Policy version is confirmed on every 120-second heartbeat.


Stage 5 — Provider Call

Component: Provider adapters (backend.app.services.chat, backend.app.providers.registry)

The (possibly redacted) prompt is forwarded to the selected AI provider. The gateway supports:

  • OpenAI
  • Anthropic
  • Azure OpenAI
  • Amazon Bedrock
  • Google (Gemini)
  • Mistral
  • Groq
  • Cohere

The provider call supports SSE streaming. The gateway begins yielding SSE content chunks to the client as soon as the provider starts responding. Token counts, model ID, and provider name are captured from the provider response for use in the audit record.

Failure surface: 502 Bad Gateway (provider returned an error), 503 Service Unavailable (provider unreachable or timeout).
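A simplified sketch of the streaming relay: chunks are framed for SSE and yielded immediately, while the full text is accumulated for the Stage 6 output scan and the audit record. The function and its signature are illustrative; real SSE framing and adapter interfaces differ per provider.

```python
def relay_stream(provider_chunks, collected):
    """Yield SSE-framed chunks to the client as the provider streams,
    while accumulating the full response text in `collected` for the
    output scan and audit write."""
    for chunk in provider_chunks:
        collected.append(chunk)
        yield "data: " + chunk + "\n\n"
```

This is why output-scan failures cannot surface as HTTP errors: by the time the scan runs, part of the stream may already have been delivered.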


Stage 6 — Output Scan

Component: Output DLP scanner (backend.app.services.intake_pipeline.finalize_pipeline, run_output_regex_scan)

The full provider response is scanned by the same 3-tier DLP pipeline before delivery to the client. The output scan has two modes:

  • Streaming scan (J3 mode) — When policy_output_streaming_scan is enabled and policy_output_buffer_ms > 0, Tier 1 (regex) runs on each accumulated buffer window as chunks arrive. Detected entities are redacted before the chunk is emitted. After the stream ends, the full 3-tier scan runs on the complete response. Any additional entities caught by the full scan that were missed during streaming are emitted as dlp_correction SSE events.
  • Buffer-all mode (J2 compatibility) — The full response is buffered, then scanned in one pass before delivery.

Possible output actions:

  • ALLOW — response delivered as-is.
  • REDACT — entities replaced inline before delivery to client. Originals preserved in audit.
  • BLOCK — entire response suppressed. Client receives an output_blocked SSE event. The blocked response text is retained in the audit record.

Failure surface: output_blocked SSE event (output DLP block). No HTTP error code — the stream has already started at this point.
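The per-window Tier 1 pass in streaming mode reduces to a regex substitution over each accumulated buffer. The pattern below is a single illustrative detector; the real detector set is configurable and much larger.

```python
import re

# Illustrative Tier 1 detector (US SSN format); not the production set.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scan_window(buffer):
    """Tier 1 pass over one accumulated buffer window: matches are
    redacted inline before the chunk is emitted to the client."""
    return SSN.sub("[REDACTED]", buffer)
```

An entity split across two windows can escape this pass, which is exactly the gap the post-stream full scan and dlp_correction events exist to close.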


Stage 7 — Audit Write

Component: Audit service (backend.app.services.audit.log_event)

An HMAC-chained audit log entry is written asynchronously after the provider response is finalized. The audit record includes:

| Field | Content |
|---|---|
| user_id | UUID of the requesting user |
| action | Pipeline action taken (ALLOW, REDACT, CANCEL, BLOCK, ROUTE_TO) |
| model_id | Model that handled the request |
| provider | Provider that handled the request |
| prompt_text | Original prompt (configurable — can be omitted for privacy) |
| response_text | Provider response text |
| token_count_input | Input tokens consumed |
| token_count_output | Output tokens consumed |
| cost_estimate | Estimated cost in USD |
| latency_ms | End-to-end provider latency |
| dlp_results | Entities detected by input and output DLP scans |
| previous_hmac | HMAC of the previous entry (chain integrity) |

The HMAC is computed over the event content concatenated with the previous entry’s HMAC. This makes the audit chain tamper-evident: any modification to a historical entry invalidates all subsequent HMACs.
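The chaining scheme can be demonstrated in a few lines. The key, canonicalisation, and digest choice below are assumptions for illustration; the production service manages its own key and serialisation.

```python
import hashlib
import hmac
import json

KEY = b"audit-signing-key"  # illustrative; the real key is a managed secret

def chain_hmac(event, previous_hmac):
    """HMAC over the canonicalised event content concatenated with the
    previous entry's HMAC, as described above."""
    payload = json.dumps(event, sort_keys=True).encode() + previous_hmac.encode()
    return hmac.new(KEY, payload, hashlib.sha256).hexdigest()
```

Tampering with a historical entry changes its HMAC, so the previous_hmac stored in every later entry no longer matches; verification fails from the tampered entry onward.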

A partial audit entry (status: received) is written at Stage 1 before any processing occurs. This pre-log satisfies compliance requirements that mandate logging at intake, independent of whether the request later succeeds or fails. The entry is updated with full results after Stage 6 completes.

Failure surface: Audit write failures are logged and monitored but do not cause the response to fail. The client receives the response regardless of audit write status.


Deployment modes

Both deployment modes share the same Stage 1–3 and Stage 6–7 pipeline. The difference is in Stages 4–5.

SaaS mode

Client → api.arbitex.ai/v1 → Gateway (AKS) → Provider API
  • Policy evaluation runs in the Arbitex-managed AKS cluster.
  • DLP scanning runs on the GPU node pool within the same cluster.
  • The gateway forwards the request directly to the provider over HTTPS using provider API keys managed by Arbitex.
  • All audit records are written to the Arbitex-managed database.
Hybrid Outpost mode

Client → Outpost (customer environment) → api.arbitex.ai/outpost → Gateway relay → Provider API
  • The Outpost container runs in the customer’s environment (on-premise, private cloud, or VPC).
  • DLP scanning (Tiers 1–3, including Tier 3 LLM contextual analysis) runs inside the Outpost.
  • The Outpost pulls its policy pack from the Cloud control plane at startup and on a periodic interval. Each pull is confirmed via a 120-second heartbeat that reports policy_version.
  • If the Cloud is unreachable, the Outpost continues enforcing the last-cached policy pack. Requests are not dropped — they are evaluated against the cached policy.
  • mTLS is enforced on all Outpost-to-Cloud traffic. The Outpost presents a client certificate issued by the Cloud CA at registration time. NGINX verifies the full certificate chain.
  • Tier 3 DLP availability is reported in the Outpost heartbeat (tier3_active). If the model serving Tier 3 is unavailable, the pipeline degrades to Tier 1+2 only.
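The offline-tolerant policy pull described above reduces to a simple fallback. The function and exception type are illustrative, not the Outpost's actual client code:

```python
def active_policy(fetch_latest, cached_pack):
    """Pull the policy pack from the control plane; if the Cloud is
    unreachable, keep enforcing the last-cached pack (requests are
    evaluated against it rather than dropped)."""
    try:
        return fetch_latest()
    except ConnectionError:
        return cached_pack
```

On the next successful 120-second heartbeat, the reported policy_version lets the control plane detect and correct any drift.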

Failure surface summary

| Stage | Component | Failure | Surface |
|---|---|---|---|
| 1 — Intake | API Gateway | Invalid or missing Bearer token | 401 Unauthorized |
| 1 — Intake | API Gateway | JWT blacklist filter not ready at startup | 503 Service Unavailable |
| 1 — Intake | Quota service | Plan tier, per-user/group quota, or budget cap exceeded | 429 Too Many Requests |
| 2 — Policy evaluation | Policy Engine | Policy rule BLOCK decision | 403 Forbidden |
| 2 — Policy evaluation | Policy Engine | PROMPT governance challenge | 449 (API) or governance_prompt SSE event (interactive) |
| 3 — DLP scan (input) | DLP pipeline | Detector action: CANCEL / BLOCK | 403 or 400 (configurable per detector) |
| 3 — DLP scan (input) | GPU node pool | Inference unavailable (pool parked/restarting) | Request fails closed — 503 or DLP block |
| 5 — Provider call | Provider adapter | Provider API error | 502 Bad Gateway |
| 5 — Provider call | Provider adapter | Provider unreachable or timeout | 503 Service Unavailable |
| 6 — Output scan | Output DLP | Response blocked | output_blocked SSE event (stream already open) |
| 7 — Audit write | Audit service | Write failure | Non-fatal — logged, monitored; response still delivered |