Architecture Decision Records
Architecture Decision Records (ADRs) document the key decisions made during Arbitex platform design. Each record captures the context, decision, rationale, and consequences of a significant architectural choice.
These decisions are locked and form the stable foundation for platform development. Changes to locked decisions require explicit PO approval and a new ADR.
ADR-001 — Policy Engine model: Palo Alto unified pipeline
Status: Locked (2026-03-08) · Decision date: 2026-03-08
Context
Arbitex needed a policy evaluation model for AI request governance. Several models were considered: simple rule lists, deny-only filtering, and unified policy frameworks from enterprise security (Palo Alto NGFW model).
The platform needed to handle allow, block, prompt-for-approval, and route-to-alternate actions — not just filtering. Rules needed to compose across multiple policy packs, with configurable precedence.
Decision
Adopt the Palo Alto unified policy pipeline model: a first-match evaluation across an ordered policy chain, with explicit ALLOW, BLOCK, PROMPT, ROUTE_TO, and ALLOW_WITH_OVERRIDE actions. Policy Packs replace the previous “compliance bundles” and “rule lists” concepts under a unified abstraction.
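The first-match evaluation can be sketched as follows; the rule names, request fields, and fail-closed default here are invented for illustration and are not the real Policy Pack schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

# The five actions locked in ADR-001.
class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    PROMPT = "prompt"
    ROUTE_TO = "route_to"
    ALLOW_WITH_OVERRIDE = "allow_with_override"

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over the request
    action: Action

def evaluate(chain: list, request: dict) -> tuple:
    """First-match evaluation: walk the ordered chain, return the first hit."""
    for rule in chain:
        if rule.matches(request):
            return rule.name, rule.action
    return "default", Action.BLOCK  # assumed fail-closed default, not specified in the ADR

chain = [
    Rule("block-pii", lambda r: r.get("contains_pii", False), Action.BLOCK),
    Rule("prompt-finance", lambda r: r.get("dept") == "finance", Action.PROMPT),
    Rule("allow-all", lambda r: True, Action.ALLOW),
]
assert evaluate(chain, {"dept": "finance"}) == ("prompt-finance", Action.PROMPT)
assert evaluate(chain, {"contains_pii": True}) == ("block-pii", Action.BLOCK)
```

Because evaluation stops at the first match, rule order within the chain is part of the policy, which is what makes the outcome deterministic and auditable.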
Rationale
- Palo Alto’s model is well-understood by enterprise security teams — familiar to the buyers Arbitex targets
- First-match with explicit actions is deterministic and auditable; deny-overrides combining algorithm provides a stricter alternative when needed
- Unified abstraction (Policy Pack = any pack, bundle or custom) avoids two separate rule systems
- ROUTE_TO and ALLOW_WITH_OVERRIDE extend the model to cover AI-specific governance patterns without compromising the core evaluation semantics
Consequences
- Policy Engine naming is a placeholder for easy find-replace when Brand brief finalizes product name
- All existing “compliance bundles” became Policy Packs with `pack_type: "bundle"` — fully backward-compatible
- PROMPT and ALLOW_WITH_OVERRIDE actions require audit log capture of approver identity and reason (not yet built — Epic J5 scope)
ADR-002 — Deployment topology: SaaS + Hybrid Outpost (no full self-host)
Status: Locked (2026-03-07) · Decision date: 2026-03-07
Context
Customers have varying data residency and network isolation requirements. Three deployment models were evaluated: SaaS only, full self-host, and a hybrid model where the control plane is Arbitex-managed but the data plane runs in the customer environment.
Decision
Support exactly two deployment topologies:
- SaaS — Arbitex-managed control plane and data plane. Customer traffic passes through `api.arbitex.ai`.
- Hybrid Outpost — Arbitex-managed control plane (SaaS); customer-hosted data plane (Outpost container in customer VPC/network). No full self-host option.
Rationale
- Full self-host requires distributing the entire platform stack (including proprietary DLP models), which creates IP risk and support complexity
- Hybrid Outpost satisfies data residency requirements: sensitive prompts never leave the customer environment; only audit metadata and configuration reach the Arbitex control plane
- SaaS control plane always being Arbitex-managed ensures policy packs and compliance bundles are consistently maintained
- No full self-host simplifies the licensing and compliance surface area significantly
Consequences
- Outpost is resilient to control plane outages (caches configuration locally)
- Cloud control plane owns Outpost registration and certificate issuance
- DeBERTa inference in Hybrid Outpost runs on customer infrastructure — GPU requirement documented per tier (SaaS=GPU required; Outpost prod=GPU required; Outpost trial=CPU via control plane flag)
- “Changelog self-host” entry was removed from public documentation — capability does not exist
ADR-003 — Cloud provider: Azure
Status: Locked (2026-03-08) · Decision date: 2026-03-08
Context
Target customers are predominantly in financial services, healthcare, and government — sectors where Azure has the highest market share. Primary pilot prospect is on Azure infrastructure. AWS and GCP were evaluated.
Decision
Azure as the sole cloud provider for Arbitex SaaS production infrastructure.
Rationale
- Target customers (fin/health/gov) have largest presence on Azure — simplifies data residency, private connectivity (PrivateLink), and regulatory compliance discussions
- Largest known pilot prospect is on Azure stack — alignment reduces friction
- Azure Flexible Server PostgreSQL, Azure Cache for Redis, and Azure Key Vault are mature managed services for the data layer
- GPU capacity (NC4as_T4_v3 with NVIDIA T4) available in East US 2 with 3 availability zones
Consequences
- East US 2 region locked — 3 AZs, GPU capacity, regulated workload preference
- Epic H (BYOK) and Epic I (Private connectivity) use Azure-native primitives (Key Vault, PrivateLink)
- AWS Bedrock adapter is shipped (provider expansion) but the hosting infrastructure is Azure-only
ADR-004 — AKS over Azure Container Apps
Status: Locked (2026-03-09) · Decision date: 2026-03-09
Context
Azure offers two container orchestration services: Azure Kubernetes Service (AKS) and Azure Container Apps (ACA). ACA provides a higher-level abstraction with automatic scaling. AKS requires more operational investment but provides full control.
Decision
AKS Standard tier over Azure Container Apps.
Rationale
ACA cannot support three capabilities that are non-negotiable for Arbitex:
- GPU node pools — DeBERTa inference requires NVIDIA T4 GPU. ACA has no GPU support.
- Azure Key Vault CSI driver — Required for mounting secrets as volumes (fail-fast startup pattern). ACA does not support CSI drivers.
- Pod-level network policies (Calico) — Required for default-deny isolation between service pairs. ACA uses a shared network model.
AKS Standard tier ($72/month) provides 99.95% API server SLA and control-plane audit logs (required for SOC 2 CC7 logging).
Consequences
- More operational investment than ACA (Helm charts, node pool management, pod security standards)
- AKS free tier explicitly rejected — no SLA and no control-plane audit logs
- System node pool uses B2s_v2 burstable instance for cost efficiency (~$61/month)
ADR-005 — Edge provider: Cloudflare Pro over Azure Front Door
Status: Locked (2026-03-09) · Decision date: 2026-03-09
Context
A CDN and DDoS protection layer is required in front of the AKS origin. Azure Front Door and Cloudflare Pro were the primary candidates.
Decision
Cloudflare Pro ($20/month) over Azure Front Door.
Rationale
| Capability | Cloudflare Pro | Azure Front Door |
|---|---|---|
| DDoS protection | Unmetered (included) | ~$2,944/month (Azure DDoS Protection) |
| WAF | Included ($20/month) | Extra cost |
| DNS management | Included | Azure DNS separate |
| Monthly cost | ~$20 | ~$350+ |
Cloudflare Authenticated Origin Pulls provides cryptographic mTLS from edge to origin — stronger than IP allowlist-only protection. Cloudflare’s global anycast network provides better latency characteristics than Azure Front Door’s POPs for the target customer geography.
Consequences
- DNS managed in Cloudflare (not Azure DNS) — single pane for DNS + edge config, instant propagation
- Cloudflare Origin CA certificate (1-year, auto-renewed by cert-manager) used for edge-to-origin TLS
- Cloudflare IP allowlist synced automatically as defense-in-depth behind Authenticated Origin Pulls
- All public domains proxied through Cloudflare — no direct IP exposure for origin
ADR-006 — Compliance framing: “designed for” (not “certified”)
Status: Locked (2026-03-07) · Decision date: 2026-03-07
Context
Arbitex is building toward SOC 2 Type II certification but has not yet undergone a formal audit. Sales materials and documentation needed a framing that is accurate without misrepresenting certification status.
Decision
Use “designed for” framing in all customer-facing documentation and sales materials. Never claim SOC 2 Type II certification or equivalent certification status until the audit is complete.
Rationale
- Misrepresenting certification status creates legal and reputational risk
- “Designed for” accurately describes the architectural intent without making audit claims
- Enterprise security teams understand the distinction — accurate framing builds trust faster than overclaiming
- Compliance Bundles support customer compliance programs; the platform’s own certification status is separate
Consequences
- All docs use “designed for” or “supports” language, not “certified” or “compliant”
- Security overview page includes explicit disclaimer about current certification status
- Sales team briefed: do not represent Arbitex as SOC 2 certified
ADR-007 — GRC framework: NIST CSF 2.0 + Privacy Framework + AI RMF
Status: Locked (2026-03-07) · Decision date: 2026-03-07
Context
Arbitex needed a governance, risk, and compliance framework to structure internal security controls and communicate posture to enterprise customers. Multiple frameworks were evaluated: NIST CSF, ISO 27001, SOC 2, CIS Controls.
Decision
Primary: NIST Cybersecurity Framework 2.0 + NIST Privacy Framework 1.0 + NIST AI Risk Management Framework 1.0
Crosswalk: ISO 27001/27002/27701 (for EU customers and SOC 2 readiness)
Principle: Confidentiality ≠ Privacy (CIAP — these are related but distinct properties with separate control requirements)
Rationale
- NIST frameworks are US government standards — highest credibility with federal and regulated-industry customers
- AI RMF is purpose-built for AI system risk, which is the core of what Arbitex is
- NIST CSF 2.0 added the Govern function, which maps directly to the Policy Engine
- ISO crosswalk satisfies EU customers without requiring a separate framework implementation
- NIST frameworks are freely available (no licensing) and widely documented
Consequences
- Security documentation structured around NIST CSF functions (Govern, Identify, Protect, Detect, Respond, Recover)
- Compliance Bundles map framework-specific controls to AI system risks (AI RMF Measure function)
- Privacy controls (NIST Privacy Framework) handled separately from confidentiality controls (NIST CSF) — CCPA/GDPR bundles address the privacy dimension
ADR-008 — Audit log model: 90-day tamper-evident buffer; SIEM is record of truth
Status: Locked (2026-03-08) · Decision date: 2026-03-08
Context
Enterprise customers require audit trails for compliance. Questions to resolve: how long does Arbitex retain audit data? Who owns the record of truth? How is tamper-evidence implemented?
Decision
- Arbitex retains: 90-day tamper-evident buffer (active) + 2-year archive tier in Azure Log Analytics
- Record of truth: Customer’s SIEM — Arbitex is not the system of record for long-term audit retention
- Tamper-evidence: HMAC chain with key versioning (`hmac_key_id` field on each audit entry)
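As a toy illustration of an HMAC chain with key versioning: each entry's MAC covers the event plus the previous entry's MAC, so modifying, removing, or reordering any entry breaks verification. The `prev_mac` field and the key store here are assumptions; only `hmac_key_id` comes from this ADR.

```python
import hashlib
import hmac
import json

# Illustrative key store; real keys would live in a secrets manager.
KEYS = {"k1": b"first-secret", "k2": b"rotated-secret"}

def append_entry(chain: list, event: dict, key_id: str) -> None:
    prev_mac = chain[-1]["mac"] if chain else ""
    payload = json.dumps({"event": event, "prev_mac": prev_mac}, sort_keys=True)
    mac = hmac.new(KEYS[key_id], payload.encode(), hashlib.sha256).hexdigest()
    chain.append({"event": event, "prev_mac": prev_mac, "hmac_key_id": key_id, "mac": mac})

def verify_chain(chain: list) -> bool:
    prev_mac = ""
    for entry in chain:
        payload = json.dumps({"event": entry["event"], "prev_mac": prev_mac}, sort_keys=True)
        expected = hmac.new(KEYS[entry["hmac_key_id"]], payload.encode(), hashlib.sha256).hexdigest()
        if entry["prev_mac"] != prev_mac or not hmac.compare_digest(entry["mac"], expected):
            return False  # tampering or reordering detected
        prev_mac = entry["mac"]
    return True

log = []
append_entry(log, {"action": "ALLOW", "org": "acme"}, "k1")
append_entry(log, {"action": "BLOCK", "org": "acme"}, "k2")  # key rotated mid-chain
assert verify_chain(log)
log[0]["event"]["action"] = "BLOCK"  # tamper with an earlier entry
assert not verify_chain(log)
```

Because each entry records which key version signed it, rotating the HMAC key does not break validation of older entries.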
Rationale
- Keeping Arbitex as the long-term system of record creates regulatory liability and increases COGS
- Enterprise customers already have SIEM infrastructure; OCSF format export integrates naturally
- 90-day buffer covers the most common compliance investigation windows
- HMAC chain provides the tamper-evidence property that most compliance frameworks require for audit logs, without requiring WORM storage
Consequences
- SIEM integration guide is a P0 deliverable (shipped — see SIEM integration)
- Customers must configure their SIEM retention to meet their specific framework requirements (BSA: 5 years; SEC 17a-4: 3–6 years; HIPAA: 6 years)
- `hmac_key_id` field in audit entries supports key rotation without breaking chain validation
ADR-009 — SIEM format: OCSF with Splunk HEC as P0
Status: Locked (2026-03-08) · Decision date: 2026-03-08
Context
SIEM export format and initial connector priority needed to be determined. Options: proprietary format, CEF, LEEF, OCSF, native connector per SIEM.
Decision
- Wire format: OCSF (Open Cybersecurity Schema Framework) — vendor-neutral, structured
- P0 connector: Splunk HTTP Event Collector (HEC)
- Second P0 connector: Azure Sentinel
- Additional connectors: 5 stubs for Elastic, Datadog, AWS Security Hub, Google Chronicle, IBM QRadar
Rationale
- OCSF is the emerging industry standard for security telemetry — backed by AWS, Splunk, IBM, and others
- OCSF over CEF/LEEF: machine-parseable schema with typed fields vs. semistructured key-value
- Splunk has dominant market share in enterprise SIEM among financial services customers (primary target)
- Azure Sentinel P0 because Arbitex runs on Azure — natural alignment for Azure-native customers
- Connector framework design allows adding new targets without changing the event schema
Consequences
- OCSF schema version and class IDs must be pinned and documented for customer SIEM rule writing
- Splunk HEC + Sentinel shipped in platform S13 (205 tests)
- Remaining 5 connectors are stubs — customers can implement via webhook if not yet native
ADR-010 — Staff admin plane isolation: int.arbitex.ai with NS delegation
Status: Locked (2026-03-09) · Decision date: 2026-03-09
Context
Arbitex staff needed tooling for customer management (plan assignment, rate limits, MFA policy, billing views). This tooling handles cross-tenant sensitive operations and must be isolated from the customer-facing control plane.
Decision
- Domain: `int.arbitex.ai` — subdomain chosen to avoid automated enumeration (not `staff`/`admin`/`internal`)
- DNS: NS delegation from public DNS to a private RFC 1918 nameserver — entire `int.arbitex.ai` subtree unreachable from the public internet
- Access: VPN or Cloudflare Zero Trust connector required
- Authentication: FIDO2/WebAuthn hardware key mandatory; no SMS/TOTP fallback
- Model: Cross-tenant access via `acting_org_id` JWT claim — no session switching; both staff identity and target org captured in audit trail
- Repo: Dedicated `arbitex-staff` repository — independent deploy pipeline
Rationale
- NS delegation to RFC 1918 provides stronger isolation than IP allowlisting — external resolvers return SERVFAIL, no subdomain enumeration possible
- FIDO2 is mandatory because the staff portal is the highest-privilege surface in the system (blast radius: all customers)
- `acting_org_id` model ensures the audit trail captures the full context of every cross-tenant action
- Dedicated repo enforces clean separation of concerns and an independent security review cycle
Consequences
- `int.arbitex.ai` is an Epic L deliverable — not yet built
- Requires VPN or Cloudflare Zero Trust connector as prerequisite (Epic L, sprint L4)
- Staff auth stack dogfoods the same auth infrastructure as customers — proves platform auth in production use
ADR-011 — BYOK scope: Epic H (Azure Key Vault, customer-managed keys)
Status: Locked (2026-03-08) · Decision date: 2026-03-08
Context
Enterprise customers in regulated industries often require customer-managed encryption keys (BYOK/CMEK) for data at rest. This was evaluated against Arbitex-managed keys (current state) and a deferred BYOK model.
Decision
BYOK is scoped under Epic H and blocked on Azure subscription provisioning (C005). The key architecture when implemented:
- Customer provides their own Azure Key Vault key
- Arbitex retrieves and caches the Data Encryption Key (DEK) in Redis db=2 (reserved for this purpose)
- Arbitex wraps/unwraps database encryption keys using the customer KEK
Current state: all data encrypted with Arbitex-managed keys. BYOK not available.
Rationale
- BYOK is a hard requirement for some regulated customers (certain HIPAA Business Associates, GLBA-covered entities with specific contractual requirements)
- Deferring to Epic H allows the platform to launch and acquire initial customers without BYOK complexity
- Redis db=2 is reserved in the current architecture to avoid a migration when BYOK ships
- Azure Key Vault HSM-backed keys provide the customer-controlled key security property
Consequences
- `byok_scope` decision documented in public-facing docs to be explicit with procurement teams about current vs. future state
- Epic H (~70 pts) blocked on C005 (Azure subscription provisioning)
- db=2 Redis slot reserved — do not use for other purposes
ADR-012 — Plan tier keys: five-tier model with enterprise_outpost separation
Status: Locked (2026-03-08) · Decision date: 2026-03-08
Context
Arbitex needed a plan tier model for billing and entitlement logic. Initial designs had 3–4 tiers. The question was whether to model SaaS and Hybrid Outpost enterprise separately or as one tier with a deployment flag.
Decision
Five canonical plan tier keys:
| Key | Description |
|---|---|
| `devfree_saas` | Developer free tier — 100k requests/month |
| `devpro_saas` | Developer Pro — 1M requests/month (“Early Access” during beta) |
| `team_saas` | Team tier |
| `enterprise_saas` | Enterprise SaaS |
| `enterprise_outpost` | Enterprise Hybrid Outpost (separate key — Outpost carries additional entitlements) |
Enterprise limits stored in a separate `enterprise_entitlements` table.
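An illustrative entitlement check against the canonical keys (the helper name and limits mapping are assumptions for the sketch; the 100k/1M developer limits are the locked values):

```python
# The five canonical plan tier keys locked in ADR-012.
PLAN_TIERS = frozenset({
    "devfree_saas", "devpro_saas", "team_saas",
    "enterprise_saas", "enterprise_outpost",
})

# Dev plan limits explicitly locked in this ADR.
MONTHLY_REQUEST_LIMITS = {
    "devfree_saas": 100_000,
    "devpro_saas": 1_000_000,
}

def outpost_entitled(plan_key: str) -> bool:
    """Explicit key check instead of a boolean flag like outpost_enabled."""
    if plan_key not in PLAN_TIERS:
        raise ValueError(f"unknown plan tier key: {plan_key!r}")  # no ad-hoc strings
    return plan_key == "enterprise_outpost"

assert outpost_entitled("enterprise_outpost")
assert not outpost_entitled("enterprise_saas")
```

Rejecting unknown keys at the check site is what enforces the "no ad-hoc strings" consequence below.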
Rationale
- Separate `enterprise_outpost` key avoids a flag-based system (`enterprise` + `outpost_enabled=true`) that creates ambiguous entitlement logic
- Explicit keys in entitlement checks are more readable and auditable than boolean combinations
- Dev plan limits (100k free, 1M paid) were explicitly locked to prevent drift
- “Early Access” framing for Dev Pro (not “Coming Soon”) — product is available, just pre-GA
Consequences
- All code and entitlement checks must use these exact keys — no ad-hoc strings
- `devpro_saas` marketed as “Early Access” until GA
- `enterprise_entitlements` table exists as a separate model — no inline JSON entitlement blobs on the plan record
ADR-013 — DeBERTa inference tiers: GPU required for SaaS; CPU allowed for Outpost trial
Status: Locked (2026-03-09) · Decision date: 2026-03-09
Context
DeBERTa-based DLP inference requires significant compute. The question was whether to allow CPU inference as a fallback for any deployment configuration.
Decision
| Deployment | Inference hardware | Controlled by |
|---|---|---|
| SaaS (Arbitex-hosted) | GPU required | Infrastructure (always NC4as_T4_v3) |
| Outpost — production | GPU required | Documentation requirement |
| Outpost — trial | CPU allowed | Arbitex control plane flag (not customer-configurable) |
Rationale
- DeBERTa on CPU is too slow for production request latency targets — GPU is the only viable option for SaaS and production Outpost
- CPU mode exists specifically to allow customers to trial the Outpost without committing GPU infrastructure upfront
- CPU mode is controlled by an Arbitex-managed flag (not customer-configurable) to prevent accidental production CPU deployment
- Fail-closed behavior on GPU unavailability: requests are blocked (not passed through unscanned)
Consequences
- Outpost trial documentation must clearly state CPU mode limitations (latency, throughput)
- Inference fail-closed is a hard requirement — no silent degradation to unscanned traffic
- GPU pool uses Deallocate mode (not Delete) for 2–4 minute restart vs. 8–25 minute cold provisioning
ADR-014 — 3-tier certificate authority: offline YubiHSM root
Status: Locked (2026-03-09) · Decision date: 2026-03-09
Context
Arbitex needed a PKI for internal service-to-service TLS, Outpost client certificates, and Cloudflare Origin CA. Options: use a public CA for everything, use Azure Key Vault as a single-tier CA, or build a 3-tier hierarchy.
Decision
3-tier CA hierarchy:
Root CA (YubiHSM, offline) → Intermediate CA (Azure Key Vault HSM) → Leaf certs (cert-manager)
- Root CA generated on YubiHSM hardware security module; taken offline after signing the intermediate
- Intermediate CA lives in Azure Key Vault (separate HSM from the root)
- Leaf certificates issued by cert-manager using the step-ca issuer against the Key Vault intermediate
- Single HSM acceptable for pilot; split to two HSMs before first enterprise contract
Rationale
- Offline root means a compromise of the online intermediate CA cannot forge a new root — the highest-value key never touches a networked system
- cert-manager automates leaf certificate issuance and renewal — no manual cert management
- 3-tier is the industry standard for enterprise PKI; familiar to customers during security reviews
- YubiHSM provides FIPS 140-2 Level 3 assurance for the root key
Consequences
- YubiHSM hardware is a prerequisite for Phase B (Azure deployment) — not a software component
- Cert-manager renewal failure alert configured (CertificateRequest Failed >1h → alert)
- Cloudflare Origin CA cert (edge-to-origin) uses a different CA path (Cloudflare’s own CA) — managed separately from the internal 3-tier hierarchy
- Origin CA is 1-year term (not 15-year) — cert-manager handles auto-renewal; shorter term limits blast radius
ADR-015 — MFA enforcement: staff-configured per org, platform-enforced
Status: Locked (2026-03-09) · Decision date: 2026-03-09
Context
Multi-factor authentication policy needed to be configurable per organization while being enforced uniformly by the platform. The question was where policy configuration lives and where enforcement runs.
Decision
- Configuration: Staff admin portal (`int.arbitex.ai`) — Arbitex staff set MFA policy per org and per plan tier
- Enforcement: Platform API auth middleware — checks org MFA policy on every login; rejects if unmet
- Supported MFA types: FIDO2/WebAuthn hardware key, authenticator app (TOTP), SMS, “any MFA”
- Enterprise default: Enterprise plans can be configured to require FIDO2 hardware key
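A sketch of the enforcement side, assuming illustrative policy values, an illustrative `org_settings` shape, and an assumed ordering in which a hardware key satisfies weaker policies:

```python
from typing import Optional

# Which presented factors satisfy each configured policy. The ordering
# (fido2 satisfies totp-level policies, etc.) is an assumption for the sketch.
MFA_SATISFIES = {
    "any": {"fido2", "totp", "sms"},
    "totp": {"fido2", "totp"},
    "fido2": {"fido2"},
}

class AuthError(Exception):
    pass

def enforce_mfa(org_settings: dict, presented_factor: Optional[str]) -> None:
    """Runs on every login. The middleware reads the policy; it never manages it."""
    required = org_settings.get("mfa_policy")  # set via the staff plane, not by the customer
    if required is None:
        return  # no MFA policy configured for this org
    if presented_factor not in MFA_SATISFIES[required]:
        raise AuthError(f"MFA policy '{required}' not satisfied")

enforce_mfa({"mfa_policy": "fido2"}, "fido2")  # passes
try:
    enforce_mfa({"mfa_policy": "fido2"}, "totp")  # rejected: hardware key required
except AuthError:
    pass
```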
Rationale
- Separating policy configuration (staff plane) from policy enforcement (platform) keeps the customer-facing auth middleware simple — it reads org settings, it doesn’t manage them
- Staff plane controls MFA policy as an operational lever (e.g., enforce FIDO for healthcare customers)
- Putting configuration in the staff plane prevents customers from weakening their own MFA requirements once set by their contract
- Platform-side enforcement means no client-side bypass is possible
Consequences
- MFA enforcement is an Epic L deliverable (sprint L3) — not yet built
- Platform auth middleware needs to call into `org_settings` for the MFA policy check on login
- MFA enrollment prompts (platform UX) are in scope for L3 T16
- Staff portal configures MFA types; enforcement runs on every login (no session caching of MFA state)
ADR-016 — SIEM native format: OCSF over raw JSON for Outpost direct sink
Status: Locked (2026-03-12) · Decision date: 2026-03-12
Context
The Outpost SIEM direct sink (outpost-0022) needed to choose an event format when delivering audit events to customer SIEM endpoints (Splunk HEC, Microsoft Sentinel DCR, Elasticsearch). Two options were evaluated: raw JSON (the existing Outpost audit log format) and OCSF (Open Cybersecurity Schema Framework) — the same format used by the Platform SIEM connectors.
Decision
Use OCSF format for all Outpost direct sink deliveries, with `sourcetype: "arbitex:ocsf"` for Splunk HEC.
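For illustration, a Splunk HEC envelope carrying an OCSF event with the fixed sourcetype (the OCSF field subset and event content shown are assumptions, not the shipped schema):

```python
import json

def hec_envelope(ocsf_event: dict, timestamp: float) -> str:
    """Wrap an OCSF event in the Splunk HTTP Event Collector JSON envelope."""
    return json.dumps({
        "time": timestamp,
        "sourcetype": "arbitex:ocsf",  # fixed value per ADR-016
        "event": ocsf_event,
    })

payload = hec_envelope(
    {"class_uid": 6003, "activity_name": "API Activity", "status": "Blocked"},
    1773532800.0,
)
assert json.loads(payload)["sourcetype"] == "arbitex:ocsf"
```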
Rationale
- OCSF is already the Platform wire format for Splunk, Sentinel, and Elastic connectors — customers maintaining both Platform and Outpost sinks see a consistent schema.
- OCSF is a typed, machine-parseable schema; raw JSON requires per-customer field extractions in each SIEM.
- The `arbitex:ocsf` sourcetype provides a clear namespace in Splunk; customers can write detection rules against a stable field set.
- OCSF is backed by AWS, Splunk, IBM, and others — not a proprietary format that could create long-term vendor lock-in.
Consequences
- Customers running both the Platform SIEM relay and the Outpost direct sink to the same SIEM get identical schema from both paths — no separate field extraction configuration.
- The `sourcetype: "arbitex:ocsf"` Splunk value is fixed at the Outpost; customers must create this sourcetype if it does not already exist.
- The prior Outpost SIEM direct sink (outpost-0008/0015 syslog/CEF implementation) used raw JSON/CEF — customers migrating to the outpost-0022 implementation must update their SIEM parsing rules.
ADR-017 — Budget enforcement: local-first (Outpost SQLite) over SaaS-pushed spend
Status: Locked (2026-03-12) · Decision date: 2026-03-12
Context
Two approaches to Outpost budget cap enforcement were evaluated:
- SaaS-pushed spend (legacy `budget.spent_usd` in policy bundle): The Platform tracks usage centrally and pushes the current spend figure into the policy bundle on each sync. The Outpost compares the pushed value against caps.
- Local-first (outpost-0017 `budget_config` + local `UsageTracker`): The Outpost tracks usage locally in SQLite and evaluates caps against locally-accumulated totals.
Both systems coexist in the current codebase; this ADR locks the local-first approach as the authoritative enforcement model.
Decision
Local-first budget enforcement (`BudgetCapEnforcer` + `UsageTracker`) is the authoritative enforcement model. The SaaS-pushed spend path is retained for legacy compatibility but is not the source of truth.
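A minimal sketch of local-first tracking and cap evaluation in SQLite; the table layout and function names are assumptions, not the shipped `UsageTracker` API:

```python
import sqlite3

class UsageTracker:
    """Accumulates spend locally; enforcement never round-trips to the Platform."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("PRAGMA journal_mode=WAL")  # durable writes, per ADR-017
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS usage (period TEXT PRIMARY KEY, spent_usd REAL NOT NULL)"
        )

    def record(self, period: str, cost_usd: float) -> None:
        self.conn.execute(
            "INSERT INTO usage (period, spent_usd) VALUES (?, ?) "
            "ON CONFLICT(period) DO UPDATE SET spent_usd = spent_usd + excluded.spent_usd",
            (period, cost_usd),
        )
        self.conn.commit()

    def spent(self, period: str) -> float:
        row = self.conn.execute(
            "SELECT spent_usd FROM usage WHERE period = ?", (period,)
        ).fetchone()
        return row[0] if row else 0.0

def over_cap(tracker: UsageTracker, period: str, cap_usd: float) -> bool:
    # Decision made against local totals only, so a network partition or a
    # stale policy bundle cannot open an enforcement gap.
    return tracker.spent(period) >= cap_usd

t = UsageTracker()
t.record("2026-03-12T10", 0.75)
t.record("2026-03-12T10", 0.50)
assert over_cap(t, "2026-03-12T10", 1.00)
assert not over_cap(t, "2026-03-12T10", 5.00)
```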
Rationale
- Policy sync runs every 60 seconds. Under the SaaS-pushed model, an Outpost can accumulate up to 60 seconds of spend above the cap before the bundle refreshes with updated totals — creating a ~60-second enforcement gap.
- During network partition, the bundle is served from cache. SaaS-pushed spend becomes stale immediately; local SQLite totals remain current.
- Local enforcement eliminates the round-trip latency for enforcement decisions — no blocking call to Platform needed.
- SQLite WAL mode provides thread-safe, durable local storage for usage counters without requiring a separate database service.
Consequences
- Caps are evaluated against the Outpost’s local usage totals, which may differ from Platform-side totals (due to multi-outpost deployments, provider billing vs. token-based estimation, etc.). Monitor both Outpost and Platform usage dashboards in production.
- The local `UsageTracker` uses SQLite at `USAGE_DB_PATH`. Set to an empty string to disable local tracking; enforcement becomes a no-op.
- Hourly period rollover relies on the Outpost running at the rollover boundary. If stopped at rollover, stale totals persist until the next request.
ADR-018 — PKCE public client support: S256-only, no implicit flow
Status: Locked (2026-03-12) · Decision date: 2026-03-12
Context
Platform platform-0044 added OAuth 2.0 PKCE support (RFC 7636) to enable `authorization_code` grants for public clients (SPAs, CLIs, mobile apps). Two decisions were made: which code challenge methods to support, and whether to allow the implicit grant flow as an alternative.
Decision
- Support S256 only for the PKCE code challenge method. `plain` is rejected.
- No implicit flow. Authorization code + PKCE is the only supported flow for public clients.
- `OAuthClient.client_type` distinguishes `confidential` (client secret required) from `public` (PKCE required, no secret).
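The S256 derivation and check follow RFC 7636: the challenge is the unpadded base64url encoding of the SHA-256 of the verifier. A sketch:

```python
import base64
import hashlib
import secrets

def s256_challenge(verifier: str) -> str:
    """challenge = BASE64URL(SHA256(verifier)), without padding (RFC 7636 S256)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def verify(stored_challenge: str, presented_verifier: str) -> bool:
    """Server side: at token exchange, recompute from the presented verifier."""
    return secrets.compare_digest(stored_challenge, s256_challenge(presented_verifier))

# Client side: generate a random verifier, send its challenge in the
# authorization request, keep the verifier for the token exchange.
verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
challenge = s256_challenge(verifier)

assert verify(challenge, verifier)
assert not verify(challenge, "wrong-verifier")
```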
Rationale
- `plain` challenges offer no meaningful security benefit over the implicit flow — they are trivially interceptable. S256 provides the hash-based protection that is the entire point of PKCE.
- The implicit flow is deprecated in OAuth 2.0 Security Best Current Practice (RFC 9700) — it exposes tokens in redirect URI fragments and cannot use refresh tokens.
- S256-only simplifies the implementation surface: one code challenge algorithm, one authorization code grant path, no implicit edge cases.
- PKCE code verifiers are stored in Redis with a 10-minute TTL and consumed atomically (`GET`+`DEL`) — replay attacks are rejected at the store layer.
Consequences
- Existing confidential clients (`client_credentials` grant) are unaffected — `client_type` defaults to `confidential` in migration 063.
- Public clients must send a `code_challenge` (S256) in the authorization request. `POST /api/oauth/authorize` returns 400 if `code_challenge` is absent for a public client.
- Arbitex does not issue refresh tokens for public clients in the initial implementation — clients must re-authorize when the access token expires.
ADR-019 — Structured JSON logging: Log Analytics-ready format platform-wide
Status: Locked (2026-03-12) · Decision date: 2026-03-12
Context
Platform platform-0044 (T566) and platform-0025 (T445) shipped a structured JSON log formatter as a platform-wide logging standard. The decision to standardise on JSON over human-readable plain text required choosing a format and making it the default.
Decision
All Platform services emit structured JSON log lines to stdout. The formatter is implemented as a Python `logging.Formatter` subclass and configured in the application lifespan handler. Log lines include at minimum: timestamp (ISO 8601 UTC), level, logger, message, plus any extra fields passed to the log call.
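A minimal formatter of this shape (the shipped formatter's exact field handling may differ):

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    # Attributes present on every LogRecord; anything else came in via `extra`.
    RESERVED = set(logging.LogRecord(None, 0, "", 0, "", (), None).__dict__) | {"message"}

    def format(self, record: logging.LogRecord) -> str:
        line = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra fields passed to the log call become top-level JSON fields.
        line.update({k: v for k, v in record.__dict__.items() if k not in self.RESERVED})
        return json.dumps(line)

handler = logging.StreamHandler()  # stdout/stderr; one JSON object per line
handler.setFormatter(JsonFormatter())
log = logging.getLogger("arbitex.demo")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("policy evaluated", extra={"request_id": "r-123", "org_id": "acme"})
```

The `extra=` fields land as first-class JSON keys, which is what makes the correlation fields below queryable in KQL without regex extraction.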
Rationale
- Azure Log Analytics (the target observability backend) ingests structured JSON natively — plain text requires custom parsing pipelines.
- Structured logs enable field-level querying in Log Analytics (KQL) without regex extraction.
- Per-request correlation fields (`request_id`, `org_id`, `user_id`) can be included as structured fields rather than embedded in log strings.
- Machine-parseable logs from day one avoid a migration later when Log Analytics is wired.
Consequences
- Human-readable log viewing requires a JSON formatter in the terminal (e.g. `kubectl logs | jq`). Operations staff must be aware of this.
- The `logging.basicConfig` call in test fixtures must be updated to handle JSON format if test output is parsed.
- Third-party libraries that emit their own log format (Uvicorn access log, SQLAlchemy echo) are not covered by the formatter — their output remains unstructured. Suppress or re-route those loggers as needed.
| ADR | Title | Status |
|---|---|---|
| ADR-001 | Policy Engine model: Palo Alto unified pipeline | Locked |
| ADR-002 | Deployment topology: SaaS + Hybrid Outpost | Locked |
| ADR-003 | Cloud provider: Azure | Locked |
| ADR-004 | AKS over Azure Container Apps | Locked |
| ADR-005 | Edge provider: Cloudflare Pro | Locked |
| ADR-006 | Compliance framing: “designed for” | Locked |
| ADR-007 | GRC framework: NIST CSF 2.0 + Privacy Framework + AI RMF | Locked |
| ADR-008 | Audit log model: 90-day buffer; SIEM is record of truth | Locked |
| ADR-009 | SIEM format: OCSF with Splunk HEC as P0 | Locked |
| ADR-010 | Staff admin plane: int.arbitex.ai NS delegation | Locked |
| ADR-011 | BYOK scope: Epic H | Locked |
| ADR-012 | Plan tier keys: five-tier model | Locked |
| ADR-013 | DeBERTa inference tiers | Locked |
| ADR-014 | 3-tier CA: offline YubiHSM root | Locked |
| ADR-015 | MFA enforcement: staff-configured, platform-enforced | Locked |
| ADR-016 | SIEM native format: OCSF over raw JSON for Outpost direct sink | Locked |
| ADR-017 | Budget enforcement: local-first (Outpost SQLite) over SaaS-pushed spend | Locked |
| ADR-018 | PKCE public client support: S256-only, no implicit flow | Locked |
| ADR-019 | Structured JSON logging: Log Analytics-ready format platform-wide | Locked |