
Architecture Decision Records

Architecture Decision Records (ADRs) document the key decisions made during Arbitex platform design. Each record captures the context, decision, rationale, and consequences of a significant architectural choice.

These decisions are locked and form the stable foundation for platform development. Changes to locked decisions require explicit PO approval and a new ADR.


ADR-001 — Policy Engine model: Palo Alto unified pipeline

Status: Locked (2026-03-08) Decision date: 2026-03-08

Arbitex needed a policy evaluation model for AI request governance. Several models were considered: simple rule lists, deny-only filtering, and unified policy frameworks from enterprise security (Palo Alto NGFW model).

The platform needed to handle allow, block, prompt-for-approval, and route-to-alternate actions — not just filtering. Rules needed to compose across multiple policy packs, with configurable precedence.

Adopt the Palo Alto unified policy pipeline model: a first-match evaluation across an ordered policy chain, with explicit ALLOW, BLOCK, PROMPT, ROUTE_TO, and ALLOW_WITH_OVERRIDE actions. Policy Packs replace the previous “compliance bundles” and “rule lists” concepts under a unified abstraction.

  • Palo Alto’s model is well-understood by enterprise security teams — familiar to the buyers Arbitex targets
  • First-match with explicit actions is deterministic and auditable; deny-overrides combining algorithm provides a stricter alternative when needed
  • Unified abstraction (Policy Pack = any pack, bundle or custom) avoids two separate rule systems
  • ROUTE_TO and ALLOW_WITH_OVERRIDE extend the model to cover AI-specific governance patterns without compromising the core evaluation semantics
  • Policy Engine naming is a placeholder for easy find-replace when Brand brief finalizes product name
  • All existing “compliance bundles” became Policy Packs with pack_type: "bundle" — fully backward-compatible
  • PROMPT and ALLOW_WITH_OVERRIDE actions require audit log capture of approver identity and reason (not yet built — Epic J5 scope)
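The first-match semantics can be sketched in a few lines of Python. This is an illustrative model only: `Rule`, `evaluate`, and the predicates are hypothetical names, not the Arbitex API.

```python
# Illustrative sketch of first-match policy evaluation (hypothetical names,
# not the Arbitex API). The ordered chain is walked top to bottom; the first
# matching rule's action wins, and an unmatched request falls through to BLOCK.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]     # predicate over the request context
    action: str                         # ALLOW | BLOCK | PROMPT | ROUTE_TO | ALLOW_WITH_OVERRIDE
    route_target: Optional[str] = None  # only meaningful for ROUTE_TO

def evaluate(chain: list[Rule], request: dict) -> Rule:
    """First match wins across the ordered policy chain."""
    for rule in chain:
        if rule.matches(request):
            return rule
    # Implicit default: deny anything no rule matched.
    return Rule("default-deny", lambda r: True, "BLOCK")

chain = [
    Rule("allow-internal", lambda r: r.get("source") == "internal", "ALLOW"),
    Rule("pii-approval", lambda r: r.get("contains_pii", False), "PROMPT"),
    Rule("dev-reroute", lambda r: r.get("tier") == "dev", "ROUTE_TO", "alternate-model"),
]
```

Because evaluation is first-match, an internal request that also contains PII is still allowed by `allow-internal`; ordering the chain is therefore part of policy authoring.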

ADR-002 — Deployment topology: SaaS + Hybrid Outpost (no full self-host)

Status: Locked (2026-03-07) Decision date: 2026-03-07

Customers have varying data residency and network isolation requirements. Three deployment models were evaluated: SaaS only, full self-host, and a hybrid model where the control plane is Arbitex-managed but the data plane runs in the customer environment.

Support exactly two deployment topologies:

  1. SaaS — Arbitex-managed control plane and data plane. Customer traffic passes through api.arbitex.ai.
  2. Hybrid Outpost — Arbitex-managed control plane (SaaS); customer-hosted data plane (Outpost container in customer VPC/network). No full self-host option.
  • Full self-host requires distributing the entire platform stack (including proprietary DLP models), which creates IP risk and support complexity
  • Hybrid Outpost satisfies data residency requirements: sensitive prompts never leave the customer environment; only audit metadata and configuration reach the Arbitex control plane
  • SaaS control plane always being Arbitex-managed ensures policy packs and compliance bundles are consistently maintained
  • No full self-host simplifies the licensing and compliance surface area significantly
  • Outpost is resilient to control plane outages (caches configuration locally)
  • Cloud control plane owns Outpost registration and certificate issuance
  • DeBERTa inference in Hybrid Outpost runs on customer infrastructure — GPU requirement documented per tier (SaaS=GPU required; Outpost prod=GPU required; Outpost trial=CPU via control plane flag)
  • “Changelog self-host” entry was removed from public documentation — capability does not exist

ADR-003 — Cloud provider: Azure

Status: Locked (2026-03-08) Decision date: 2026-03-08

Target customers are predominantly in financial services, healthcare, and government — sectors where Azure has the highest market share. Primary pilot prospect is on Azure infrastructure. AWS and GCP were evaluated.

Azure as the sole cloud provider for Arbitex SaaS production infrastructure.

  • Target customers (fin/health/gov) have largest presence on Azure — simplifies data residency, private connectivity (PrivateLink), and regulatory compliance discussions
  • Largest known pilot prospect is on Azure stack — alignment reduces friction
  • Azure Flexible Server PostgreSQL, Azure Cache for Redis, and Azure Key Vault are mature managed services for the data layer
  • GPU capacity (NC4as_T4_v3 with NVIDIA T4) available in East US 2 with 3 availability zones
  • East US 2 region locked — 3 AZs, GPU capacity, regulated workload preference
  • Epic H (BYOK) and Epic I (Private connectivity) use Azure-native primitives (Key Vault, PrivateLink)
  • AWS Bedrock adapter is shipped (provider expansion) but the hosting infrastructure is Azure-only

ADR-004 — AKS over Azure Container Apps

Status: Locked (2026-03-09) Decision date: 2026-03-09

Azure offers two container orchestration services: Azure Kubernetes Service (AKS) and Azure Container Apps (ACA). ACA provides a higher-level abstraction with automatic scaling. AKS requires more operational investment but provides full control.

AKS Standard tier over Azure Container Apps.

ACA lacks three capabilities that are non-negotiable for Arbitex:

  1. GPU node pools — DeBERTa inference requires NVIDIA T4 GPU. ACA has no GPU support.
  2. Azure Key Vault CSI driver — Required for mounting secrets as volumes (fail-fast startup pattern). ACA does not support CSI drivers.
  3. Pod-level network policies (Calico) — Required for default-deny isolation between service pairs. ACA uses a shared network model.

AKS Standard tier ($72/month) provides 99.95% API server SLA and control-plane audit logs (required for SOC 2 CC7 logging).

  • More operational investment than ACA (Helm charts, node pool management, pod security standards)
  • AKS free tier explicitly rejected — no SLA and no control-plane audit logs
  • System node pool uses B2s_v2 burstable instance for cost efficiency (~$61/month)

ADR-005 — Edge provider: Cloudflare Pro over Azure Front Door

Status: Locked (2026-03-09) Decision date: 2026-03-09

A CDN and DDoS protection layer is required in front of the AKS origin. Azure Front Door and Cloudflare Pro were the primary candidates.

Cloudflare Pro ($20/month) over Azure Front Door.

| Capability | Cloudflare Pro | Azure Front Door |
| --- | --- | --- |
| DDoS protection | Unmetered (included) | ~$2,944/month (Azure DDoS Protection) |
| WAF | Included ($20/month) | Extra cost |
| DNS management | Included | Azure DNS separate |
| Monthly cost | ~$20 | ~$350+ |

Cloudflare Authenticated Origin Pulls provides cryptographic mTLS from edge to origin — stronger than IP allowlist-only protection. Cloudflare’s global anycast network provides better latency characteristics than Azure Front Door’s POPs for the target customer geography.

  • DNS managed in Cloudflare (not Azure DNS) — single pane for DNS + edge config, instant propagation
  • Cloudflare Origin CA certificate (1-year, auto-renewed by cert-manager) used for edge-to-origin TLS
  • Cloudflare IP allowlist synced automatically as defense-in-depth behind Authenticated Origin Pulls
  • All public domains proxied through Cloudflare — no direct IP exposure for origin

ADR-006 — Compliance framing: “designed for” (not “certified”)

Status: Locked (2026-03-07) Decision date: 2026-03-07

Arbitex is building toward SOC 2 Type II certification but has not yet undergone a formal audit. Sales materials and documentation needed a framing that is accurate without misrepresenting certification status.

Use “designed for” framing in all customer-facing documentation and sales materials. Never claim SOC 2 Type II certification or equivalent certification status until the audit is complete.

  • Misrepresenting certification status creates legal and reputational risk
  • “Designed for” accurately describes the architectural intent without making audit claims
  • Enterprise security teams understand the distinction — accurate framing builds trust faster than overclaiming
  • Compliance Bundles support customer compliance programs; the platform’s own certification status is separate
  • All docs use “designed for” or “supports” language, not “certified” or “compliant”
  • Security overview page includes explicit disclaimer about current certification status
  • Sales team briefed: do not represent Arbitex as SOC 2 certified

ADR-007 — GRC framework: NIST CSF 2.0 + Privacy Framework + AI RMF

Status: Locked (2026-03-07) Decision date: 2026-03-07

Arbitex needed a governance, risk, and compliance framework to structure internal security controls and communicate posture to enterprise customers. Multiple frameworks were evaluated: NIST CSF, ISO 27001, SOC 2, CIS Controls.

Primary: NIST Cybersecurity Framework 2.0 + NIST Privacy Framework 1.0 + NIST AI Risk Management Framework 1.0

Crosswalk: ISO 27001/27002/27701 (for EU customers and SOC 2 readiness)

Principle: Confidentiality ≠ Privacy (CIAP — these are related but distinct properties with separate control requirements)

  • NIST frameworks are US government standards — highest credibility with federal and regulated-industry customers
  • AI RMF is purpose-built for AI system risk, which is the core of what Arbitex is
  • NIST CSF 2.0 added the Govern function, which maps directly to the Policy Engine
  • ISO crosswalk satisfies EU customers without requiring a separate framework implementation
  • NIST frameworks are freely available (no licensing) and widely documented
  • Security documentation structured around NIST CSF functions (Govern, Identify, Protect, Detect, Respond, Recover)
  • Compliance Bundles map framework-specific controls to AI system risks (AI RMF Measure function)
  • Privacy controls (NIST Privacy Framework) handled separately from confidentiality controls (NIST CSF) — CCPA/GDPR bundles address the privacy dimension

ADR-008 — Audit log model: 90-day tamper-evident buffer; SIEM is record of truth

Status: Locked (2026-03-08) Decision date: 2026-03-08

Enterprise customers require audit trails for compliance. Questions to resolve: how long does Arbitex retain audit data? Who owns the record of truth? How is tamper-evidence implemented?

  • Arbitex retains: 90-day tamper-evident buffer (active) + 2-year archive tier in Azure Log Analytics
  • Record of truth: Customer’s SIEM — Arbitex is not the system of record for long-term audit retention
  • Tamper-evidence: HMAC chain with key versioning (hmac_key_id field on each audit entry)
  • Keeping Arbitex as the long-term record of truth creates regulatory liability and increases COGS
  • Enterprise customers already have SIEM infrastructure; OCSF format export integrates naturally
  • 90-day buffer covers the most common compliance investigation windows
  • HMAC chain provides the tamper-evidence property that most compliance frameworks require for audit logs, without requiring WORM storage
  • SIEM integration guide is a P0 deliverable (shipped — see SIEM integration)
  • Customers must configure their SIEM retention to meet their specific framework requirements (BSA: 5 years; SEC 17a-4: 3–6 years; HIPAA: 6 years)
  • hmac_key_id field in audit entries supports key rotation without breaking chain validation
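A minimal sketch of the HMAC-chain idea follows. Only the `hmac_key_id` field name is taken from the ADR; the key store, entry shape, and helper names are illustrative assumptions.

```python
# Illustrative HMAC-chained audit log with versioned keys. Each entry's MAC
# covers the previous entry's MAC plus its own body, so editing or deleting
# any entry breaks verification of everything after it.
import hashlib
import hmac
import json

KEYS = {"k1": b"retired-secret", "k2": b"current-secret"}  # versioned HMAC keys

def append_entry(chain: list[dict], event: dict, key_id: str) -> None:
    prev_mac = chain[-1]["hmac"] if chain else ""
    body = json.dumps(event, sort_keys=True)
    mac = hmac.new(KEYS[key_id], (prev_mac + body).encode(), hashlib.sha256).hexdigest()
    chain.append({"event": event, "hmac_key_id": key_id, "hmac": mac})

def verify_chain(chain: list[dict]) -> bool:
    prev_mac = ""
    for entry in chain:
        body = json.dumps(entry["event"], sort_keys=True)
        expected = hmac.new(KEYS[entry["hmac_key_id"]],
                            (prev_mac + body).encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["hmac"]):
            return False
        prev_mac = entry["hmac"]
    return True
```

Because each entry records its `hmac_key_id`, keys can rotate mid-chain (k1 → k2) without invalidating earlier entries, which is the rotation property the ADR calls out.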

ADR-009 — SIEM format: OCSF with Splunk HEC as P0

Status: Locked (2026-03-08) Decision date: 2026-03-08

SIEM export format and initial connector priority needed to be determined. Options: proprietary format, CEF, LEEF, OCSF, native connector per SIEM.

  • Wire format: OCSF (Open Cybersecurity Schema Framework) — vendor-neutral, structured
  • P0 connectors: Splunk HTTP Event Collector (HEC) and Azure Sentinel
  • Additional connectors: 5 stubs for Elastic, Datadog, AWS Security Hub, Google Chronicle, IBM QRadar
  • OCSF is the emerging industry standard for security telemetry — backed by AWS, Splunk, IBM, and others
  • OCSF over CEF/LEEF: machine-parseable schema with typed fields vs. semistructured key-value
  • Splunk has dominant market share in enterprise SIEM among financial services customers (primary target)
  • Azure Sentinel P0 because Arbitex runs on Azure — natural alignment for Azure-native customers
  • Connector framework design allows adding new targets without changing the event schema
  • OCSF schema version and class IDs must be pinned and documented for customer SIEM rule writing
  • Splunk HEC + Sentinel shipped in platform S13 (205 tests)
  • Remaining 5 connectors are stubs — customers can implement via webhook if not yet native

ADR-010 — Staff admin plane isolation: int.arbitex.ai with NS delegation

Status: Locked (2026-03-09) Decision date: 2026-03-09

Arbitex staff needed tooling for customer management (plan assignment, rate limits, MFA policy, billing views). This tooling handles cross-tenant sensitive operations and must be isolated from the customer-facing control plane.

  • Domain: int.arbitex.ai — subdomain chosen to avoid automated enumeration (not staff/admin/internal)
  • DNS: NS delegation from public DNS to a private RFC 1918 nameserver — entire int.arbitex.ai subtree unreachable from public internet
  • Access: VPN or Cloudflare Zero Trust connector required
  • Authentication: FIDO2/WebAuthn hardware key mandatory; no SMS/TOTP fallback
  • Model: Cross-tenant access via acting_org_id JWT claim — no session switching; both staff identity and target org captured in audit trail
  • Repo: Dedicated arbitex-staff repository — independent deploy pipeline
  • NS delegation to RFC 1918 provides stronger isolation than IP allowlisting — external resolvers return SERVFAIL, no subdomain enumeration possible
  • FIDO2 is mandatory because the staff portal is the highest-privilege surface in the system (blast radius: all customers)
  • acting_org_id model ensures audit trail captures the full context of every cross-tenant action
  • Dedicated repo enforces clean separation of concerns and independent security review cycle
  • int.arbitex.ai is an Epic L deliverable — not yet built
  • Requires VPN or Cloudflare Zero Trust connector as prerequisite (Epic L, sprint L4)
  • Staff auth stack dogfoods the same auth infrastructure as customers — proves platform auth in production use
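A sketch of the `acting_org_id` model: one token carries both the staff identity and the target org, so every cross-tenant action is fully attributable. Claim names other than `acting_org_id` (e.g. `sub`) and the helper are assumptions.

```python
# Hypothetical sketch: authorize a cross-tenant staff action and emit an audit
# record carrying both identities. Only the acting_org_id claim name is from
# the ADR; everything else is illustrative.
def authorize_staff_action(claims: dict, target_org: str, action: str) -> dict:
    if claims.get("acting_org_id") != target_org:
        # No session switching: the token must already be scoped to this org.
        raise PermissionError("token not scoped to target org")
    return {
        "staff_user": claims["sub"],               # who acted
        "acting_org_id": claims["acting_org_id"],  # on whose tenant
        "action": action,
    }
```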

ADR-011 — BYOK scope: Epic H (Azure Key Vault, customer-managed keys)

Status: Locked (2026-03-08) Decision date: 2026-03-08

Enterprise customers in regulated industries often require customer-managed encryption keys (BYOK/CMEK) for data at rest. This was evaluated against Arbitex-managed keys (current state) and a deferred BYOK model.

BYOK is scoped under Epic H and blocked on Azure subscription provisioning (C005). The key architecture when implemented:

  • Customer provides their own Azure Key Vault key
  • Arbitex retrieves and caches the Data Encryption Key (DEK) in Redis db=2 (reserved for this purpose)
  • Arbitex wraps/unwraps database encryption keys using the customer KEK

Current state: all data encrypted with Arbitex-managed keys. BYOK not available.

  • BYOK is a hard requirement for some regulated customers (certain HIPAA Business Associates, GLBA-covered entities with specific contractual requirements)
  • Deferring to Epic H allows the platform to launch and acquire initial customers without BYOK complexity
  • Redis db=2 is reserved in the current architecture to avoid a migration when BYOK ships
  • Azure Key Vault HSM-backed keys provide the customer-controlled key security property
  • byok_scope decision documented in public-facing docs to be explicit with procurement teams about current vs. future state
  • Epic H (~70 pts) blocked on C005 (Azure subscription provisioning)
  • db=2 Redis slot reserved — do not use for other purposes

ADR-012 — Plan tier keys: five-tier model with enterprise_outpost separation

Status: Locked (2026-03-08) Decision date: 2026-03-08

Arbitex needed a plan tier model for billing and entitlement logic. Initial designs had 3–4 tiers. The question was whether to model SaaS and Hybrid Outpost enterprise separately or as one tier with a deployment flag.

Five canonical plan tier keys:

| Key | Description |
| --- | --- |
| devfree_saas | Developer free tier — 100k requests/month |
| devpro_saas | Developer Pro — 1M requests/month (“Early Access” during beta) |
| team_saas | Team tier |
| enterprise_saas | Enterprise SaaS |
| enterprise_outpost | Enterprise Hybrid Outpost (separate key — Outpost carries additional entitlements) |

Enterprise limits stored in a separate enterprise_entitlements table.

  • Separate enterprise_outpost key avoids a flag-based system (enterprise + outpost_enabled=true) that creates ambiguous entitlement logic
  • Explicit keys in entitlement checks are more readable and auditable than boolean combinations
  • Dev plan limits (100k free, 1M paid) were explicitly locked to prevent drift
  • “Early Access” framing for Dev Pro (not “Coming Soon”) — product is available, just pre-GA
  • All code and entitlement checks must use these exact keys — no ad-hoc strings
  • devpro_saas marketed as “Early Access” until GA
  • enterprise_entitlements table exists as separate model — no inline JSON entitlement blobs on the plan record
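An entitlement check against the canonical keys might look like the following sketch. The five keys and the dev-tier limits are from this ADR; the helper names are not.

```python
# Illustrative entitlement logic over the five canonical plan tier keys.
# Helper names are hypothetical; the keys and dev limits are from ADR-012.
PLAN_TIERS = {"devfree_saas", "devpro_saas", "team_saas",
              "enterprise_saas", "enterprise_outpost"}

MONTHLY_REQUEST_LIMITS = {
    "devfree_saas": 100_000,    # locked in ADR-012
    "devpro_saas": 1_000_000,   # locked in ADR-012
}

def is_outpost_entitled(plan_key: str) -> bool:
    if plan_key not in PLAN_TIERS:
        raise ValueError(f"unknown plan tier key: {plan_key}")  # no ad-hoc strings
    # Explicit key, not an (enterprise + outpost_enabled) boolean combination.
    return plan_key == "enterprise_outpost"
```

Rejecting unknown strings at the check site is what keeps ad-hoc plan keys from creeping into entitlement logic.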

ADR-013 — DeBERTa inference tiers: GPU required for SaaS; CPU allowed for Outpost trial

Status: Locked (2026-03-09) Decision date: 2026-03-09

DeBERTa-based DLP inference requires significant compute. The question was whether to allow CPU inference as a fallback for any deployment configuration.

| Deployment | Inference hardware | Controlled by |
| --- | --- | --- |
| SaaS (Arbitex-hosted) | GPU required | Infrastructure (always NC4as_T4_v3) |
| Outpost — production | GPU required | Documentation requirement |
| Outpost — trial | CPU allowed | Arbitex control plane flag (not customer-configurable) |

  • DeBERTa on CPU is too slow for production request latency targets — GPU is the only viable option for SaaS and production Outpost
  • CPU mode exists specifically to allow customers to trial the Outpost without committing GPU infrastructure upfront
  • CPU mode is controlled by an Arbitex-managed flag (not customer-configurable) to prevent accidental production CPU deployment
  • Fail-closed behavior on GPU unavailability: requests are blocked (not passed through unscanned)
  • Outpost trial documentation must clearly state CPU mode limitations (latency, throughput)
  • Inference fail-closed is a hard requirement — no silent degradation to unscanned traffic
  • GPU pool uses Deallocate mode (not Delete) for 2–4 minute restart vs. 8–25 minute cold provisioning

ADR-014 — 3-tier certificate authority: offline YubiHSM root

Status: Locked (2026-03-09) Decision date: 2026-03-09

Arbitex needed a PKI for internal service-to-service TLS, Outpost client certificates, and Cloudflare Origin CA. Options: use a public CA for everything, use Azure Key Vault as a single-tier CA, or build a 3-tier hierarchy.

3-tier CA hierarchy:

Root CA (YubiHSM, offline) → Intermediate CA (Azure Key Vault HSM) → Leaf certs (cert-manager)

  • Root CA generated on YubiHSM hardware security module; taken offline after signing the intermediate
  • Intermediate CA lives in Azure Key Vault (separate HSM from the root)
  • Leaf certificates issued by cert-manager using the step-ca issuer against the Key Vault intermediate
  • Single HSM acceptable for pilot; split to two HSMs before first enterprise contract
  • Offline root means a compromise of the online intermediate CA cannot forge a new root — the highest-value key never touches a networked system
  • cert-manager automates leaf certificate issuance and renewal — no manual cert management
  • 3-tier is the industry standard for enterprise PKI; familiar to customers during security reviews
  • YubiHSM provides FIPS 140-2 Level 3 assurance for the root key
  • YubiHSM hardware is a prerequisite for Phase B (Azure deployment) — not a software component
  • Cert-manager renewal failure alert configured (CertificateRequest Failed >1h → alert)
  • Cloudflare Origin CA cert (edge-to-origin) uses a different CA path (Cloudflare’s own CA) — managed separately from the internal 3-tier hierarchy
  • Origin CA is 1-year term (not 15-year) — cert-manager handles auto-renewal; shorter term limits blast radius

ADR-015 — MFA enforcement: staff-configured per org, platform-enforced

Status: Locked (2026-03-09) Decision date: 2026-03-09

Multi-factor authentication policy needed to be configurable per organization while being enforced uniformly by the platform. The question was where policy configuration lives and where enforcement runs.

  • Configuration: Staff admin portal (int.arbitex.ai) — Arbitex staff set MFA policy per org and per plan tier
  • Enforcement: Platform API auth middleware — checks org MFA policy on every login; rejects if unmet
  • Supported MFA types: FIDO2/WebAuthn hardware key, authenticator app (TOTP), SMS, “any MFA”
  • Enterprise default: Enterprise plans can be configured to require FIDO2 hardware key
  • Separating policy configuration (staff plane) from policy enforcement (platform) keeps the customer-facing auth middleware simple — it reads org settings, it doesn’t manage them
  • Staff plane controls MFA policy as an operational lever (e.g., enforce FIDO for healthcare customers)
  • Putting configuration in the staff plane prevents customers from weakening their own MFA requirements once set by their contract
  • Platform-side enforcement means no client-side bypass is possible
  • MFA enforcement is an Epic L deliverable (sprint L3) — not yet built
  • Platform auth middleware needs to call into org_settings for MFA policy check on login
  • MFA enrollment prompts (platform UX) are in scope for L3 T16
  • Staff portal configures MFA types; enforcement runs on every login (no session caching of MFA state)
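The enforcement check in the auth middleware could be as simple as the following sketch; the policy values and method names are assumptions, not the shipped API.

```python
# Hypothetical MFA policy check run on every login (no session caching of MFA
# state). Policy values and method names are illustrative.
SUPPORTED_MFA = {"fido2", "totp", "sms"}

def mfa_satisfied(org_policy: str, methods_used: set[str]) -> bool:
    if org_policy == "none":
        return True                    # org has no MFA requirement
    if org_policy == "any":
        return bool(methods_used & SUPPORTED_MFA)
    return org_policy in methods_used  # a specific type is required, e.g. fido2
```

Because the policy value comes from org settings managed in the staff plane, the middleware only reads it; nothing customer-facing can weaken the requirement.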

ADR-016 — SIEM native format: OCSF over raw JSON for Outpost direct sink

Status: Locked (2026-03-12) Decision date: 2026-03-12

The Outpost SIEM direct sink (outpost-0022) needed to choose an event format when delivering audit events to customer SIEM endpoints (Splunk HEC, Microsoft Sentinel DCR, Elasticsearch). Two options were evaluated: raw JSON (the existing Outpost audit log format) and OCSF (Open Cybersecurity Schema Framework) — the same format used by the Platform SIEM connectors.

Use OCSF format for all Outpost direct sink deliveries, with sourcetype: "arbitex:ocsf" for Splunk HEC.

  • OCSF is already the Platform wire format for Splunk, Sentinel, and Elastic connectors — customers maintaining both Platform and Outpost sinks see a consistent schema.
  • OCSF is a typed, machine-parseable schema; raw JSON requires per-customer field extractions in each SIEM.
  • The arbitex:ocsf sourcetype provides a clear namespace in Splunk; customers can write detection rules against a stable field set.
  • OCSF is backed by AWS, Splunk, IBM, and others — not a proprietary format that could create long-term vendor lock-in.
  • Customers running both the Platform SIEM relay and the Outpost direct sink to the same SIEM get identical schema from both paths — no separate field extraction configuration.
  • The sourcetype: "arbitex:ocsf" Splunk value is fixed at the Outpost; customers must create this sourcetype if it does not already exist.
  • The prior Outpost SIEM direct sink (outpost-0008/0015 syslog/CEF implementation) used raw JSON/CEF — customers migrating to the outpost-0022 implementation must update their SIEM parsing rules.
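For illustration, a Splunk HEC delivery from the direct sink might wrap an OCSF event as shown below. The fixed `sourcetype` value is from the ADR; the OCSF field values and the helper are assumptions.

```python
# Hypothetical shape of an Outpost direct-sink delivery to Splunk HEC.
# sourcetype is fixed per ADR-016; the OCSF event content is illustrative.
import json

def build_hec_envelope(ocsf_event: dict, host: str) -> str:
    envelope = {
        "time": ocsf_event["time"],    # epoch seconds of the audit event
        "host": host,
        "sourcetype": "arbitex:ocsf",  # fixed at the Outpost
        "event": ocsf_event,           # typed OCSF body, not raw JSON log lines
    }
    return json.dumps(envelope)

payload = build_hec_envelope(
    {"class_uid": 6003, "activity_id": 1, "time": 1710230400},
    host="outpost-01",
)
```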

ADR-017 — Budget enforcement: local-first (Outpost SQLite) over SaaS-pushed spend

Status: Locked (2026-03-12) Decision date: 2026-03-12

Two approaches to Outpost budget cap enforcement were evaluated:

  1. SaaS-pushed spend (legacy budget.spent_usd in policy bundle): The Platform tracks usage centrally and pushes the current spend figure into the policy bundle on each sync. The Outpost compares the pushed value against caps.
  2. Local-first (outpost-0017 budget_config + local UsageTracker): The Outpost tracks usage locally in SQLite and evaluates caps against locally-accumulated totals.

Both systems coexist in the current codebase; this ADR locks the local-first approach as the authoritative enforcement model.

Local-first budget enforcement (BudgetCapEnforcer + UsageTracker) is the authoritative enforcement model. The SaaS-pushed spend path is retained for legacy compatibility but is not the source of truth.

  • Policy sync runs every 60 seconds. Under the SaaS-pushed model, an Outpost can accumulate up to 60 seconds of spend above the cap before the bundle refreshes with updated totals — creating a ~60-second enforcement gap.
  • During network partition, the bundle is served from cache. SaaS-pushed spend becomes stale immediately; local SQLite totals remain current.
  • Local enforcement eliminates the round-trip latency for enforcement decisions — no blocking call to Platform needed.
  • SQLite WAL mode provides thread-safe, durable local storage for usage counters without requiring a separate database service.
  • Caps are evaluated against the Outpost’s local usage totals, which may differ from Platform-side totals (due to multi-outpost deployments, provider billing vs. token-based estimation, etc.). Monitor both Outpost and Platform usage dashboards in production.
  • The local UsageTracker uses SQLite at USAGE_DB_PATH. Set to empty string to disable local tracking; enforcement becomes a no-op.
  • Hourly period rollover relies on the Outpost running at the rollover boundary. If stopped at rollover, stale totals persist until the next request.
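A minimal local-first tracker over SQLite in WAL mode could look like this sketch; the table layout and method names are assumptions, not the outpost-0017 implementation.

```python
# Illustrative local-first usage tracking in SQLite WAL mode. The schema and
# API are hypothetical; only the approach (local totals, local cap checks)
# is from ADR-017.
import sqlite3

class UsageTracker:
    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("PRAGMA journal_mode=WAL")  # durable, concurrent-reader friendly
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS usage (period TEXT PRIMARY KEY, spent_usd REAL NOT NULL)")

    def record(self, period: str, cost_usd: float) -> None:
        # Accumulate spend for the current (e.g. hourly) period.
        self.conn.execute(
            "INSERT INTO usage (period, spent_usd) VALUES (?, ?) "
            "ON CONFLICT(period) DO UPDATE SET spent_usd = spent_usd + excluded.spent_usd",
            (period, cost_usd))
        self.conn.commit()

    def over_cap(self, period: str, cap_usd: float) -> bool:
        # Cap decision uses local totals only; no round trip to the Platform.
        row = self.conn.execute(
            "SELECT spent_usd FROM usage WHERE period = ?", (period,)).fetchone()
        return row is not None and row[0] >= cap_usd
```

The cap check never blocks on a network call, which is what closes the ~60-second enforcement gap of the SaaS-pushed model.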

ADR-018 — PKCE public client support: S256-only, no implicit flow

Status: Locked (2026-03-12) Decision date: 2026-03-12

Platform platform-0044 added OAuth 2.0 PKCE support (RFC 7636) to enable authorization_code grants for public clients (SPAs, CLIs, mobile apps). Two decisions were made: which code challenge methods to support, and whether to allow the implicit grant flow as an alternative.

  • Support S256 only for PKCE code challenge method. plain is rejected.
  • No implicit flow. Authorization code + PKCE is the only supported flow for public clients.
  • OAuthClient.client_type distinguishes confidential (client secret required) from public (PKCE required, no secret).
  • plain challenges offer no meaningful protection: an attacker who observes the code_challenge learns the code_verifier directly. S256 provides the hash-based protection that is the entire point of PKCE.
  • The implicit flow is deprecated in OAuth 2.0 Security Best Current Practice (RFC 9700) — it exposes tokens in redirect URI fragments and cannot use refresh tokens.
  • S256-only simplifies the implementation surface: one code challenge algorithm, one authorization code grant path, no implicit edge cases.
  • PKCE code verifiers are stored in Redis with a 10-minute TTL and consumed atomically (GET+DEL) — replay attacks are rejected at the store layer.
  • Existing confidential clients (client_credentials grant) are unaffected — client_type defaults to confidential in migration 063.
  • Public clients must send a code_challenge (S256) in the authorization request. POST /api/oauth/authorize returns 400 if code_challenge is absent for a public client.
  • Arbitex does not issue refresh tokens for public clients in the initial implementation — clients must re-authorize when the access token expires.
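The S256 verification itself is the standard RFC 7636 base64url-SHA-256 derivation. The sketch below shows that math only; the Redis storage, atomic consumption, and endpoint behavior from the ADR are not reproduced, and the helper names are hypothetical.

```python
# S256 code-challenge math per RFC 7636. The server stores only the challenge;
# at token exchange it hashes the presented verifier and compares. Helper
# names are illustrative; 'plain' (and anything else) is rejected per ADR-018.
import base64
import hashlib
import secrets

def make_verifier() -> str:
    # 43-128 unreserved characters; 32 random bytes encode to 43 chars.
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()

def s256_challenge(verifier: str) -> str:
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

def verify(stored_challenge: str, method: str, presented_verifier: str) -> bool:
    if method != "S256":
        return False  # reject 'plain' outright
    return secrets.compare_digest(stored_challenge, s256_challenge(presented_verifier))
```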

ADR-019 — Structured JSON logging: Log Analytics-ready format platform-wide

Status: Locked (2026-03-12) Decision date: 2026-03-12

Platform platform-0044 (T566) and platform-0025 (T445) shipped a structured JSON log formatter as a platform-wide logging standard. The decision to standardize on JSON over human-readable plain text required choosing a format and making it the default.

All Platform services emit structured JSON log lines to stdout. The formatter is implemented as a Python logging.Formatter subclass and configured in the application lifespan handler. Log lines include at minimum: timestamp (ISO 8601 UTC), level, logger, message, plus any extra fields passed to the log call.

  • Azure Log Analytics (the target observability backend) ingests structured JSON natively — plain text requires custom parsing pipelines.
  • Structured logs enable field-level querying in Log Analytics (KQL) without regex extraction.
  • Per-request correlation fields (request_id, org_id, user_id) can be included as structured fields rather than embedded in log strings.
  • Machine-parseable logs from day one avoids a migration later when Log Analytics is wired.
  • Human-readable log viewing requires a JSON formatter in the terminal (e.g. kubectl logs | jq). Operations staff must be aware of this.
  • The logging.basicConfig call in test fixtures must be updated to handle JSON format if test output is parsed.
  • Third-party libraries that emit their own log format (Uvicorn access log, SQLAlchemy echo) are not covered by the formatter — their output remains unstructured. Suppress or re-route those loggers as needed.
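Assuming the shipped formatter follows the field set described above, a minimal version might look like this sketch. The reserved-attribute trick for surfacing `extra=` fields is an implementation assumption, not necessarily the shipped code.

```python
# Minimal sketch of a Log Analytics-ready JSON formatter with the minimum
# field set from ADR-019 (timestamp, level, logger, message, plus extras).
import datetime
import json
import logging

class JsonFormatter(logging.Formatter):
    # Attributes present on every LogRecord; anything else arrived via extra=.
    RESERVED = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}

    def format(self, record: logging.LogRecord) -> str:
        line = {
            "timestamp": datetime.datetime.fromtimestamp(
                record.created, tz=datetime.timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key, value in vars(record).items():
            if key not in self.RESERVED:
                line[key] = value  # e.g. request_id, org_id, user_id
        return json.dumps(line)
```

Wiring it in the lifespan handler is a matter of `handler.setFormatter(JsonFormatter())`; for human viewing, `kubectl logs ... | jq` restores readability.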

| ADR | Title | Status |
| --- | --- | --- |
| ADR-001 | Policy Engine model: Palo Alto unified pipeline | Locked |
| ADR-002 | Deployment topology: SaaS + Hybrid Outpost | Locked |
| ADR-003 | Cloud provider: Azure | Locked |
| ADR-004 | AKS over Azure Container Apps | Locked |
| ADR-005 | Edge provider: Cloudflare Pro | Locked |
| ADR-006 | Compliance framing: “designed for” | Locked |
| ADR-007 | GRC framework: NIST CSF 2.0 + Privacy Framework + AI RMF | Locked |
| ADR-008 | Audit log model: 90-day buffer; SIEM is record of truth | Locked |
| ADR-009 | SIEM format: OCSF with Splunk HEC as P0 | Locked |
| ADR-010 | Staff admin plane: int.arbitex.ai NS delegation | Locked |
| ADR-011 | BYOK scope: Epic H | Locked |
| ADR-012 | Plan tier keys: five-tier model | Locked |
| ADR-013 | DeBERTa inference tiers | Locked |
| ADR-014 | 3-tier CA: offline YubiHSM root | Locked |
| ADR-015 | MFA enforcement: staff-configured, platform-enforced | Locked |
| ADR-016 | SIEM native format: OCSF over raw JSON for Outpost direct sink | Locked |
| ADR-017 | Budget enforcement: local-first (Outpost SQLite) over SaaS-pushed spend | Locked |
| ADR-018 | PKCE public client support: S256-only, no implicit flow | Locked |
| ADR-019 | Structured JSON logging: Log Analytics-ready format platform-wide | Locked |