Architecture Decision Records
Architecture Decision Records (ADRs) document the key decisions made during Arbitex platform design. Each record captures the context, decision, rationale, and consequences of a significant architectural choice.
These decisions are locked and form the stable foundation for platform development. Changes to locked decisions require explicit PO approval and a new ADR.
ADR-001 — Policy Engine model: Palo Alto unified pipeline
Status: Locked (2026-03-08) · Decision date: 2026-03-08
Context
Arbitex needed a policy evaluation model for AI request governance. Several models were considered: simple rule lists, deny-only filtering, and unified policy frameworks from enterprise security (Palo Alto NGFW model).
The platform needed to handle allow, block, prompt-for-approval, and route-to-alternate actions — not just filtering. Rules needed to compose across multiple policy packs, with configurable precedence.
Decision
Adopt the Palo Alto unified policy pipeline model: a first-match evaluation across an ordered policy chain, with explicit ALLOW, BLOCK, PROMPT, ROUTE_TO, and ALLOW_WITH_OVERRIDE actions. Policy Packs replace the previous “compliance bundles” and “rule lists” concepts under a unified abstraction.
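The first-match evaluation can be sketched as follows; the rule names, request fields, and fail-closed default here are invented for illustration and are not the real Policy Pack schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

# The five actions locked in ADR-001.
class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    PROMPT = "prompt"
    ROUTE_TO = "route_to"
    ALLOW_WITH_OVERRIDE = "allow_with_override"

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over the request
    action: Action

def evaluate(chain: list, request: dict) -> tuple:
    """First-match evaluation: walk the ordered chain, return the first hit."""
    for rule in chain:
        if rule.matches(request):
            return rule.name, rule.action
    return "default", Action.BLOCK  # assumed fail-closed default, not specified in the ADR

chain = [
    Rule("block-pii", lambda r: r.get("contains_pii", False), Action.BLOCK),
    Rule("prompt-finance", lambda r: r.get("dept") == "finance", Action.PROMPT),
    Rule("allow-all", lambda r: True, Action.ALLOW),
]
assert evaluate(chain, {"dept": "finance"}) == ("prompt-finance", Action.PROMPT)
assert evaluate(chain, {"contains_pii": True}) == ("block-pii", Action.BLOCK)
```

Because evaluation stops at the first match, rule order within the chain is part of the policy, which is what makes the outcome deterministic and auditable.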
Rationale
- Palo Alto’s model is well-understood by enterprise security teams — familiar to the buyers Arbitex targets
- First-match with explicit actions is deterministic and auditable; deny-overrides combining algorithm provides a stricter alternative when needed
- Unified abstraction (Policy Pack = any pack, bundle or custom) avoids two separate rule systems
- ROUTE_TO and ALLOW_WITH_OVERRIDE extend the model to cover AI-specific governance patterns without compromising the core evaluation semantics
Consequences
- Policy Engine naming is a placeholder for easy find-replace when Brand brief finalizes product name
- All existing “compliance bundles” became Policy Packs with `pack_type: "bundle"` — fully backward-compatible
- PROMPT and ALLOW_WITH_OVERRIDE actions require audit log capture of approver identity and reason (not yet built — Epic J5 scope)
ADR-002 — Deployment topology: SaaS + Hybrid Outpost (no full self-host)
Status: Locked (2026-03-07) · Decision date: 2026-03-07
Context
Customers have varying data residency and network isolation requirements. Three deployment models were evaluated: SaaS only, full self-host, and a hybrid model where the control plane is Arbitex-managed but the data plane runs in the customer environment.
Decision
Support exactly two deployment topologies:
- SaaS — Arbitex-managed control plane and data plane. Customer traffic passes through `api.arbitex.ai`.
- Hybrid Outpost — Arbitex-managed control plane (SaaS); customer-hosted data plane (Outpost container in customer VPC/network). No full self-host option.
Rationale
- Full self-host requires distributing the entire platform stack (including proprietary DLP models), which creates IP risk and support complexity
- Hybrid Outpost satisfies data residency requirements: sensitive prompts never leave the customer environment; only audit metadata and configuration reach the Arbitex control plane
- SaaS control plane always being Arbitex-managed ensures policy packs and compliance bundles are consistently maintained
- No full self-host simplifies the licensing and compliance surface area significantly
Consequences
- Outpost is resilient to control plane outages (caches configuration locally)
- Cloud control plane owns Outpost registration and certificate issuance
- DeBERTa inference in Hybrid Outpost runs on customer infrastructure — GPU requirement documented per tier (SaaS=GPU required; Outpost prod=GPU required; Outpost trial=CPU via control plane flag)
- “Changelog self-host” entry was removed from public documentation — capability does not exist
ADR-003 — Cloud provider: Azure
Status: Locked (2026-03-08) · Decision date: 2026-03-08
Context
Target customers are predominantly in financial services, healthcare, and government — sectors where Azure has the highest market share. Primary pilot prospect is on Azure infrastructure. AWS and GCP were evaluated.
Decision
Azure as the sole cloud provider for Arbitex SaaS production infrastructure.
Rationale
- Target customers (fin/health/gov) have largest presence on Azure — simplifies data residency, private connectivity (PrivateLink), and regulatory compliance discussions
- Largest known pilot prospect is on Azure stack — alignment reduces friction
- Azure Flexible Server PostgreSQL, Azure Cache for Redis, and Azure Key Vault are mature managed services for the data layer
- GPU capacity (NC4as_T4_v3 with NVIDIA T4) available in East US 2 with 3 availability zones
Consequences
- East US 2 region locked — 3 AZs, GPU capacity, regulated workload preference
- Epic H (BYOK) and Epic I (Private connectivity) use Azure-native primitives (Key Vault, PrivateLink)
- AWS Bedrock adapter is shipped (provider expansion) but the hosting infrastructure is Azure-only
ADR-004 — AKS over Azure Container Apps
Status: Locked (2026-03-09) · Decision date: 2026-03-09
Context
Azure offers two container orchestration services: Azure Kubernetes Service (AKS) and Azure Container Apps (ACA). ACA provides a higher-level abstraction with automatic scaling. AKS requires more operational investment but provides full control.
Decision
AKS Standard tier over Azure Container Apps.
Rationale
ACA cannot support three capabilities that are non-negotiable for Arbitex:
- GPU node pools — DeBERTa inference requires NVIDIA T4 GPU. ACA has no GPU support.
- Azure Key Vault CSI driver — Required for mounting secrets as volumes (fail-fast startup pattern). ACA does not support CSI drivers.
- Pod-level network policies (Calico) — Required for default-deny isolation between service pairs. ACA uses a shared network model.
AKS Standard tier ($72/month) provides 99.95% API server SLA and control-plane audit logs (required for SOC 2 CC7 logging).
Consequences
- More operational investment than ACA (Helm charts, node pool management, pod security standards)
- AKS free tier explicitly rejected — no SLA and no control-plane audit logs
- System node pool uses B2s_v2 burstable instance for cost efficiency (~$61/month)
ADR-005 — Edge provider: Cloudflare Pro over Azure Front Door
Status: Locked (2026-03-09) · Decision date: 2026-03-09
Context
A CDN and DDoS protection layer is required in front of the AKS origin. Azure Front Door and Cloudflare Pro were the primary candidates.
Decision
Cloudflare Pro ($20/month) over Azure Front Door.
Rationale
| Capability | Cloudflare Pro | Azure Front Door |
|---|---|---|
| DDoS protection | Unmetered (included) | ~$2,944/month (Azure DDoS Protection) |
| WAF | Included ($20/month) | Extra cost |
| DNS management | Included | Azure DNS separate |
| Monthly cost | ~$20 | ~$350+ |
Cloudflare Authenticated Origin Pulls provides cryptographic mTLS from edge to origin — stronger than IP allowlist-only protection. Cloudflare’s global anycast network provides better latency characteristics than Azure Front Door’s POPs for the target customer geography.
Consequences
- DNS managed in Cloudflare (not Azure DNS) — single pane for DNS + edge config, instant propagation
- Cloudflare Origin CA certificate (1-year, auto-renewed by cert-manager) used for edge-to-origin TLS
- Cloudflare IP allowlist synced automatically as defense-in-depth behind Authenticated Origin Pulls
- All public domains proxied through Cloudflare — no direct IP exposure for origin
ADR-006 — Compliance framing: “designed for” (not “certified”)
Status: Locked (2026-03-07) · Decision date: 2026-03-07
Context
Arbitex is building toward SOC 2 Type II certification but has not yet undergone a formal audit. Sales materials and documentation needed a framing that is accurate without misrepresenting certification status.
Decision
Use “designed for” framing in all customer-facing documentation and sales materials. Never claim SOC 2 Type II certification or equivalent certification status until the audit is complete.
Rationale
- Misrepresenting certification status creates legal and reputational risk
- “Designed for” accurately describes the architectural intent without making audit claims
- Enterprise security teams understand the distinction — accurate framing builds trust faster than overclaiming
- Compliance Bundles support customer compliance programs; the platform’s own certification status is separate
Consequences
- All docs use “designed for” or “supports” language, not “certified” or “compliant”
- Security overview page includes explicit disclaimer about current certification status
- Sales team briefed: do not represent Arbitex as SOC 2 certified
ADR-007 — GRC framework: NIST CSF 2.0 + Privacy Framework + AI RMF
Status: Locked (2026-03-07) · Decision date: 2026-03-07
Context
Arbitex needed a governance, risk, and compliance framework to structure internal security controls and communicate posture to enterprise customers. Multiple frameworks were evaluated: NIST CSF, ISO 27001, SOC 2, CIS Controls.
Decision
Primary: NIST Cybersecurity Framework 2.0 + NIST Privacy Framework 1.0 + NIST AI Risk Management Framework 1.0
Crosswalk: ISO 27001/27002/27701 (for EU customers and SOC 2 readiness)
Principle: Confidentiality ≠ Privacy (CIAP — these are related but distinct properties with separate control requirements)
Rationale
- NIST frameworks are US government standards — highest credibility with federal and regulated-industry customers
- AI RMF is purpose-built for AI system risk, which is the core of what Arbitex is
- NIST CSF 2.0 added the Govern function, which maps directly to the Policy Engine
- ISO crosswalk satisfies EU customers without requiring a separate framework implementation
- NIST frameworks are freely available (no licensing) and widely documented
Consequences
- Security documentation structured around NIST CSF functions (Govern, Identify, Protect, Detect, Respond, Recover)
- Compliance Bundles map framework-specific controls to AI system risks (AI RMF Measure function)
- Privacy controls (NIST Privacy Framework) handled separately from confidentiality controls (NIST CSF) — CCPA/GDPR bundles address the privacy dimension
ADR-008 — Audit log model: 90-day tamper-evident buffer; SIEM is record of truth
Status: Locked (2026-03-08) · Decision date: 2026-03-08
Context
Enterprise customers require audit trails for compliance. Questions to resolve: how long does Arbitex retain audit data? Who owns the record of truth? How is tamper-evidence implemented?
Decision
- Arbitex retains: 90-day tamper-evident buffer (active) + 2-year archive tier in Azure Log Analytics
- Record of truth: Customer’s SIEM — Arbitex is not the system of record for long-term audit retention
- Tamper-evidence: HMAC chain with key versioning (`hmac_key_id` field on each audit entry)
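As a toy illustration of an HMAC chain with key versioning: each entry's MAC covers the event plus the previous entry's MAC, so modifying, removing, or reordering any entry breaks verification. The `prev_mac` field and the key store here are assumptions; only `hmac_key_id` comes from this ADR.

```python
import hashlib
import hmac
import json

# Illustrative key store; real keys would live in a secrets manager.
KEYS = {"k1": b"first-secret", "k2": b"rotated-secret"}

def append_entry(chain: list, event: dict, key_id: str) -> None:
    prev_mac = chain[-1]["mac"] if chain else ""
    payload = json.dumps({"event": event, "prev_mac": prev_mac}, sort_keys=True)
    mac = hmac.new(KEYS[key_id], payload.encode(), hashlib.sha256).hexdigest()
    chain.append({"event": event, "prev_mac": prev_mac, "hmac_key_id": key_id, "mac": mac})

def verify_chain(chain: list) -> bool:
    prev_mac = ""
    for entry in chain:
        payload = json.dumps({"event": entry["event"], "prev_mac": prev_mac}, sort_keys=True)
        expected = hmac.new(KEYS[entry["hmac_key_id"]], payload.encode(), hashlib.sha256).hexdigest()
        if entry["prev_mac"] != prev_mac or not hmac.compare_digest(entry["mac"], expected):
            return False  # tampering or reordering detected
        prev_mac = entry["mac"]
    return True

log = []
append_entry(log, {"action": "ALLOW", "org": "acme"}, "k1")
append_entry(log, {"action": "BLOCK", "org": "acme"}, "k2")  # key rotated mid-chain
assert verify_chain(log)
log[0]["event"]["action"] = "BLOCK"  # tamper with an earlier entry
assert not verify_chain(log)
```

Because each entry records which key version signed it, rotating the HMAC key does not break validation of older entries.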
Rationale
- Keeping Arbitex as the long-term system of record creates regulatory liability and increases COGS
- Enterprise customers already have SIEM infrastructure; OCSF format export integrates naturally
- 90-day buffer covers the most common compliance investigation windows
- HMAC chain provides the tamper-evidence property that most compliance frameworks require for audit logs, without requiring WORM storage
Consequences
- SIEM integration guide is a P0 deliverable (shipped — see SIEM integration)
- Customers must configure their SIEM retention to meet their specific framework requirements (BSA: 5 years; SEC 17a-4: 3–6 years; HIPAA: 6 years)
- `hmac_key_id` field in audit entries supports key rotation without breaking chain validation
ADR-009 — SIEM format: OCSF with Splunk HEC as P0
Status: Locked (2026-03-08) · Decision date: 2026-03-08
Context
SIEM export format and initial connector priority needed to be determined. Options: proprietary format, CEF, LEEF, OCSF, native connector per SIEM.
Decision
- Wire format: OCSF (Open Cybersecurity Schema Framework) — vendor-neutral, structured
- P0 connector: Splunk HTTP Event Collector (HEC)
- Second P0 connector: Azure Sentinel
- Additional connectors: 5 stubs for Elastic, Datadog, AWS Security Hub, Google Chronicle, IBM QRadar
Rationale
- OCSF is the emerging industry standard for security telemetry — backed by AWS, Splunk, IBM, and others
- OCSF over CEF/LEEF: machine-parseable schema with typed fields vs. semistructured key-value
- Splunk has dominant market share in enterprise SIEM among financial services customers (primary target)
- Azure Sentinel P0 because Arbitex runs on Azure — natural alignment for Azure-native customers
- Connector framework design allows adding new targets without changing the event schema
Consequences
- OCSF schema version and class IDs must be pinned and documented for customer SIEM rule writing
- Splunk HEC + Sentinel shipped in platform S13 (205 tests)
- Remaining 5 connectors are stubs — customers can implement via webhook if not yet native
ADR-010 — Staff admin plane isolation: int.arbitex.ai with NS delegation
Status: Locked (2026-03-09) · Decision date: 2026-03-09
Context
Arbitex staff needed tooling for customer management (plan assignment, rate limits, MFA policy, billing views). This tooling handles cross-tenant sensitive operations and must be isolated from the customer-facing control plane.
Decision
- Domain: `int.arbitex.ai` — subdomain chosen to avoid automated enumeration (not `staff`/`admin`/`internal`)
- DNS: NS delegation from public DNS to a private RFC 1918 nameserver — entire `int.arbitex.ai` subtree unreachable from the public internet
- Access: VPN or Cloudflare Zero Trust connector required
- Authentication: FIDO2/WebAuthn hardware key mandatory; no SMS/TOTP fallback
- Model: Cross-tenant access via `acting_org_id` JWT claim — no session switching; both staff identity and target org captured in audit trail
- Repo: Dedicated `arbitex-staff` repository — independent deploy pipeline
Rationale
- NS delegation to RFC 1918 provides stronger isolation than IP allowlisting — external resolvers return SERVFAIL, no subdomain enumeration possible
- FIDO2 is mandatory because the staff portal is the highest-privilege surface in the system (blast radius: all customers)
- `acting_org_id` model ensures the audit trail captures the full context of every cross-tenant action
- Dedicated repo enforces clean separation of concerns and an independent security review cycle
Consequences
- `int.arbitex.ai` is an Epic L deliverable — not yet built
- Requires VPN or Cloudflare Zero Trust connector as prerequisite (Epic L, sprint L4)
- Staff auth stack dogfoods the same auth infrastructure as customers — proves platform auth in production use
ADR-011 — BYOK scope: Epic H (Azure Key Vault, customer-managed keys)
Status: Locked (2026-03-08) · Decision date: 2026-03-08
Context
Enterprise customers in regulated industries often require customer-managed encryption keys (BYOK/CMEK) for data at rest. This was evaluated against Arbitex-managed keys (current state) and a deferred BYOK model.
Decision
BYOK is scoped under Epic H and blocked on Azure subscription provisioning (C005). The key architecture when implemented:
- Customer provides their own Azure Key Vault key
- Arbitex retrieves and caches the Data Encryption Key (DEK) in Redis db=2 (reserved for this purpose)
- Arbitex wraps/unwraps database encryption keys using the customer KEK
Current state: all data encrypted with Arbitex-managed keys. BYOK not available.
Rationale
- BYOK is a hard requirement for some regulated customers (certain HIPAA Business Associates, GLBA-covered entities with specific contractual requirements)
- Deferring to Epic H allows the platform to launch and acquire initial customers without BYOK complexity
- Redis db=2 is reserved in the current architecture to avoid a migration when BYOK ships
- Azure Key Vault HSM-backed keys provide the customer-controlled key security property
Consequences
- `byok_scope` decision documented in public-facing docs to be explicit with procurement teams about current vs. future state
- Epic H (~70 pts) blocked on C005 (Azure subscription provisioning)
- db=2 Redis slot reserved — do not use for other purposes
ADR-012 — Plan tier keys: five-tier model with enterprise_outpost separation
Status: Locked (2026-03-08) · Decision date: 2026-03-08
Context
Arbitex needed a plan tier model for billing and entitlement logic. Initial designs had 3–4 tiers. The question was whether to model SaaS and Hybrid Outpost enterprise separately or as one tier with a deployment flag.
Decision
Five canonical plan tier keys:
| Key | Description |
|---|---|
| `devfree_saas` | Developer free tier — 100k requests/month |
| `devpro_saas` | Developer Pro — 1M requests/month (“Early Access” during beta) |
| `team_saas` | Team tier |
| `enterprise_saas` | Enterprise SaaS |
| `enterprise_outpost` | Enterprise Hybrid Outpost (separate key — Outpost carries additional entitlements) |
Enterprise limits stored in a separate `enterprise_entitlements` table.
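An illustrative entitlement check against the canonical keys (the helper name and limits mapping are assumptions for the sketch; the 100k/1M developer limits are the locked values):

```python
# The five canonical plan tier keys locked in ADR-012.
PLAN_TIERS = frozenset({
    "devfree_saas", "devpro_saas", "team_saas",
    "enterprise_saas", "enterprise_outpost",
})

# Dev plan limits explicitly locked in this ADR.
MONTHLY_REQUEST_LIMITS = {
    "devfree_saas": 100_000,
    "devpro_saas": 1_000_000,
}

def outpost_entitled(plan_key: str) -> bool:
    """Explicit key check instead of a boolean flag like outpost_enabled."""
    if plan_key not in PLAN_TIERS:
        raise ValueError(f"unknown plan tier key: {plan_key!r}")  # no ad-hoc strings
    return plan_key == "enterprise_outpost"

assert outpost_entitled("enterprise_outpost")
assert not outpost_entitled("enterprise_saas")
```

Rejecting unknown keys at the check site is what enforces the "no ad-hoc strings" consequence below.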
Rationale
- Separate `enterprise_outpost` key avoids a flag-based system (`enterprise` + `outpost_enabled=true`) that creates ambiguous entitlement logic
- Explicit keys in entitlement checks are more readable and auditable than boolean combinations
- Dev plan limits (100k free, 1M paid) were explicitly locked to prevent drift
- “Early Access” framing for Dev Pro (not “Coming Soon”) — product is available, just pre-GA
Consequences
- All code and entitlement checks must use these exact keys — no ad-hoc strings
- `devpro_saas` marketed as “Early Access” until GA
- `enterprise_entitlements` table exists as a separate model — no inline JSON entitlement blobs on the plan record
ADR-013 — DeBERTa inference tiers: GPU required for SaaS; CPU allowed for Outpost trial
Status: Locked (2026-03-09) · Decision date: 2026-03-09
Context
DeBERTa-based DLP inference requires significant compute. The question was whether to allow CPU inference as a fallback for any deployment configuration.
Decision
| Deployment | Inference hardware | Controlled by |
|---|---|---|
| SaaS (Arbitex-hosted) | GPU required | Infrastructure (always NC4as_T4_v3) |
| Outpost — production | GPU required | Documentation requirement |
| Outpost — trial | CPU allowed | Arbitex control plane flag (not customer-configurable) |
Rationale
- DeBERTa on CPU is too slow for production request latency targets — GPU is the only viable option for SaaS and production Outpost
- CPU mode exists specifically to allow customers to trial the Outpost without committing GPU infrastructure upfront
- CPU mode is controlled by an Arbitex-managed flag (not customer-configurable) to prevent accidental production CPU deployment
- Fail-closed behavior on GPU unavailability: requests are blocked (not passed through unscanned)
Consequences
- Outpost trial documentation must clearly state CPU mode limitations (latency, throughput)
- Inference fail-closed is a hard requirement — no silent degradation to unscanned traffic
- GPU pool uses Deallocate mode (not Delete) for 2–4 minute restart vs. 8–25 minute cold provisioning
ADR-014 — 3-tier certificate authority: offline YubiHSM root
Status: Locked (2026-03-09) · Decision date: 2026-03-09
Context
Arbitex needed a PKI for internal service-to-service TLS, Outpost client certificates, and Cloudflare Origin CA. Options: use a public CA for everything, use Azure Key Vault as a single-tier CA, or build a 3-tier hierarchy.
Decision
3-tier CA hierarchy:
Root CA (YubiHSM, offline) → Intermediate CA (Azure Key Vault HSM) → Leaf certs (cert-manager)
- Root CA generated on YubiHSM hardware security module; taken offline after signing the intermediate
- Intermediate CA lives in Azure Key Vault (separate HSM from the root)
- Leaf certificates issued by cert-manager using the step-ca issuer against the Key Vault intermediate
- Single HSM acceptable for pilot; split to two HSMs before first enterprise contract
Rationale
- Offline root means a compromise of the online intermediate CA cannot forge a new root — the highest-value key never touches a networked system
- cert-manager automates leaf certificate issuance and renewal — no manual cert management
- 3-tier is the industry standard for enterprise PKI; familiar to customers during security reviews
- YubiHSM provides FIPS 140-2 Level 3 assurance for the root key
Consequences
- YubiHSM hardware is a prerequisite for Phase B (Azure deployment) — not a software component
- Cert-manager renewal failure alert configured (CertificateRequest Failed >1h → alert)
- Cloudflare Origin CA cert (edge-to-origin) uses a different CA path (Cloudflare’s own CA) — managed separately from the internal 3-tier hierarchy
- Origin CA is 1-year term (not 15-year) — cert-manager handles auto-renewal; shorter term limits blast radius
ADR-015 — MFA enforcement: staff-configured per org, platform-enforced
Status: Locked (2026-03-09) · Decision date: 2026-03-09
Context
Multi-factor authentication policy needed to be configurable per organization while being enforced uniformly by the platform. The question was where policy configuration lives and where enforcement runs.
Decision
- Configuration: Staff admin portal (`int.arbitex.ai`) — Arbitex staff set MFA policy per org and per plan tier
- Enforcement: Platform API auth middleware — checks org MFA policy on every login; rejects if unmet
- Supported MFA types: FIDO2/WebAuthn hardware key, authenticator app (TOTP), SMS, “any MFA”
- Enterprise default: Enterprise plans can be configured to require FIDO2 hardware key
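A sketch of the enforcement side, assuming illustrative policy values, an illustrative `org_settings` shape, and an assumed ordering in which a hardware key satisfies weaker policies:

```python
from typing import Optional

# Which presented factors satisfy each configured policy. The ordering
# (fido2 satisfies totp-level policies, etc.) is an assumption for the sketch.
MFA_SATISFIES = {
    "any": {"fido2", "totp", "sms"},
    "totp": {"fido2", "totp"},
    "fido2": {"fido2"},
}

class AuthError(Exception):
    pass

def enforce_mfa(org_settings: dict, presented_factor: Optional[str]) -> None:
    """Runs on every login. The middleware reads the policy; it never manages it."""
    required = org_settings.get("mfa_policy")  # set via the staff plane, not by the customer
    if required is None:
        return  # no MFA policy configured for this org
    if presented_factor not in MFA_SATISFIES[required]:
        raise AuthError(f"MFA policy '{required}' not satisfied")

enforce_mfa({"mfa_policy": "fido2"}, "fido2")  # passes
try:
    enforce_mfa({"mfa_policy": "fido2"}, "totp")  # rejected: hardware key required
except AuthError:
    pass
```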
Rationale
- Separating policy configuration (staff plane) from policy enforcement (platform) keeps the customer-facing auth middleware simple — it reads org settings, it doesn’t manage them
- Staff plane controls MFA policy as an operational lever (e.g., enforce FIDO for healthcare customers)
- Putting configuration in the staff plane prevents customers from weakening their own MFA requirements once set by their contract
- Platform-side enforcement means no client-side bypass is possible
Consequences
- MFA enforcement is an Epic L deliverable (sprint L3) — not yet built
- Platform auth middleware needs to call into `org_settings` for the MFA policy check on login
- MFA enrollment prompts (platform UX) are in scope for L3 T16
- Staff portal configures MFA types; enforcement runs on every login (no session caching of MFA state)
ADR-016 — SIEM native format: OCSF over raw JSON for Outpost direct sink
Status: Locked (2026-03-12) · Decision date: 2026-03-12
Context
The Outpost SIEM direct sink (outpost-0022) needed to choose an event format when delivering audit events to customer SIEM endpoints (Splunk HEC, Microsoft Sentinel DCR, Elasticsearch). Two options were evaluated: raw JSON (the existing Outpost audit log format) and OCSF (Open Cybersecurity Schema Framework) — the same format used by the Platform SIEM connectors.
Decision
Use OCSF format for all Outpost direct sink deliveries, with `sourcetype: "arbitex:ocsf"` for Splunk HEC.
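For illustration, a Splunk HEC envelope carrying an OCSF event with the fixed sourcetype (the OCSF field subset and event content shown are assumptions, not the shipped schema):

```python
import json

def hec_envelope(ocsf_event: dict, timestamp: float) -> str:
    """Wrap an OCSF event in the Splunk HTTP Event Collector JSON envelope."""
    return json.dumps({
        "time": timestamp,
        "sourcetype": "arbitex:ocsf",  # fixed value per ADR-016
        "event": ocsf_event,
    })

payload = hec_envelope(
    {"class_uid": 6003, "activity_name": "API Activity", "status": "Blocked"},
    1773532800.0,
)
assert json.loads(payload)["sourcetype"] == "arbitex:ocsf"
```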
Rationale
- OCSF is already the Platform wire format for Splunk, Sentinel, and Elastic connectors — customers maintaining both Platform and Outpost sinks see a consistent schema.
- OCSF is a typed, machine-parseable schema; raw JSON requires per-customer field extractions in each SIEM.
- The `arbitex:ocsf` sourcetype provides a clear namespace in Splunk; customers can write detection rules against a stable field set.
- OCSF is backed by AWS, Splunk, IBM, and others — not a proprietary format that could create long-term vendor lock-in.
Consequences
- Customers running both the Platform SIEM relay and the Outpost direct sink to the same SIEM get identical schema from both paths — no separate field extraction configuration.
- The `sourcetype: "arbitex:ocsf"` Splunk value is fixed at the Outpost; customers must create this sourcetype if it does not already exist.
- The prior Outpost SIEM direct sink (outpost-0008/0015 syslog/CEF implementation) used raw JSON/CEF — customers migrating to the outpost-0022 implementation must update their SIEM parsing rules.
ADR-017 — Budget enforcement: local-first (Outpost SQLite) over SaaS-pushed spend
Status: Locked (2026-03-12) · Decision date: 2026-03-12
Context
Two approaches to Outpost budget cap enforcement were evaluated:
- SaaS-pushed spend (legacy `budget.spent_usd` in policy bundle): The Platform tracks usage centrally and pushes the current spend figure into the policy bundle on each sync. The Outpost compares the pushed value against caps.
- Local-first (outpost-0017 `budget_config` + local `UsageTracker`): The Outpost tracks usage locally in SQLite and evaluates caps against locally-accumulated totals.
Both systems coexist in the current codebase; this ADR locks the local-first approach as the authoritative enforcement model.
Decision
Local-first budget enforcement (`BudgetCapEnforcer` + `UsageTracker`) is the authoritative enforcement model. The SaaS-pushed spend path is retained for legacy compatibility but is not the source of truth.
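A minimal sketch of local-first tracking and cap evaluation in SQLite; the table layout and function names are assumptions, not the shipped `UsageTracker` API:

```python
import sqlite3

class UsageTracker:
    """Accumulates spend locally; enforcement never round-trips to the Platform."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("PRAGMA journal_mode=WAL")  # durable writes, per ADR-017
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS usage (period TEXT PRIMARY KEY, spent_usd REAL NOT NULL)"
        )

    def record(self, period: str, cost_usd: float) -> None:
        self.conn.execute(
            "INSERT INTO usage (period, spent_usd) VALUES (?, ?) "
            "ON CONFLICT(period) DO UPDATE SET spent_usd = spent_usd + excluded.spent_usd",
            (period, cost_usd),
        )
        self.conn.commit()

    def spent(self, period: str) -> float:
        row = self.conn.execute(
            "SELECT spent_usd FROM usage WHERE period = ?", (period,)
        ).fetchone()
        return row[0] if row else 0.0

def over_cap(tracker: UsageTracker, period: str, cap_usd: float) -> bool:
    # Decision made against local totals only, so a network partition or a
    # stale policy bundle cannot open an enforcement gap.
    return tracker.spent(period) >= cap_usd

t = UsageTracker()
t.record("2026-03-12T10", 0.75)
t.record("2026-03-12T10", 0.50)
assert over_cap(t, "2026-03-12T10", 1.00)
assert not over_cap(t, "2026-03-12T10", 5.00)
```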
Rationale
- Policy sync runs every 60 seconds. Under the SaaS-pushed model, an Outpost can accumulate up to 60 seconds of spend above the cap before the bundle refreshes with updated totals — creating a ~60-second enforcement gap.
- During network partition, the bundle is served from cache. SaaS-pushed spend becomes stale immediately; local SQLite totals remain current.
- Local enforcement eliminates the round-trip latency for enforcement decisions — no blocking call to Platform needed.
- SQLite WAL mode provides thread-safe, durable local storage for usage counters without requiring a separate database service.
Consequences
- Caps are evaluated against the Outpost’s local usage totals, which may differ from Platform-side totals (due to multi-outpost deployments, provider billing vs. token-based estimation, etc.). Monitor both Outpost and Platform usage dashboards in production.
- The local `UsageTracker` uses SQLite at `USAGE_DB_PATH`. Set to an empty string to disable local tracking; enforcement becomes a no-op.
- Hourly period rollover relies on the Outpost running at the rollover boundary. If stopped at rollover, stale totals persist until the next request.
ADR-018 — PKCE public client support: S256-only, no implicit flow
Status: Locked (2026-03-12) · Decision date: 2026-03-12
Context
Platform platform-0044 added OAuth 2.0 PKCE support (RFC 7636) to enable `authorization_code` grants for public clients (SPAs, CLIs, mobile apps). Two decisions were made: which code challenge methods to support, and whether to allow the implicit grant flow as an alternative.
Decision
- Support S256 only for the PKCE code challenge method. `plain` is rejected.
- No implicit flow. Authorization code + PKCE is the only supported flow for public clients.
- `OAuthClient.client_type` distinguishes `confidential` (client secret required) from `public` (PKCE required, no secret).
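The S256 derivation and check follow RFC 7636: the challenge is the unpadded base64url encoding of the SHA-256 of the verifier. A sketch:

```python
import base64
import hashlib
import secrets

def s256_challenge(verifier: str) -> str:
    """challenge = BASE64URL(SHA256(verifier)), without padding (RFC 7636 S256)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def verify(stored_challenge: str, presented_verifier: str) -> bool:
    """Server side: at token exchange, recompute from the presented verifier."""
    return secrets.compare_digest(stored_challenge, s256_challenge(presented_verifier))

# Client side: generate a random verifier, send its challenge in the
# authorization request, keep the verifier for the token exchange.
verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
challenge = s256_challenge(verifier)

assert verify(challenge, verifier)
assert not verify(challenge, "wrong-verifier")
```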
Rationale
- `plain` challenges offer no meaningful security benefit over the implicit flow — they are trivially interceptable. S256 provides the hash-based protection that is the entire point of PKCE.
- The implicit flow is deprecated in OAuth 2.0 Security Best Current Practice (RFC 9700) — it exposes tokens in redirect URI fragments and cannot use refresh tokens.
- S256-only simplifies the implementation surface: one code challenge algorithm, one authorization code grant path, no implicit edge cases.
- PKCE code verifiers are stored in Redis with a 10-minute TTL and consumed atomically (`GET`+`DEL`) — replay attacks are rejected at the store layer.
Consequences
- Existing confidential clients (`client_credentials` grant) are unaffected — `client_type` defaults to `confidential` in migration 063.
- Public clients must send a `code_challenge` (S256) in the authorization request. `POST /api/oauth/authorize` returns 400 if `code_challenge` is absent for a public client.
- Arbitex does not issue refresh tokens for public clients in the initial implementation — clients must re-authorize when the access token expires.
ADR-019 — Structured JSON logging: Log Analytics-ready format platform-wide
Status: Locked (2026-03-12) · Decision date: 2026-03-12
Context
Platform platform-0044 (T566) and platform-0025 (T445) shipped a structured JSON log formatter as a platform-wide logging standard. The decision to standardise on JSON over human-readable plain text required choosing a format and making it the default.
Decision
All Platform services emit structured JSON log lines to stdout. The formatter is implemented as a Python `logging.Formatter` subclass and configured in the application lifespan handler. Log lines include at minimum: timestamp (ISO 8601 UTC), level, logger, message, plus any extra fields passed to the log call.
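A minimal formatter of this shape (the shipped formatter's exact field handling may differ):

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    # Attributes present on every LogRecord; anything else came in via `extra`.
    RESERVED = set(logging.LogRecord(None, 0, "", 0, "", (), None).__dict__) | {"message"}

    def format(self, record: logging.LogRecord) -> str:
        line = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra fields passed to the log call become top-level JSON fields.
        line.update({k: v for k, v in record.__dict__.items() if k not in self.RESERVED})
        return json.dumps(line)

handler = logging.StreamHandler()  # stdout/stderr; one JSON object per line
handler.setFormatter(JsonFormatter())
log = logging.getLogger("arbitex.demo")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("policy evaluated", extra={"request_id": "r-123", "org_id": "acme"})
```

The `extra=` fields land as first-class JSON keys, which is what makes the correlation fields below queryable in KQL without regex extraction.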
Rationale
- Azure Log Analytics (the target observability backend) ingests structured JSON natively — plain text requires custom parsing pipelines.
- Structured logs enable field-level querying in Log Analytics (KQL) without regex extraction.
- Per-request correlation fields (`request_id`, `org_id`, `user_id`) can be included as structured fields rather than embedded in log strings.
- Machine-parseable logs from day one avoid a migration later when Log Analytics is wired.
Consequences
- Human-readable log viewing requires a JSON formatter in the terminal (e.g. `kubectl logs | jq`). Operations staff must be aware of this.
- The `logging.basicConfig` call in test fixtures must be updated to handle JSON format if test output is parsed.
- Third-party libraries that emit their own log format (Uvicorn access log, SQLAlchemy echo) are not covered by the formatter — their output remains unstructured. Suppress or re-route those loggers as needed.
| ADR | Title | Status |
|---|---|---|
| ADR-001 | Policy Engine model: Palo Alto unified pipeline | Locked |
| ADR-002 | Deployment topology: SaaS + Hybrid Outpost | Locked |
| ADR-003 | Cloud provider: Azure | Locked |
| ADR-004 | AKS over Azure Container Apps | Locked |
| ADR-005 | Edge provider: Cloudflare Pro | Locked |
| ADR-006 | Compliance framing: “designed for” | Locked |
| ADR-007 | GRC framework: NIST CSF 2.0 + Privacy Framework + AI RMF | Locked |
| ADR-008 | Audit log model: 90-day buffer; SIEM is record of truth | Locked |
| ADR-009 | SIEM format: OCSF with Splunk HEC as P0 | Locked |
| ADR-010 | Staff admin plane: int.arbitex.ai NS delegation | Locked |
| ADR-011 | BYOK scope: Epic H | Locked |
| ADR-012 | Plan tier keys: five-tier model | Locked |
| ADR-013 | DeBERTa inference tiers | Locked |
| ADR-014 | 3-tier CA: offline YubiHSM root | Locked |
| ADR-015 | MFA enforcement: staff-configured, platform-enforced | Locked |
| ADR-016 | SIEM native format: OCSF over raw JSON for Outpost direct sink | Locked |
| ADR-017 | Budget enforcement: local-first (Outpost SQLite) over SaaS-pushed spend | Locked |
| ADR-018 | PKCE public client support: S256-only, no implicit flow | Locked |
| ADR-019 | Structured JSON logging: Log Analytics-ready format platform-wide | Locked |