SaaS Infrastructure Architecture

This page describes the Arbitex SaaS infrastructure as deployed in Azure Kubernetes Service (AKS). It covers the deployment model, network topology, request data flow, storage architecture, observability design, and security boundaries. This is the live production architecture for the Arbitex SaaS offering.

For the full technical deployment reference including Helm chart structure, Dockerfile details, and CI/CD pipeline, see Epic M deployment architecture overview.


Arbitex SaaS runs on Azure Kubernetes Service in a single-region active deployment.

┌────────────────────────────────────────────────────────────────┐
│ Azure Region                                                   │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ AKS Cluster                                              │  │
│  │                                                          │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌─────────────┐     │  │
│  │  │ platform-    │  │ platform-    │  │ ner-gpu     │     │  │
│  │  │ api          │  │ frontend     │  │ (GLiNER)    │     │  │
│  │  │ :8000        │  │ :8080        │  │ :8200       │     │  │
│  │  └──────────────┘  └──────────────┘  └─────────────┘     │  │
│  │                                                          │  │
│  │  ┌──────────────┐  ┌──────────────────────────────┐      │  │
│  │  │ deberta-     │  │ GPU Node Pool                │      │  │
│  │  │ validator    │  │ (NER + DeBERTa workloads)    │      │  │
│  │  │ :8201        │  └──────────────────────────────┘      │  │
│  │  └──────────────┘                                        │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
│  ┌─────────────┐  ┌──────────────────┐  ┌──────────────────┐   │
│  │ PostgreSQL  │  │ Redis Cache      │  │ Azure Key Vault  │   │
│  │ (Flexible)  │  │ (sessions/cache) │  │ (secrets/certs)  │   │
│  └─────────────┘  └──────────────────┘  └──────────────────┘   │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Azure Files / Blob Storage                               │  │
│  │ (GeoIP MMDB, CredInt bloom filter, object storage)       │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘

Service                          Replicas       Scaling
platform-api (FastAPI)           HPA-managed    Scales on CPU/memory
platform-frontend (React/Nginx)  HPA-managed    Scales on requests
ner-gpu (GLiNER NER)             GPU node pool  Scale-to-zero capable
deberta-validator (DeBERTa NLI)  GPU node pool  Scale-to-zero capable

GPU inference workloads (GLiNER, DeBERTa) run on a dedicated GPU node pool with accelerator=nvidia node selectors. The CPU node pool handles all other workloads.

The current production deployment is single-region (Epic M Phase A). Multi-region active-active deployment and automated Azure provisioning are planned under Epic M Phase B.


All customer-facing traffic enters through Cloudflare before reaching origin:

Internet → Cloudflare Edge → AKS NGINX Ingress → Services

Cloudflare capability       Configuration
WAF                         OWASP ruleset + custom AI proxy rules
DDoS protection             Unmetered at network layer
Bot management              Automated traffic filtering
Rate limiting               Per-path limits enforced before origin
Authenticated Origin Pulls  mTLS — cryptographic proof traffic originates from Cloudflare
CDN                         Static assets; API traffic passes through with cache-miss

The AKS NGINX Ingress controller handles:

  • TLS termination for HTTPS traffic
  • Request body size limits (configurable per route)
  • SSE (Server-Sent Events) long-poll support for streaming completions
  • Proxy headers for client IP propagation

Inside the AKS cluster, Calico network policies enforce default-deny between all pods. Each service pair that needs to communicate has an explicit allowlist policy.

Default: deny all pod-to-pod traffic
Explicit allowances:
platform-api → postgresql (port 5432)
platform-api → redis (port 6379)
platform-api → ner-gpu (port 8200)
platform-api → deberta-validator (port 8201)
platform-api → azure-key-vault (via private endpoint)
platform-frontend → platform-api (internal proxy)

A compromised pod cannot reach unrelated services.

All data services are accessible only via Azure Private Endpoints:

  • PostgreSQL — no public endpoint
  • Redis — no public endpoint
  • Azure Key Vault — no public endpoint

Traffic between the AKS cluster and data services never traverses the public internet.

Internal Arbitex staff tooling (int.arbitex.ai) is NS-delegated to a private RFC 1918 nameserver. The subtree is unreachable from the public internet (external DNS returns SERVFAIL). Access requires VPN or Cloudflare Zero Trust connector.


A request from client to AI provider passes through the following stages. The audit layer runs at each enforcement point.

Client sends request
1. Cloudflare Edge
   - WAF inspection
   - Rate limiting
   - DDoS filtering
2. NGINX Ingress
   - TLS termination
   - IP allowlist check (org-configured)
   - Request size enforcement
3. Authentication middleware
   - API key validation (SHA-256 hash compare)
   - RS256 JWT validation (if Bearer token present)
   - SAML session validation (for portal requests)
   ✎ Audit: request_received, auth_result
4. Policy Engine
   - Policy chain evaluation (first_applicable algorithm)
   - Compliance Bundle rules evaluated
   - Custom org policy rules evaluated
   ✎ Audit: policy_evaluated, rule_matched (if applicable)
5. DLP Pipeline (3 tiers)
   - Tier 1: Regex + Luhn (structured PII)
   - Tier 2: GLiNER NER via ner-gpu service
   - Tier 3: DeBERTa NLI via deberta-validator service
   ✎ Audit: dlp_findings
6. Enforcement action
   - BLOCK → 400 response returned, no provider call
   - REDACT → prompt modified, continue
   - CANCEL → completion cancelled after initial tokens
   - ALLOW_WITH_OVERRIDE → logged, continue
   - PROMPT → governance challenge interposed
   ✎ Audit: enforcement_action
7. Provider Gateway
   - Route selection (latency, cost, fallback chain)
   - Provider API call over TLS
   - Circuit breaker on provider errors
8. Response processing
   - DLP scan on completion (response side)
   - Policy evaluation on response
   ✎ Audit: response_received, response_enforcement (if applicable)
9. Response to client
   ✎ Audit: request_complete (final entry, closes HMAC chain link)
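
The stage-3 API key check can be sketched in a few lines: the platform stores only a SHA-256 digest of each key, so validation is a constant-time comparison of digests. Function names here are illustrative, not the actual Arbitex code.

```python
import hashlib
import hmac

def hash_api_key(raw_key: str) -> str:
    """Digest computed at key-creation time; the raw key is never persisted."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()

def validate_api_key(presented_key: str, stored_digest: str) -> bool:
    """Compare the presented key's digest against the stored one.
    hmac.compare_digest avoids timing side channels."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_digest)
```

Storing only the digest means a database leak does not expose usable keys, and comparison cost is independent of where the strings differ.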

Each ✎ Audit step produces one or more entries in the HMAC-chained audit log. The chain links audit entries for the same request using the request_id correlation field.
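
A minimal sketch of such an HMAC chain (illustrative, not the production log schema): each entry's MAC covers the previous entry's MAC plus the entry payload, so altering or deleting any entry invalidates every later link.

```python
import hashlib
import hmac
import json

def append_entry(chain: list, entry: dict, secret: bytes) -> list:
    """Append an audit entry whose MAC binds it to its predecessor."""
    prev_mac = chain[-1]["mac"] if chain else "genesis"
    payload = json.dumps(entry, sort_keys=True)  # canonical form for MACing
    mac = hmac.new(secret, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()
    chain.append({"entry": entry, "mac": mac})
    return chain

def verify_chain(chain: list, secret: bytes) -> bool:
    """Recompute every link; any tampering breaks verification from that point on."""
    prev_mac = "genesis"
    for link in chain:
        payload = json.dumps(link["entry"], sort_keys=True)
        expected = hmac.new(secret, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, link["mac"]):
            return False
        prev_mac = link["mac"]
    return True
```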


PostgreSQL via Azure Database for PostgreSQL Flexible Server.

Use                                              Tables
User accounts, orgs, groups                      users, orgs, groups, group_members
API keys (SHA-256 hashed)                        api_keys
OAuth clients and tokens                         oauth_clients, oauth_tokens
Policy packs, rules, chains                      policy_packs, policy_rules, policy_chains
SAML IdP configurations (Fernet-encrypted)       saml_idp_configs
SIEM connector configurations (Fernet-encrypted) siem_configs
IP allowlist rules                               org_ip_allowlists
Audit log entries                                audit_logs
Usage metering records                           usage_records, usage_summaries

Encryption: AES-256 at storage level (Azure platform-managed). Sensitive configuration fields additionally encrypted with Fernet (AES-128-CBC) at the application layer.
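
The application-layer Fernet encryption can be sketched with the `cryptography` package. In production the key would come from Azure Key Vault rather than being generated inline, and the field names are illustrative.

```python
from cryptography.fernet import Fernet

# Fernet = AES-128-CBC + HMAC-SHA256. The key is generated here only for
# illustration; production loads it from Key Vault.
key = Fernet.generate_key()
f = Fernet(key)

def encrypt_field(plaintext: str) -> bytes:
    """Encrypt a sensitive config field (e.g. a SAML IdP secret) before it is
    written to PostgreSQL, on top of the platform's AES-256 at rest."""
    return f.encrypt(plaintext.encode("utf-8"))

def decrypt_field(token: bytes) -> str:
    """Decrypt and authenticate; raises InvalidToken on tampering."""
    return f.decrypt(token).decode("utf-8")
```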

Backup: Azure Flexible Server automated backups with point-in-time recovery (PITR). See DR runbook.

Redis via Azure Cache for Redis.

Use                        Key pattern
Distributed session store  session:<session_id>
OAuth token cache          oauth_token:<token_hash>
IP allowlist cache         ip_allowlist:<org_id> — 60s TTL
Policy chain cache         policy_chain:<org_id> — 60s TTL
SIEM config cache          siem_config:<org_id> — 60s TTL
Rate limiting counters     rate_limit:<org_id>:<path>

No sticky sessions required: The Redis session store enables distributed sessions — any pod can handle any request without session affinity. This enables horizontal scaling without session-partition constraints.
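
A sketch of the session-store pattern, assuming any client with the redis-py `get`/`setex` interface. Key names follow the `session:<session_id>` pattern above; the class and its API are illustrative.

```python
import json
import secrets

class SessionStore:
    """Distributed session store. `client` is redis.Redis in production;
    any object with get/setex works, so a fake is used in tests."""

    def __init__(self, client, ttl_seconds: int = 3600):
        self.client = client
        self.ttl = ttl_seconds

    def create(self, user_id: str, org_id: str) -> str:
        session_id = secrets.token_urlsafe(32)
        payload = json.dumps({"user_id": user_id, "org_id": org_id})
        # SETEX attaches an expiry; any pod can read the session back,
        # so no load-balancer session affinity is needed.
        self.client.setex(f"session:{session_id}", self.ttl, payload)
        return session_id

    def load(self, session_id: str):
        raw = self.client.get(f"session:{session_id}")
        return json.loads(raw) if raw else None
```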

Encryption: TLS enforced for all Redis connections. No data persisted to unencrypted disk.

Dataset                               Format       Update cadence
GeoIP database (MaxMind)              MMDB binary  Scheduled refresh
Credential Intelligence bloom filter  Binary       CDN-delivered refresh
Audit export archives                 JSON (OCSF)  On-demand / scheduled

Encryption: Azure Storage Service Encryption (AES-256, platform-managed keys).

Azure Key Vault stores all secrets in production:

  • Fernet encryption key (application-layer encryption)
  • SAML signing certificates
  • Internal CA intermediate key (for outpost certificate issuance)
  • Service connection strings and credentials

Key Vault tier: FIPS 140-2 Level 3 HSM-backed. Applications fail closed: they refuse to start if Key Vault is unreachable or an insecure fallback is detected.


All services emit structured JSON logs with consistent fields:

Field        Description
timestamp    ISO 8601
level        debug, info, warning, error, critical
service      Service name (platform-api, ner-gpu, etc.)
request_id   UUID — spans all log entries for a single request
org_id       Tenant identifier
event        Event name
duration_ms  Processing time for completed operations

Logs ship to Azure Log Analytics and are available for SIEM correlation. Each request produces log entries across multiple services that correlate via request_id.
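
A minimal emitter for this field set can be sketched as follows (illustrative, not the actual logging code):

```python
import json
import time

def make_log_entry(service, event, request_id, org_id,
                   level="info", duration_ms=None) -> str:
    """Emit one structured JSON entry with the shared field set, so Log
    Analytics can join entries from platform-api, ner-gpu, etc. on request_id."""
    entry = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "level": level,
        "service": service,
        "request_id": request_id,
        "org_id": org_id,
        "event": event,
    }
    if duration_ms is not None:  # only present for completed operations
        entry["duration_ms"] = duration_ms
    return json.dumps(entry)
```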

HTTP requests carry the traceparent header per W3C Trace Context specification. The header is propagated through the full request chain (API → NER GPU → DeBERTa → provider), enabling end-to-end request tracing in compatible observability platforms.
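
Propagation can be sketched as follows: each hop keeps the trace-id and mints a fresh parent (span) id, per the W3C format `version-traceid-parentid-flags`. This is an illustrative sketch, not the platform's tracing code.

```python
import secrets

def new_traceparent() -> str:
    """Build a W3C Trace Context traceparent header value."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex chars
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{parent_id}-01"

def propagate(traceparent: str) -> str:
    """Forward to the next hop (e.g. platform-api -> ner-gpu): keep the
    trace-id so the whole chain correlates, mint a new parent id."""
    version, trace_id, _old_parent, flags = traceparent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"
```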

Service                Health endpoint      Circuit breaker
platform-api           GET /healthz         (none)
ner-gpu                GET /health          3 failures / 60s reset
deberta-validator      GET /health          3 failures / 60s reset
Provider integrations  Per-provider health  Provider health score; automatic fallback

Circuit breakers on the NER and DeBERTa microservices prevent GPU inference failures from blocking the full request pipeline — the DLP pipeline degrades gracefully to lower tiers when GPU services are unavailable.
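
A count-based breaker matching the 3 failures / 60 s reset in the table can be sketched like this (illustrative; the production implementation may differ):

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a probe request
    again once `reset_after` seconds have elapsed (half-open)."""

    def __init__(self, threshold: int = 3, reset_after: float = 60.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_after:
            self.opened_at = None   # half-open: let one probe through
            self.failures = 0
            return True
        return False                # open: caller degrades to lower DLP tiers

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()

    def record_success(self):
        self.failures = 0
        self.opened_at = None
```

When `allow()` returns False for the ner-gpu or deberta-validator call, the caller skips that tier instead of blocking the whole request pipeline.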


Arbitex enforces tenant isolation at the data layer. No data is shared across tenants: all tables carry a tenant_id (UUID) column, and every query is filtered by tenant at the service layer. Tenant identity is derived from the authenticated API key or JWT claim; it cannot be overridden by the caller.

Tenant A                 Tenant B
   │                        │
   ▼                        ▼
org_id=uuid-A            org_id=uuid-B
   │                        │
   ▼                        ▼
All DB queries:          All DB queries:
WHERE tenant_id=A        WHERE tenant_id=B

Credential type           Scope
API keys                  Single org; carry role (admin/user)
RS256 JWT (M2M)           Single org; carry scopes (api:read, api:write, etc.)
SAML session              Single org; user identity from IdP
Outpost mTLS certificate  Single outpost; issued at registration, revocable

No cross-tenant token escalation path exists.

Rate limiting is enforced at two layers:

Layer            Mechanism               Granularity
Cloudflare edge  Per-path rate limits    Global (pre-auth)
Platform API     Sliding window (Redis)  Per org, per endpoint

Plan tier   RPM limit     Burst
Free        60 RPM        Limited
Standard    600 RPM       Standard burst
Premium     3,000 RPM     Premium burst
Enterprise  Configurable  Custom

M2M OAuth clients have an additional rate_limit_tier property (standard / premium / unlimited) independent of the org plan tier.
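
A sliding-window check mirroring the `rate_limit:<org_id>:<path>` key pattern can be sketched in memory (production uses Redis; this class and its API are illustrative):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Per-(org, path) sliding window over the last 60 seconds."""

    def __init__(self, limit_rpm: int):
        self.limit = limit_rpm
        self.windows = defaultdict(deque)  # (org_id, path) -> request timestamps

    def allow(self, org_id, path, now=None) -> bool:
        now = time.monotonic() if now is None else now
        window = self.windows[(org_id, path)]
        while window and now - window[0] >= 60.0:  # drop events older than 60 s
            window.popleft()
        if len(window) >= self.limit:
            return False                           # over the per-org RPM limit
        window.append(now)
        return True
```

Keying the window on (org, path) gives per-tenant isolation: one org exhausting its limit never affects another.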

All service containers run as non-root users:

  • platform-api — appuser (UID 1000)
  • platform-frontend — nginx user (UID 101)
  • GPU microservices — non-root GPU user

The Pod Security Standards restricted profile is enforced on CPU namespaces; the baseline profile applies to the GPU inference namespace (required for CUDA driver access).


The Arbitex Outpost extends the SaaS enforcement pipeline into your private network.

Your Network                               Arbitex SaaS
     │                                          │
     ▼                                          │
┌──────────────────────────────────┐            │
│ Arbitex Outpost                  │            │
│ ┌────────────┐  ┌────────────┐   │            │
│ │ DLP        │  │ Policy     │   │◄───────────┤ policy sync (mTLS)
│ │ Pipeline   │  │ Cache      │   │            │
│ └────────────┘  └────────────┘   │            │
│ ┌────────────┐  ┌────────────┐   │            │
│ │ DeBERTa    │  │ CredInt    │   │            │
│ │ (local)    │  │ Bloom      │   │            │
│ └────────────┘  └────────────┘   │            │
│                                  │            │
│ Audit buffer → SaaS sync (mTLS)  │───────────►│ audit events
└──────────────────────────────────┘
     │ (to provider, or local-only)
     ▼
AI Provider

The Outpost runs the same 3-tier DLP pipeline locally. Policy packs sync from the SaaS control plane over mTLS. In air-gap mode, the Outpost operates without SaaS connectivity, using the last synced policy bundle.
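
The sync-with-fallback behaviour can be sketched as follows; `fetch_remote` stands in for the mTLS sync call, and the cache path is hypothetical:

```python
import json
from pathlib import Path

def load_policy_bundle(cache_path: Path, fetch_remote) -> dict:
    """Try the SaaS control plane first; in air-gap mode (or on any sync
    failure) fall back to the last bundle persisted on disk."""
    try:
        bundle = fetch_remote()                      # mTLS sync in production
        cache_path.write_text(json.dumps(bundle))    # refresh the local cache
        return bundle
    except OSError:
        return json.loads(cache_path.read_text())    # last synced bundle
```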

See Outpost deployment guide and Air-gap deployment guide.