SaaS Infrastructure Architecture

This page describes the Arbitex SaaS infrastructure as deployed in Azure Kubernetes Service (AKS). It covers the deployment model, network topology, request data flow, storage architecture, observability design, and security boundaries. This is the live production architecture for the Arbitex SaaS offering.

For the full technical deployment reference including Helm chart structure, Dockerfile details, and CI/CD pipeline, see Epic M deployment architecture overview.


Arbitex SaaS runs on Azure Kubernetes Service in a single-region active deployment.

┌────────────────────────────────────────────────────────────────┐
│ Azure Region                                                   │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ AKS Cluster                                              │  │
│  │                                                          │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌─────────────┐     │  │
│  │  │ platform-    │  │ platform-    │  │ ner-gpu     │     │  │
│  │  │ api          │  │ frontend     │  │ (GLiNER)    │     │  │
│  │  │ :8000        │  │ :8080        │  │ :8200       │     │  │
│  │  └──────────────┘  └──────────────┘  └─────────────┘     │  │
│  │                                                          │  │
│  │  ┌──────────────┐  ┌──────────────────────────────┐      │  │
│  │  │ deberta-     │  │ GPU Node Pool                │      │  │
│  │  │ validator    │  │ (NER + DeBERTa workloads)    │      │  │
│  │  │ :8201        │  └──────────────────────────────┘      │  │
│  │  └──────────────┘                                        │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
│  ┌─────────────┐  ┌──────────────────┐  ┌──────────────────┐   │
│  │ PostgreSQL  │  │ Redis Cache      │  │ Azure Key Vault  │   │
│  │ (Flexible)  │  │ (sessions/cache) │  │ (secrets/certs)  │   │
│  └─────────────┘  └──────────────────┘  └──────────────────┘   │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Azure Files / Blob Storage                               │  │
│  │ (GeoIP MMDB, CredInt bloom filter, object storage)       │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘

Service                          Replicas       Scaling
platform-api (FastAPI)           HPA-managed    Scales on CPU/memory
platform-frontend (React/Nginx)  HPA-managed    Scales on requests
ner-gpu (GLiNER NER)             GPU node pool  Scale-to-zero capable
deberta-validator (DeBERTa NLI)  GPU node pool  Scale-to-zero capable

GPU inference workloads (GLiNER, DeBERTa) run on a dedicated GPU node pool with accelerator=nvidia node selectors. The CPU node pool handles all other workloads.

The current production deployment is single-region (Epic M Phase A). Multi-region active-active deployment and automated Azure provisioning are planned under Epic M Phase B.


All customer-facing traffic enters through Cloudflare before reaching origin:

Internet → Cloudflare Edge → AKS NGINX Ingress → Services

Cloudflare capability       Configuration
WAF                         OWASP ruleset + custom AI proxy rules
DDoS protection             Unmetered at network layer
Bot management              Automated traffic filtering
Rate limiting               Per-path limits enforced before origin
Authenticated Origin Pulls  mTLS — cryptographic proof traffic originates from Cloudflare
CDN                         Static assets; API traffic passes through with cache-miss

The AKS NGINX Ingress controller handles:

  • TLS termination for HTTPS traffic
  • Request body size limits (configurable per route)
  • SSE (Server-Sent Events) long-poll support for streaming completions
  • Proxy headers for client IP propagation

Inside the AKS cluster, Calico network policies enforce default-deny between all pods. Each service pair that needs to communicate has an explicit allowlist policy.

Default: deny all pod-to-pod traffic
Explicit allowances:
platform-api → postgresql (port 5432)
platform-api → redis (port 6379)
platform-api → ner-gpu (port 8200)
platform-api → deberta-validator (port 8201)
platform-api → azure-key-vault (via private endpoint)
platform-frontend → platform-api (internal proxy)

A compromised pod cannot reach unrelated services.

All data services are accessible only via Azure Private Endpoints:

  • PostgreSQL — no public endpoint
  • Redis — no public endpoint
  • Azure Key Vault — no public endpoint

Traffic between the AKS cluster and data services never traverses the public internet.

Internal Arbitex staff tooling (int.arbitex.ai) is NS-delegated to a private RFC 1918 nameserver. The subtree is unreachable from the public internet (external DNS returns SERVFAIL). Access requires VPN or Cloudflare Zero Trust connector.


A request from client to AI provider passes through the following stages. The audit layer runs at each enforcement point.

Client sends request
1. Cloudflare Edge
   - WAF inspection
   - Rate limiting
   - DDoS filtering
2. NGINX Ingress
   - TLS termination
   - IP allowlist check (org-configured)
   - Request size enforcement
3. Authentication middleware
   - API key validation (SHA-256 hash compare)
   - RS256 JWT validation (if Bearer token present)
   - SAML session validation (for portal requests)
   ✎ Audit: request_received, auth_result
4. Policy Engine
   - Policy chain evaluation (first_applicable algorithm)
   - Compliance Bundle rules evaluated
   - Custom org policy rules evaluated
   ✎ Audit: policy_evaluated, rule_matched (if applicable)
5. DLP Pipeline (3 tiers)
   - Tier 1: Regex + Luhn (structured PII)
   - Tier 2: GLiNER NER via ner-gpu service
   - Tier 3: DeBERTa NLI via deberta-validator service
   ✎ Audit: dlp_findings
6. Enforcement action
   - BLOCK → 400 response returned, no provider call
   - REDACT → prompt modified, continue
   - CANCEL → completion cancelled after initial tokens
   - ALLOW_WITH_OVERRIDE → logged, continue
   - PROMPT → governance challenge interposed
   ✎ Audit: enforcement_action
7. Provider Gateway
   - Route selection (latency, cost, fallback chain)
   - Provider API call over TLS
   - Circuit breaker on provider errors
8. Response processing
   - DLP scan on completion (response side)
   - Policy evaluation on response
   ✎ Audit: response_received, response_enforcement (if applicable)
9. Response to client
   ✎ Audit: request_complete (final entry, closes HMAC chain link)
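
The stage-3 API key check can be sketched in a few lines: the platform stores only a SHA-256 digest of each key, so validation is a constant-time comparison of digests. Function names here are illustrative, not the actual Arbitex code.

```python
import hashlib
import hmac

def hash_api_key(raw_key: str) -> str:
    """Digest computed at key-creation time; the raw key is never persisted."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()

def validate_api_key(presented_key: str, stored_digest: str) -> bool:
    """Compare the presented key's digest against the stored one.
    hmac.compare_digest avoids timing side channels."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_digest)
```

Storing only the digest means a database leak does not expose usable keys, and comparison cost is independent of where the strings differ.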

Each ✎ Audit step produces one or more entries in the HMAC-chained audit log. The chain links audit entries for the same request using the request_id correlation field.
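
A minimal sketch of such an HMAC chain (illustrative, not the production log schema): each entry's MAC covers the previous entry's MAC plus the entry payload, so altering or deleting any entry invalidates every later link.

```python
import hashlib
import hmac
import json

def append_entry(chain: list, entry: dict, secret: bytes) -> list:
    """Append an audit entry whose MAC binds it to its predecessor."""
    prev_mac = chain[-1]["mac"] if chain else "genesis"
    payload = json.dumps(entry, sort_keys=True)  # canonical form for MACing
    mac = hmac.new(secret, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()
    chain.append({"entry": entry, "mac": mac})
    return chain

def verify_chain(chain: list, secret: bytes) -> bool:
    """Recompute every link; any tampering breaks verification from that point on."""
    prev_mac = "genesis"
    for link in chain:
        payload = json.dumps(link["entry"], sort_keys=True)
        expected = hmac.new(secret, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, link["mac"]):
            return False
        prev_mac = link["mac"]
    return True
```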


PostgreSQL via Azure Database for PostgreSQL Flexible Server.

Use                                              Tables
User accounts, orgs, groups                      users, orgs, groups, group_members
API keys (SHA-256 hashed)                        api_keys
OAuth clients and tokens                         oauth_clients, oauth_tokens
Policy packs, rules, chains                      policy_packs, policy_rules, policy_chains
SAML IdP configurations (Fernet-encrypted)       saml_idp_configs
SIEM connector configurations (Fernet-encrypted) siem_configs
IP allowlist rules                               org_ip_allowlists
Audit log entries                                audit_logs
Usage metering records                           usage_records, usage_summaries

Encryption: AES-256 at storage level (Azure platform-managed). Sensitive configuration fields additionally encrypted with Fernet (AES-128-CBC) at the application layer.
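
The application-layer Fernet encryption can be sketched with the `cryptography` package. In production the key would come from Azure Key Vault rather than being generated inline, and the field names are illustrative.

```python
from cryptography.fernet import Fernet

# Fernet = AES-128-CBC + HMAC-SHA256. The key is generated here only for
# illustration; production loads it from Key Vault.
key = Fernet.generate_key()
f = Fernet(key)

def encrypt_field(plaintext: str) -> bytes:
    """Encrypt a sensitive config field (e.g. a SAML IdP secret) before it is
    written to PostgreSQL, on top of the platform's AES-256 at rest."""
    return f.encrypt(plaintext.encode("utf-8"))

def decrypt_field(token: bytes) -> str:
    """Decrypt and authenticate; raises InvalidToken on tampering."""
    return f.decrypt(token).decode("utf-8")
```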

Backup: Azure Flexible Server automated backups with point-in-time recovery (PITR). See DR runbook.

Redis via Azure Cache for Redis.

Use                        Key pattern
Distributed session store  session:<session_id>
OAuth token cache          oauth_token:<token_hash>
IP allowlist cache         ip_allowlist:<org_id> — 60s TTL
Policy chain cache         policy_chain:<org_id> — 60s TTL
SIEM config cache          siem_config:<org_id> — 60s TTL
Rate limiting counters     rate_limit:<org_id>:<path>

No sticky sessions required: The Redis session store enables distributed sessions — any pod can handle any request without session affinity. This enables horizontal scaling without session-partition constraints.
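
A sketch of the session-store pattern, assuming any client with the redis-py `get`/`setex` interface. Key names follow the `session:<session_id>` pattern above; the class and its API are illustrative.

```python
import json
import secrets

class SessionStore:
    """Distributed session store. `client` is redis.Redis in production;
    any object with get/setex works, so a fake is used in tests."""

    def __init__(self, client, ttl_seconds: int = 3600):
        self.client = client
        self.ttl = ttl_seconds

    def create(self, user_id: str, org_id: str) -> str:
        session_id = secrets.token_urlsafe(32)
        payload = json.dumps({"user_id": user_id, "org_id": org_id})
        # SETEX attaches an expiry; any pod can read the session back,
        # so no load-balancer session affinity is needed.
        self.client.setex(f"session:{session_id}", self.ttl, payload)
        return session_id

    def load(self, session_id: str):
        raw = self.client.get(f"session:{session_id}")
        return json.loads(raw) if raw else None
```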

Encryption: TLS enforced for all Redis connections. No data persisted to unencrypted disk.

Dataset                               Format       Update cadence
GeoIP database (MaxMind)              MMDB binary  Scheduled refresh
Credential Intelligence bloom filter  Binary       CDN-delivered refresh
Audit export archives                 JSON (OCSF)  On-demand / scheduled

Encryption: Azure Storage Service Encryption (AES-256, platform-managed keys).

Azure Key Vault stores all secrets in production:

  • Fernet encryption key (application-layer encryption)
  • SAML signing certificates
  • Internal CA intermediate key (for outpost certificate issuance)
  • Service connection strings and credentials

Key Vault tier: FIPS 140-2 Level 3 HSM-backed. Applications fail closed: they refuse to start if Key Vault is unreachable or an insecure fallback is detected.


All services emit structured JSON logs with consistent fields:

Field        Description
timestamp    ISO 8601
level        debug, info, warning, error, critical
service      Service name (platform-api, ner-gpu, etc.)
request_id   UUID — spans all log entries for a single request
org_id       Tenant identifier
event        Event name
duration_ms  Processing time for completed operations

Logs ship to Azure Log Analytics and are available for SIEM correlation. Each request produces log entries across multiple services that correlate via request_id.
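
A minimal emitter for this field set can be sketched as follows (illustrative, not the actual logging code):

```python
import json
import time

def make_log_entry(service, event, request_id, org_id,
                   level="info", duration_ms=None) -> str:
    """Emit one structured JSON entry with the shared field set, so Log
    Analytics can join entries from platform-api, ner-gpu, etc. on request_id."""
    entry = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "level": level,
        "service": service,
        "request_id": request_id,
        "org_id": org_id,
        "event": event,
    }
    if duration_ms is not None:  # only present for completed operations
        entry["duration_ms"] = duration_ms
    return json.dumps(entry)
```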

HTTP requests carry the traceparent header per W3C Trace Context specification. The header is propagated through the full request chain (API → NER GPU → DeBERTa → provider), enabling end-to-end request tracing in compatible observability platforms.
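
Propagation can be sketched as follows: each hop keeps the trace-id and mints a fresh parent (span) id, per the W3C format `version-traceid-parentid-flags`. This is an illustrative sketch, not the platform's tracing code.

```python
import secrets

def new_traceparent() -> str:
    """Build a W3C Trace Context traceparent header value."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex chars
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{parent_id}-01"

def propagate(traceparent: str) -> str:
    """Forward to the next hop (e.g. platform-api -> ner-gpu): keep the
    trace-id so the whole chain correlates, mint a new parent id."""
    version, trace_id, _old_parent, flags = traceparent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"
```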

Service                Health endpoint      Circuit breaker
platform-api           GET /healthz         (none)
ner-gpu                GET /health          3 failures / 60s reset
deberta-validator      GET /health          3 failures / 60s reset
Provider integrations  Per-provider health  Provider health score; automatic fallback

Circuit breakers on the NER and DeBERTa microservices prevent GPU inference failures from blocking the full request pipeline — the DLP pipeline degrades gracefully to lower tiers when GPU services are unavailable.
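
A count-based breaker matching the 3 failures / 60 s reset in the table can be sketched like this (illustrative; the production implementation may differ):

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a probe request
    again once `reset_after` seconds have elapsed (half-open)."""

    def __init__(self, threshold: int = 3, reset_after: float = 60.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_after:
            self.opened_at = None   # half-open: let one probe through
            self.failures = 0
            return True
        return False                # open: caller degrades to lower DLP tiers

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()

    def record_success(self):
        self.failures = 0
        self.opened_at = None
```

When `allow()` returns False for the ner-gpu or deberta-validator call, the caller skips that tier instead of blocking the whole request pipeline.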


Arbitex enforces tenant isolation at the data layer. No data is shared across tenants: all tables carry a tenant_id (UUID) column, and every query is filtered by tenant at the service layer. Tenant identity is derived from the authenticated API key or JWT claim; it cannot be overridden by the caller.

Tenant A                 Tenant B
   │                        │
   ▼                        ▼
org_id=uuid-A            org_id=uuid-B
   │                        │
   ▼                        ▼
All DB queries:          All DB queries:
WHERE tenant_id=A        WHERE tenant_id=B

Credential type           Scope
API keys                  Single org; carry role (admin/user)
RS256 JWT (M2M)           Single org; carry scopes (api:read, api:write, etc.)
SAML session              Single org; user identity from IdP
Outpost mTLS certificate  Single outpost; issued at registration, revocable

No cross-tenant token escalation path exists.

Rate limiting is enforced at two layers:

Layer            Mechanism               Granularity
Cloudflare edge  Per-path rate limits    Global (pre-auth)
Platform API     Sliding window (Redis)  Per org, per endpoint

Plan tier   RPM limit     Burst
Free        60 RPM        Limited
Standard    600 RPM       Standard burst
Premium     3,000 RPM     Premium burst
Enterprise  Configurable  Custom

M2M OAuth clients have an additional rate_limit_tier property (standard / premium / unlimited) independent of the org plan tier.
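
A sliding-window check mirroring the `rate_limit:<org_id>:<path>` key pattern can be sketched in memory (production uses Redis; this class and its API are illustrative):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Per-(org, path) sliding window over the last 60 seconds."""

    def __init__(self, limit_rpm: int):
        self.limit = limit_rpm
        self.windows = defaultdict(deque)  # (org_id, path) -> request timestamps

    def allow(self, org_id, path, now=None) -> bool:
        now = time.monotonic() if now is None else now
        window = self.windows[(org_id, path)]
        while window and now - window[0] >= 60.0:  # drop events older than 60 s
            window.popleft()
        if len(window) >= self.limit:
            return False                           # over the per-org RPM limit
        window.append(now)
        return True
```

Keying the window on (org, path) gives per-tenant isolation: one org exhausting its limit never affects another.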

All service containers run as non-root users:

  • platform-api — appuser (UID 1000)
  • platform-frontend — nginx user (UID 101)
  • GPU microservices — non-root GPU user

The Pod Security Standards restricted profile is enforced on CPU namespaces; the baseline profile applies to the GPU inference namespace (required for CUDA driver access).


The Arbitex Outpost extends the SaaS enforcement pipeline into your private network.

Your Network                               Arbitex SaaS
     │                                          │
     ▼                                          │
┌──────────────────────────────────┐            │
│ Arbitex Outpost                  │            │
│ ┌────────────┐  ┌────────────┐   │            │
│ │ DLP        │  │ Policy     │   │◄───────────┤ policy sync (mTLS)
│ │ Pipeline   │  │ Cache      │   │            │
│ └────────────┘  └────────────┘   │            │
│ ┌────────────┐  ┌────────────┐   │            │
│ │ DeBERTa    │  │ CredInt    │   │            │
│ │ (local)    │  │ Bloom      │   │            │
│ └────────────┘  └────────────┘   │            │
│                                  │            │
│ Audit buffer → SaaS sync (mTLS)  │───────────►│ audit events
└──────────────────────────────────┘
     │ (to provider, or local-only)
     ▼
AI Provider

The Outpost runs the same 3-tier DLP pipeline locally. Policy packs sync from the SaaS control plane over mTLS. In air-gap mode, the Outpost operates without SaaS connectivity, using the last synced policy bundle.
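
The sync-with-fallback behaviour can be sketched as follows; `fetch_remote` stands in for the mTLS sync call, and the cache path is hypothetical:

```python
import json
from pathlib import Path

def load_policy_bundle(cache_path: Path, fetch_remote) -> dict:
    """Try the SaaS control plane first; in air-gap mode (or on any sync
    failure) fall back to the last bundle persisted on disk."""
    try:
        bundle = fetch_remote()                      # mTLS sync in production
        cache_path.write_text(json.dumps(bundle))    # refresh the local cache
        return bundle
    except OSError:
        return json.loads(cache_path.read_text())    # last synced bundle
```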

See Outpost deployment guide and Air-gap deployment guide.