# Epic M — deployment architecture overview
Epic M Phase A (MA1–MA6) delivers the production container and Kubernetes infrastructure for the Arbitex Platform. This document describes the deployment architecture: how the services are containerized, how the Helm chart deploys them to AKS, how the CI/CD pipeline builds and pushes images, what security controls are in place, and where to find the operational runbooks.
This is an architecture overview. Step-by-step operational procedures are in the ops runbooks linked at the end of this document.
## Phase A summary

| Sub-phase | Scope |
|---|---|
| MA1 | Multi-stage Dockerfiles for all four services (backend, frontend, NER GPU, DeBERTa validator) |
| MA2 | Helm chart (deploy/helm/arbitex-platform/) for AKS deployment — Deployments, Services, Ingress, PVC, HPA, PodDisruptionBudget |
| MA3 | GitHub Actions CI/CD pipeline: lint → test → Trivy vulnerability scan → build → ACR push |
| MA4 | Security hardening: non-root containers, read-only root filesystems, Azure Key Vault secrets, HSTS, mTLS CA verification |
| MA5 | NGINX Ingress configuration: request body limits, SSE long-poll support, TLS termination |
| MA6 | Ops runbooks: Alembic rollback, Docker→AKS migration, incident response playbook |
## Container architecture

### Services

Four containers run as separate Kubernetes Deployments. All are built with multi-stage Dockerfiles pinned to digest-verified base images in production.
| Service | Image name | Internal port | Base image |
|---|---|---|---|
| FastAPI backend | platform-api | 8000 | python:3.12.8-slim |
| React/Nginx frontend | platform-frontend | 8080 | node:20.18-alpine → nginx:1.27.3-alpine |
| NER GPU microservice (GLiNER) | ner-gpu | 8200 | GPU-enabled PyTorch image |
| DeBERTa validator microservice | deberta-validator | 8201 | GPU-enabled PyTorch image |
### Backend (FastAPI)

Two build stages:
- **Builder** — installs Python dependencies into `/install` from `requirements.txt` (production deps only; dev tooling is excluded).
- **Production** — copies the installed packages from the builder stage, creates a non-root user (`appuser`, UID 1000), exposes port 8000, and runs `docker-entrypoint.sh`, which executes Alembic migrations and then starts `uvicorn`.
Security properties:
- `PYTHONDONTWRITEBYTECODE=1` and `PYTHONUNBUFFERED=1` for clean container logging.
- `PYTHONPATH=/app` to resolve `backend.app.*` import paths.
- Healthcheck uses the Python stdlib (`urllib.request`) — no `curl` installed.
- Entrypoint uses the `exec` form so Linux signals propagate correctly to `uvicorn`.
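The healthcheck can stay dependency-free because `urllib.request` ships with CPython. The Dockerfile's exact HEALTHCHECK command isn't reproduced here; a stdlib-only probe in this spirit (the `/healthz` path and the timeout are illustrative assumptions) might be:

```python
import urllib.request


def check_health(url: str = "http://localhost:8000/healthz", timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Connection refused, DNS failure, timeout, HTTPError (non-2xx), ...
        return False
```

A Dockerfile HEALTHCHECK would run this via `python -c` and translate the boolean into a zero/nonzero exit code for the container runtime.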
### Frontend (React/Nginx)

Two build stages:
- **Builder** — the Node 20 Alpine image runs `npm ci && npm run build` to produce the Vite/React dist bundle.
- **Production** — the Nginx 1.27.3 Alpine image serves the static bundle. The default nginx config is replaced with a custom `nginx.conf` (request size limits, SSE, proxy headers). Runs as the `nginx` user (UID 101).
Healthcheck: `wget -qO /dev/null http://localhost:8080/healthz`.
### NER GPU microservice

GLiNER zero-shot NER model (`urchade/gliner_medium-v2.1`) served over HTTP on port 8200. Runs on the GPU node pool (`accelerator=nvidia`). Circuit breaker: 3 consecutive failures / 60-second reset window.
### DeBERTa validator microservice

DeBERTa NLI validator served over HTTP on port 8201. Runs on the GPU node pool. Provides a `/validate` endpoint for fine-grained entity type classification. Circuit breaker: 3 consecutive failures / 60-second reset window.
Note: the Outpost also includes an in-process DeBERTa scanner (see DeBERTa Tier 3 admin guide) that runs independently of this microservice.
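Both GPU microservice clients apply the same breaker policy: open after 3 consecutive failures, allow a retry after a 60-second reset window. The Platform's actual client code isn't reproduced here; a minimal sketch of that policy, with names of my own choosing, might look like:

```python
import time


class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a retry after `reset_s`."""

    def __init__(self, threshold: int = 3, reset_s: float = 60.0):
        self.threshold = threshold
        self.reset_s = reset_s
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        """Should the next request be attempted?"""
        if self.opened_at is None:
            return True  # closed: normal operation
        if time.monotonic() - self.opened_at >= self.reset_s:
            return True  # half-open: permit a trial request
        return False     # open: fail fast, skip the GPU call

    def record(self, ok: bool) -> None:
        """Report the outcome of an attempted request."""
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```

A caller checks `allow()` before each HTTP request and falls back (or returns a degraded result) while the breaker is open.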
## Helm chart structure

Chart location: `deploy/helm/arbitex-platform/`

```
deploy/helm/arbitex-platform/
├── Chart.yaml                     # name: arbitex-platform, version: 0.1.0, appVersion: 0026
├── values.yaml                    # default values
├── values-prod.yaml               # production resource overrides
└── templates/
    ├── deployment-api.yaml        # FastAPI backend Deployment
    ├── deployment-frontend.yaml   # React/Nginx frontend Deployment
    ├── deployment-ner-gpu.yaml    # GLiNER NER Deployment
    ├── deployment-deberta.yaml    # DeBERTa validator Deployment
    ├── service-api.yaml           # ClusterIP :8100
    ├── service-frontend.yaml      # ClusterIP :3100
    ├── service-ner-gpu.yaml       # ClusterIP :8200
    ├── service-deberta.yaml       # ClusterIP :8201
    ├── ingress.yaml               # NGINX Ingress with TLS
    ├── hpa.yaml                   # HorizontalPodAutoscaler
    ├── pdb.yaml                   # PodDisruptionBudget
    ├── job-alembic-migrate.yaml   # pre-upgrade Helm hook for migrations
    └── secret-provider-class.yaml # Azure Key Vault CSI SecretProviderClass
```

### Values hierarchy
```
values.yaml       ← base defaults
values-prod.yaml  ← production resource overrides
--set ...         ← CI/CD runtime overrides (image tags, digests)
```

### Key service endpoints

| Service | Cluster DNS | Port |
|---|---|---|
| Backend API | arbitex-platform-api | 8100 → container 8000 |
| Frontend | arbitex-platform-frontend | 3100 → container 8080 |
| NER GPU | arbitex-platform-ner-gpu | 8200 |
| DeBERTa | arbitex-platform-deberta | 8201 |
### Alembic init container

When `alembic.runAsInitContainer: true` (the default), every API pod rollout includes an `alembic-migrate` init container that runs `python -m alembic upgrade head` before the main container starts. If the migration fails, the pod never reaches the Ready state and Kubernetes blocks the rollout.
A pre-upgrade Helm hook Job also runs before each `helm upgrade` (`backoffLimit: 3`, `ttlSecondsAfterFinished: 300`).
### GPU node placement

NER GPU and DeBERTa pods require GPU nodes. The chart sets `nodeSelector: { accelerator: nvidia }` and tolerations for `nvidia.com/gpu: NoSchedule` on those Deployments. The NVIDIA device plugin DaemonSet must be installed in the cluster (`kube-system` namespace) for GPU resource scheduling.
## CI/CD pipeline

The pipeline runs on GitHub Actions. The trigger is a push to `main` or a release tag. Stages run in order; the pipeline halts on any failure.

```
Lint (ruff, eslint)
  ↓
Unit tests (pytest, vitest)
  ↓
Trivy vulnerability scan (CRITICAL/HIGH findings fail the build)
  ↓
Docker build (multi-stage, base image digest pinned)
  ↓
ACR push (four images: platform-api, platform-frontend, ner-gpu, deberta-validator)
  ↓
Helm upgrade (--atomic --wait, 10-minute timeout)
```

### Image tagging
Each build produces two tags per image:

- `<acr>.azurecr.io/platform-api:<semver>` — immutable release tag (e.g. `v0.29.0`)
- `<acr>.azurecr.io/platform-api:latest` — mutable floating tag
Production Helm deployments pin by digest (`--set api.image.digest=sha256:<digest>`) to prevent tag-overwrite incidents.
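As an illustration of what "pinned by digest" means mechanically, a hypothetical CI guard (not part of the actual pipeline) could refuse any image reference that lacks a `@sha256:` digest:

```python
import re

# <repo>[:<tag>]@sha256:<64 hex chars> — a tag is optional once a digest is present
_DIGEST_RE = re.compile(r"^[\w.\-/]+(:[\w.\-]+)?@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """True only if the image reference is pinned by a sha256 digest."""
    return bool(_DIGEST_RE.match(image_ref))
```

A deploy job could assert this over every resolved image reference before invoking `helm upgrade`.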
### Trivy scan

Trivy scans all four images for OS package and Python dependency CVEs. CRITICAL and HIGH severity findings fail the build unless they are excluded by `--ignore-unfixed` (i.e. findings with no available fix). The scan report is uploaded as a GitHub Actions artifact.
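Conceptually, the gate is a filter over Trivy's JSON report: fixable CRITICAL/HIGH findings block the build, unfixed ones are exempt. A simplified sketch of that filter (field names follow Trivy's JSON output; the real pipeline relies on Trivy's own `--severity` and `--exit-code` flags rather than custom code):

```python
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def blocking_findings(report: dict) -> list[str]:
    """Return IDs of fixable CRITICAL/HIGH vulnerabilities in a Trivy JSON report.

    Findings without a FixedVersion are skipped, mirroring --ignore-unfixed.
    """
    found = []
    for result in report.get("Results", []):
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING_SEVERITIES and vuln.get("FixedVersion"):
                found.append(vuln["VulnerabilityID"])
    return found
```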
## NGINX Ingress configuration

The Ingress controller is NGINX (the `ingress-nginx/ingress-nginx` Helm chart). The Arbitex Ingress resource routes `api.arbitex.ai` to the backend and frontend services.
### Request body limits

The backend API accepts large file uploads and multi-turn conversation requests. The NGINX configuration sets:

```nginx
client_max_body_size 50m;
proxy_request_buffering off;
```

This is set via the Ingress annotation `nginx.ingress.kubernetes.io/proxy-body-size: 50m` in `templates/ingress.yaml`.
### SSE support

The chat completion endpoint streams responses as Server-Sent Events. NGINX is configured to disable proxy buffering for SSE routes:

```
proxy_buffering off;
proxy_cache off;
X-Accel-Buffering: no
```

This is applied via annotation on the Ingress resource.
### TLS termination

TLS terminates at the Ingress controller. Certificates are issued by cert-manager using a Let’s Encrypt ClusterIssuer (`letsencrypt-prod`). The Ingress references a TLS secret (`arbitex-tls`) that cert-manager populates.
## Security layers

### mTLS chain

Internal mTLS is used for Outpost-to-Platform communication. The Platform backend validates the Outpost’s client certificate against the Platform CA. Certificates are mounted into the pods from Kubernetes Secrets.
### Secret management — Azure Key Vault

Sensitive environment variables (`DATABASE_URL`, `SECRET_KEY`, `AUDIT_HMAC_KEY`, `REDIS_URL`, `POLICY_SIGNING_KEY`) are stored in Azure Key Vault and surfaced to pods via one of two methods:
| Method | When to use |
|---|---|
| Kubernetes Secret (manual) | Simpler setup; requires manual rotation |
| Key Vault CSI driver (`SecretProviderClass`) | Recommended for production; automatic rotation via AKS managed identity |
The Helm template expects a Kubernetes Secret named `<release-name>-api-secrets` with at minimum `DATABASE_URL` and `SECRET_KEY` keys.
### Container hardening

All containers run with these security context settings:

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000   # appuser (backend) / 101 nginx user (frontend)
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: [ALL]
```

The Ingress sets the Strict-Transport-Security response header via annotation:

```yaml
nginx.ingress.kubernetes.io/configuration-snippet: |
  add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
```

### SCIM token rotation
SCIM provisioning tokens are stored in Key Vault and rotated by the identity team. The `SCIM_BEARER_TOKEN` secret in Key Vault is updated out-of-band; the CSI driver re-syncs the Kubernetes Secret on the next sync interval (default: 2 minutes).
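Because the CSI driver rewrites the mounted file on rotation, consumers should re-read the file per use rather than cache the value at process start. An illustrative pattern — the `/mnt/secrets` mount path and `read_secret` helper are hypothetical, not the Platform's actual code:

```python
from pathlib import Path

SECRETS_DIR = Path("/mnt/secrets")  # hypothetical CSI volume mount path


def read_secret(name: str, secrets_dir: Path = SECRETS_DIR) -> str:
    """Read a secret from its mounted file on every call, so a value
    rotated by the CSI driver is picked up without a pod restart."""
    return (secrets_dir / name).read_text(encoding="utf-8").strip()
```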
### Pod Security Standard

Non-GPU pods run under the `restricted` Pod Security Standard. GPU pods (`ner-gpu`, `deberta-validator`) require the `baseline` PSS due to device plugin requirements. If GPU and non-GPU pods share the `arbitex` namespace, set the namespace enforcement level to `baseline` with a `warn=restricted` label.
## Ops runbooks reference

The following runbooks are maintained in `docs/ops/` of the platform repository:
| Runbook | Location | Use when |
|---|---|---|
| Alembic migration rollback | `docs/ops/alembic-rollback.md` | A migration breaks the API after deploy (startup errors, 500s, constraint violations) |
| Docker → AKS migration | `docs/ops/migration-runbook.md` | Migrating an existing Docker Compose deployment to AKS for the first time |
| Incident response | `docs/ops/incident-response.md` | Kill switch activation, provider failover, DLP bypass, HMAC integrity failure, platform outage |
### Rollback decision matrix

| Symptom | Action |
|---|---|
| Pods stuck in init container failure after deploy | Alembic rollback runbook (run `alembic downgrade -1` as a one-off Job) |
| API CrashLoopBackOff, no migration issues | `helm rollback arbitex-platform 0 --namespace arbitex` |
| Provider returning 5xx errors | Incident response → Incident 2 (provider failover / circuit breaker) |
| DLP blocking legitimate content | Incident response → Incident 3 (DLP bypass procedure) |
| HMAC audit chain failure | Incident response → Incident 4 (audit chain integrity) |
| Complete platform outage | Incident response → Incident 5 (diagnostic steps + restart sequence) |
## Current migration baseline

The Alembic migration chain as of Epic M Phase A close:
| Revision | Description |
|---|---|
| `051_user_timezone` | Current head — user timezone column |
| `050_org_scim_tokens` | SCIM token storage for orgs |
| `049_audit_hmac_key_id` | HMAC key ID for audit chain versioning |
The next migration will be `052_*`. The full revision chain is documented in `docs/ops/alembic-rollback.md`, Section 8.